Setting Up Ollama for Local AI Development
A guide to installing and configuring Ollama on a Mac Studio for multi-agent systems
Ollama provides a simple way to run large language models locally. Here’s how I set it up on my Mac Studio M1 Max with 64GB RAM.
Installation
brew install ollama
Running Models
Pull and run qwen2.5-coder:14b for coding tasks:
ollama pull qwen2.5-coder:14b
ollama run qwen2.5-coder:14b
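Besides the interactive prompt, the Ollama server exposes an HTTP API on port 11434, which is what other tools talk to. Here is a minimal sketch of calling the `/api/generate` endpoint from Python with only the standard library. The helper names `build_generate_request` and `generate` are my own for illustration, and the sketch assumes the server is running on the default address.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default address, unchanged


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("qwen2.5-coder:14b", "Reverse a string in Python")` should return the completion as a plain string. Setting `"stream": False` keeps the example simple by asking for one JSON object instead of a stream of chunks.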
Configuration for Multi-Agent Systems
When running multiple agents simultaneously, memory management is crucial because all loaded models share the machine's 64GB of unified memory. I found that the 14b model works better than the 32b model when running 3-4 agents in parallel.
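To make the memory trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes the worst case where every agent holds its own model instance, which may not apply if agents share one Ollama server. The function name and the 8GB reserve for macOS and other apps are my own assumptions, not measured values. The per-instance figures come from the performance notes further down.

```python
def max_parallel_instances(
    total_ram_gb: float,
    per_instance_gb: float,
    reserved_gb: float = 8.0,  # assumed headroom for the OS and other apps
) -> int:
    """Return how many model instances fit in RAM after reserving headroom."""
    usable_gb = total_ram_gb - reserved_gb
    if usable_gb <= 0 or per_instance_gb <= 0:
        return 0
    return int(usable_gb // per_instance_gb)


# 64GB machine, using the upper ~12GB per-instance estimate for the 14b model
print(max_parallel_instances(64, 12))
```

Under these assumptions the 64GB machine fits four 12GB instances, which lines up with the 3-4 agents I could run in parallel.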
Integration with CrewAI
Example configuration for using Ollama with CrewAI:
from crewai import LLM

# The "ollama/" prefix routes the request to the local Ollama server
llm = LLM(
    model="ollama/qwen2.5-coder:14b",
    base_url="http://localhost:11434",  # default Ollama address
)
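Before handing this configuration to an agent, it can help to confirm that the model has actually been pulled. The sketch below reads Ollama's `GET /api/tags` endpoint, which lists locally available models. The helper names (`installed_models`, `fetch_tags`, `require_model`) are hypothetical and not part of CrewAI or Ollama.

```python
import json
import urllib.request


def installed_models(tags_response: dict) -> list[str]:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in tags_response.get("models", [])]


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch the list of local models (needs a running Ollama server)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.loads(resp.read())


def require_model(name: str, tags_response: dict) -> None:
    """Raise a clear error if the model has not been pulled yet."""
    if name not in installed_models(tags_response):
        raise RuntimeError(f"Model {name!r} not found; run: ollama pull {name}")
```

Calling `require_model("qwen2.5-coder:14b", fetch_tags())` at startup fails fast with a readable message instead of a connection or model error later in an agent run.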
Performance Notes
- Model loading time: ~5 seconds
- First token latency: ~500ms
- Tokens per second: ~40-50 on M1 Max
- Memory usage: ~10-12GB per model instance
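Numbers like these can be reproduced from the timing fields Ollama includes in a non-streaming `/api/generate` response: `eval_count` (generated tokens) plus `eval_duration` and `load_duration`, both in nanoseconds. The following is a small sketch with helper names of my own choosing. The sample response is made up for illustration and is not a measurement.

```python
def tokens_per_second(response: dict) -> float:
    """Compute generation speed from an Ollama /api/generate response."""
    duration_s = response["eval_duration"] / 1e9  # nanoseconds to seconds
    if duration_s <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] / duration_s


def load_seconds(response: dict) -> float:
    """Model load time in seconds, from the load_duration field."""
    return response.get("load_duration", 0) / 1e9


# Illustrative values only: 450 tokens generated in 10 s after a 5 s load
sample = {
    "eval_count": 450,
    "eval_duration": 10_000_000_000,
    "load_duration": 5_000_000_000,
}
print(f"{tokens_per_second(sample):.1f} tok/s, load {load_seconds(sample):.1f} s")
```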
Next Steps
- Integrate with Neo4j for knowledge graphs
- Add mem0 for persistent memory layer
- Test concurrent agent execution
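For the concurrency test, one possible starting point is a small harness that sends several prompts in parallel threads and records the wall-clock time. This is only a sketch: `run_concurrently` is a hypothetical helper, and `send` stands in for whatever function actually calls the model.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_concurrently(
    prompts: list[str],
    send: Callable[[str], str],
    max_workers: int = 4,
) -> tuple[list[str], float]:
    """Send all prompts in parallel; return replies in input order and elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        replies = list(pool.map(send, prompts))
    return replies, time.perf_counter() - start
```

Threads are enough here because each call spends its time waiting on the local server rather than computing in Python. Comparing the elapsed time for `max_workers=1` against `max_workers=4` gives a first impression of how well parallel agents scale.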