Setting Up Ollama for Local AI Development

Ollama provides a simple way to run large language models locally. Here’s how I set it up on my Mac Studio M1 Max with 64GB RAM.

Installation

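brew install ollama

The Homebrew formula installs the CLI and server but does not leave the server running. Start it in a separate terminal before pulling models (it listens on port 11434 by default):

ollama serve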

Running Models

Pull and run qwen2.5-coder:14b for coding tasks:

ollama pull qwen2.5-coder:14b
ollama run qwen2.5-coder:14b
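
Once the model is pulled, it can also be called over Ollama's local HTTP API instead of the interactive prompt. The following is a minimal sketch using only the Python standard library. It assumes the server is on the default port and that a non-streaming reply is wanted; the helper names build_generate_request and generate are my own, not part of Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default local address; adjust if yours differs


def build_generate_request(prompt, model="qwen2.5-coder:14b", base_url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]


# Example call, with the server running:
# print(generate("Write a Python function that reverses a string."))
```

Splitting request construction from sending keeps the payload easy to inspect without a live server.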

Configuration for Multi-Agent Systems

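When running multiple agents simultaneously, memory management is crucial. On this 64GB machine, I found that the 14b model works better than 32b when running 3-4 agents in parallel: at roughly 10-12GB per 14b instance, three or four instances fit in memory with headroom to spare.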
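
The memory budget can be checked with simple arithmetic. This sketch uses the figures from these notes (64GB total, 10-12GB per instance); the 8GB reserved for macOS and other apps is an assumption of mine, and max_parallel_instances is a hypothetical helper name.

```python
def max_parallel_instances(total_ram_gb, per_instance_gb, reserved_gb=8.0):
    """Estimate how many model instances fit in RAM.

    reserved_gb is an assumed allowance for the OS and other processes.
    """
    usable_gb = total_ram_gb - reserved_gb
    if usable_gb <= 0 or per_instance_gb <= 0:
        return 0
    return int(usable_gb // per_instance_gb)


# Worst case from the notes: 12GB per instance on a 64GB machine
print(max_parallel_instances(64, 12))
# Best case from the notes: 10GB per instance
print(max_parallel_instances(64, 10))
```

Whether each agent really gets its own instance, or several agents share one loaded model, depends on how the server is configured, so treat the result as an upper bound on separate instances.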

Integration with CrewAI

Example configuration for using Ollama with CrewAI:

from crewai import LLM

llm = LLM(
    model="ollama/qwen2.5-coder:14b",
    base_url="http://localhost:11434"
)
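
Before handing this LLM to a crew, it can help to confirm that the server is reachable and the model has been pulled. Below is a minimal pre-flight sketch using the standard library and Ollama's /api/tags endpoint, which lists local models. The function names are hypothetical, and the check assumes the tag in the response matches the name passed to CrewAI without the "ollama/" prefix.

```python
import json
import urllib.request


def list_local_models(base_url="http://localhost:11434"):
    """Fetch the list of locally available models (needs a running server)."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return json.loads(resp.read())


def has_model(tags, name):
    """Return True if a model with this exact name appears in the /api/tags payload."""
    return any(m.get("name") == name for m in tags.get("models", []))


# Example call, with the server running:
# if not has_model(list_local_models(), "qwen2.5-coder:14b"):
#     raise SystemExit("Run: ollama pull qwen2.5-coder:14b")
```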

Performance Notes

Model loading time: ~5 seconds
First token latency: ~500ms
Tokens per second: ~40-50 on M1 Max
Memory usage: ~10-12GB per model instance
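
These figures combine into a rough per-response estimate: load time (cold start only), plus first token latency, plus output length divided by throughput. This is a back-of-the-envelope sketch; the default of 45 tokens per second is just the midpoint of the range above, and estimate_response_seconds is a hypothetical helper name.

```python
def estimate_response_seconds(
    output_tokens,
    tokens_per_second=45.0,   # assumed midpoint of the ~40-50 range
    first_token_latency=0.5,  # ~500ms from the notes
    load_time=5.0,            # ~5 seconds from the notes
    cold_start=False,
):
    """Rough wall-clock estimate for one response, in seconds."""
    total = first_token_latency + output_tokens / tokens_per_second
    if cold_start:
        total += load_time  # the model has to be loaded into memory first
    return total


# A 450-token answer, warm versus cold
print(estimate_response_seconds(450))
print(estimate_response_seconds(450, cold_start=True))
```

The estimate ignores prompt processing time and any slowdown from several agents sharing the GPU, so real numbers under parallel load will likely be higher.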

Next Steps

Integrate with Neo4j for knowledge graphs
Add mem0 for persistent memory layer
Test concurrent agent execution
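
For the concurrency test, one possible starting point is a small harness that fires several prompts at once and times the batch. This is only a sketch of how I might approach it: run_concurrently is a hypothetical helper, and ask stands for any function that takes a prompt and returns text, such as a wrapper around the Ollama HTTP API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(ask, prompts, max_workers=4):
    """Call ask(prompt) for every prompt in parallel threads.

    Returns the results in input order and the elapsed wall-clock seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(ask, prompts))
    return results, time.perf_counter() - start


# Stub in place of a real model call, to show the shape of the output
results, elapsed = run_concurrently(str.upper, ["plan", "code", "review"])
print(results, f"{elapsed:.3f}s")
```

Comparing the elapsed time for 1, 2, 3, and 4 workers against the same prompts should show where parallel agents stop paying off on a given machine.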