Setting Up Ollama for Local AI Development

A guide to installing and configuring Ollama on a Mac Studio for multi-agent systems

#ollama #local-ai #llm #mac

Ollama provides a simple way to run large language models locally. Here’s how I set it up on my Mac Studio M1 Max with 64GB RAM.

Installation

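brew install ollama
brew services start ollama  # starts the server in the background; `ollama serve` in a separate terminal also works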

Running Models

Pull and run qwen2.5-coder:14b for coding tasks:

ollama pull qwen2.5-coder:14b
ollama run qwen2.5-coder:14b
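Once the model is pulled, it can also be reached from code through Ollama's local REST API instead of the interactive prompt. The following is a minimal sketch using only the Python standard library. It assumes the server is running on the default address `http://localhost:11434`; the helper names `build_generate_request` and `generate` are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_generate_request(prompt, model="qwen2.5-coder:14b", base_url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="qwen2.5-coder:14b", base_url=OLLAMA_URL, timeout=120):
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_generate_request(prompt, model, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server running, `generate("Write a Python function that reverses a string")` should return the model's answer as a plain string.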

Configuration for Multi-Agent Systems

When several agents run at the same time, memory becomes the main constraint. On this 64GB machine, I found that the 14b model works better than the 32b model when running 3-4 agents in parallel.
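The trade-off can be made concrete with some back-of-the-envelope arithmetic. This is a rough sketch, not a measurement: the 12GB figure is the upper end of the per-instance usage reported in the Performance Notes, while the 16GB reserved for macOS and other apps and the 22GB figure for the 32b model are assumptions chosen only for illustration.

```python
def max_parallel_instances(total_gb, per_instance_gb, reserved_gb=16):
    """Return how many model instances fit after reserving memory for the OS and apps."""
    usable_gb = total_gb - reserved_gb
    if usable_gb <= 0 or per_instance_gb <= 0:
        return 0
    return int(usable_gb // per_instance_gb)


TOTAL_GB = 64    # Mac Studio M1 Max from this post
GB_PER_14B = 12  # upper end of the ~10-12GB observed for the 14b model
GB_PER_32B = 22  # assumption: rough guess for the 32b model, not measured

print(max_parallel_instances(TOTAL_GB, GB_PER_14B))  # 4
print(max_parallel_instances(TOTAL_GB, GB_PER_32B))  # 2
```

Under these assumptions, the 14b model leaves room for four instances and the 32b model for only two, which matches what I saw in practice. Ollama's `OLLAMA_NUM_PARALLEL` and `OLLAMA_MAX_LOADED_MODELS` environment variables control how many requests and models the server handles at once, so they may be worth tuning alongside this estimate.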

Integration with CrewAI

Example configuration for using Ollama with CrewAI:

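from crewai import LLM

# The "ollama/" prefix selects the Ollama provider for this model
llm = LLM(
    model="ollama/qwen2.5-coder:14b",
    base_url="http://localhost:11434",  # Ollama's default local address
)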
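Before handing this LLM to any agents, it may help to confirm that the server is up and the model has actually been pulled. Below is a small preflight sketch built on Ollama's `/api/tags` endpoint, which lists locally available models. The function names are hypothetical and not part of CrewAI or Ollama, and `str.removeprefix` requires Python 3.9 or newer.

```python
import json
import urllib.request


def ollama_model_name(crewai_model):
    """Strip the "ollama/" provider prefix used in the CrewAI model string."""
    return crewai_model.removeprefix("ollama/")


def is_model_available(tags_payload, crewai_model):
    """Check a parsed /api/tags response for the requested model."""
    wanted = ollama_model_name(crewai_model)
    return any(m.get("name") == wanted for m in tags_payload.get("models", []))


def fetch_tags(base_url="http://localhost:11434", timeout=5):
    """Ask a running Ollama server which models are pulled locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.loads(response.read())
```

A typical check would be `is_model_available(fetch_tags(), "ollama/qwen2.5-coder:14b")`. If it returns `False`, running `ollama pull qwen2.5-coder:14b` should fix it.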

Performance Notes

  • Model loading time: ~5 seconds
  • First token latency: ~500ms
  • Tokens per second: ~40-50 on M1 Max
  • Memory usage: ~10-12GB per model instance
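Numbers like these can be reproduced from the timing metadata that Ollama attaches to a completed `/api/generate` response, where durations are reported in nanoseconds. The sketch below converts those fields into seconds and tokens per second. The `summarize_timing` helper is hypothetical, the sample values are invented for illustration, and prompt evaluation time is only a rough proxy for first-token latency.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def summarize_timing(response):
    """Convert Ollama's nanosecond timing fields into seconds and tokens per second."""
    eval_seconds = response["eval_duration"] / NANOSECONDS_PER_SECOND
    return {
        "load_seconds": response.get("load_duration", 0) / NANOSECONDS_PER_SECOND,
        "prompt_eval_seconds": response.get("prompt_eval_duration", 0) / NANOSECONDS_PER_SECOND,
        "tokens_per_second": response["eval_count"] / eval_seconds if eval_seconds else 0.0,
    }


# Illustrative values only, not a real measurement
sample = {
    "load_duration": 5_000_000_000,
    "prompt_eval_duration": 500_000_000,
    "eval_count": 450,
    "eval_duration": 10_000_000_000,
}
print(summarize_timing(sample))
```

Passing the parsed JSON from a real non-streaming request in place of `sample` gives the figures for your own machine.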

Next Steps

  • Integrate with Neo4j for knowledge graphs
  • Add mem0 for persistent memory layer
  • Test concurrent agent execution
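For the concurrency test, one possible starting point is a small harness that fires several prompts at once and times the batch. This is only a sketch: `run_agents_concurrently` and `fake_model` are hypothetical names, and the fake model stands in for a real call to Ollama so the harness can be tried offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_agents_concurrently(call_model, prompts, max_workers=4):
    """Run one call per prompt in parallel threads; return results in prompt order and elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(call_model, prompts))
    return results, time.perf_counter() - start


def fake_model(prompt):
    """Stand-in for a real model call so the harness can be tried offline."""
    time.sleep(0.1)
    return f"echo: {prompt}"


results, elapsed = run_agents_concurrently(fake_model, ["plan", "code", "review"])
print(results, f"{elapsed:.2f}s")
```

Swapping `fake_model` for a function that calls the local Ollama server would show how total time changes as the number of parallel agents grows.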