Production Multi-Agent Memory Pipeline on Ubuntu
How I built a production-grade multi-agent memory system with SQLite, vector search, and automated indexing on a headless Ubuntu server
Overview
I run a production multi-agent memory system on a headless Ubuntu 22.04 server (refurbished MacBook Pro). Three AI agents — with distinct roles — write to isolated source-tagged event streams, which merge nightly into a canonical knowledge base backed by SQLite + vector embeddings, queryable via a FastAPI backend and searchable through a static Astro frontend.
Architecture
┌─────────────┐     ┌──────────────┐     ┌────────────────┐
│   Agents    │────▶│ JSON Streams │────▶│ merge_streams  │
│  (3 roles)  │     │ (1 per agent)│     │  (22:45 cron)  │
└─────────────┘     └──────────────┘     └────────────────┘
                                                  │
         ┌────────────────────────────────────────┘
         ▼
┌──────────────────┐     ┌──────────┐
│ golden_knowledge │────▶│  SQLite  │
│      (JSON)      │     │ + vector │
└──────────────────┘     └──────────┘
                              │
       ┌──────────────────────┘
       ▼
┌──────────────┐     ┌──────────┐
│  mnemos_api  │────▶│  Ollama  │
│  (FastAPI)   │     │ (embed)  │
└──────────────┘     └──────────┘
       │
       ▼
┌──────────────┐
│ Astro static │
│   (search)   │
└──────────────┘
Components
1. Agent Streams (Event-Driven Logging)
Each agent writes only to its own stream:
- steward → streams/steward/log.json (financial analysis, debt correspondence, system ops)
- bro → streams/bro/log.json (integration, knowledge curation)
- human → streams/human/log.json (daily impressions via the CLI day command)
Entries carry source, type, timestamp, and sha256 fingerprint. No cross-stream writes.
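To make the entry shape concrete, here is a minimal sketch of how an agent might build and append one entry. The function names, the `content` field, and the choice to hash source, type, and content (but not the timestamp) are my assumptions for illustration, not the actual implementation.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def make_entry(source: str, entry_type: str, content: str) -> dict:
    """Build a stream entry with a content-derived SHA-256 fingerprint."""
    # Assumption: the fingerprint covers source, type, and content but not the
    # timestamp, so logging identical content twice yields the same hash.
    payload = json.dumps(
        {"source": source, "type": entry_type, "content": content},
        sort_keys=True,
        ensure_ascii=False,
    )
    return {
        "source": source,
        "type": entry_type,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "content": content,  # hypothetical field name
        "sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }


def append_entry(streams_root: Path, entry: dict) -> Path:
    """Append an entry to the log owned by entry['source'], and to no other."""
    log_path = streams_root / entry["source"] / "log.json"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    if log_path.exists():
        entries = json.loads(log_path.read_text(encoding="utf-8"))
    else:
        entries = []
    entries.append(entry)
    log_path.write_text(
        json.dumps(entries, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return log_path
```

Because the path is derived from the entry's own source tag, a writer cannot land in another agent's stream by accident.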
2. Merger (22:45 UTC daily)
merge_streams.py reads all streams, deduplicates by SHA-256 fingerprint, and appends new entries to golden_knowledge.json with a _merged_at timestamp. It writes to a temporary file and then renames it, so a crash mid-write cannot corrupt the canonical file.
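The merge step can be sketched roughly as follows. This is an illustration under assumptions, not the real merge_streams.py: it assumes each log is a JSON array of entries with a `sha256` key, and the function signature is hypothetical.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def merge_streams(streams_root: Path, golden_path: Path) -> int:
    """Append unseen stream entries to the golden file; return how many were added."""
    if golden_path.exists():
        golden = json.loads(golden_path.read_text(encoding="utf-8"))
    else:
        golden = []
    seen = {entry["sha256"] for entry in golden}
    merged_at = datetime.now(timezone.utc).isoformat()
    added = 0

    # One log per agent: <streams_root>/<agent>/log.json
    for log_path in sorted(streams_root.glob("*/log.json")):
        for entry in json.loads(log_path.read_text(encoding="utf-8")):
            if entry["sha256"] in seen:
                continue  # already merged, or duplicated across streams
            seen.add(entry["sha256"])
            golden.append({**entry, "_merged_at": merged_at})
            added += 1

    # Write to a temp file in the same directory, then rename over the target.
    # os.replace is atomic when source and destination share a filesystem.
    golden_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=golden_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(golden, tmp, indent=2, ensure_ascii=False)
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp_name, golden_path)
    return added
```

Running it twice in a row adds nothing the second time, which is what makes a nightly cron job safe to re-run after a failure.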
3. SQLite + Vector Index
- Table: knowledge_base with 768-dim content_embedding (Ollama nomic-embed-text)
- Sources: knowledge_sources tracks file provenance
- Hybrid search: Cosine similarity on embeddings + FTS5 full-text fallback
- Current state: 185+ indexed chunks across memos, docs, debt correspondence, and uploaded documents
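A minimal sketch of the hybrid lookup, using only the standard library. Several details here are assumptions rather than the real schema: embeddings stored as float32 BLOBs, a separate FTS5 table named `knowledge_fts`, and a fallback that fires only when no chunk clears a similarity threshold (`min_score`). It also assumes the local SQLite build includes FTS5.

```python
import math
import sqlite3
import struct


def pack_vector(vec):
    """Serialize a float vector to a float32 BLOB (storage format is an assumption)."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack_vector(blob):
    return struct.unpack(f"{len(blob) // 4}f", blob)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def open_index(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS knowledge_base ("
        "id INTEGER PRIMARY KEY, content TEXT NOT NULL, content_embedding BLOB)"
    )
    # Hypothetical companion table for full-text search.
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS knowledge_fts USING fts5(content)")
    return conn


def add_chunk(conn, content, embedding):
    cur = conn.execute(
        "INSERT INTO knowledge_base (content, content_embedding) VALUES (?, ?)",
        (content, pack_vector(embedding)),
    )
    conn.execute(
        "INSERT INTO knowledge_fts (rowid, content) VALUES (?, ?)",
        (cur.lastrowid, content),
    )
    conn.commit()


def hybrid_search(conn, query_text, query_embedding, limit=5, min_score=0.3):
    """Rank by cosine similarity; fall back to FTS5 if nothing clears min_score."""
    rows = conn.execute(
        "SELECT content, content_embedding FROM knowledge_base "
        "WHERE content_embedding IS NOT NULL"
    ).fetchall()
    scored = [(cosine(query_embedding, unpack_vector(blob)), content) for content, blob in rows]
    hits = sorted((s for s in scored if s[0] >= min_score), reverse=True)[:limit]
    if hits:
        return [(content, score) for score, content in hits]

    # Fallback: quote each token so user input is never parsed as FTS5 syntax.
    tokens = query_text.split()
    if not tokens:
        return []
    fts_query = " ".join('"' + t.replace('"', '""') + '"' for t in tokens)
    fts_rows = conn.execute(
        "SELECT content FROM knowledge_fts WHERE knowledge_fts MATCH ? ORDER BY rank LIMIT ?",
        (fts_query, limit),
    ).fetchall()
    return [(content, None) for (content,) in fts_rows]
```

The scan is brute force: every stored vector is compared against the query. That is fine at a few hundred chunks and keeps the whole index in one SQLite file with no extension to install.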
4. FastAPI Backend
- POST /api/v1/knowledge/search: semantic + FTS hybrid
- POST /api/v1/knowledge/upload: file ingest, chunk, embed, store
- GET /api/v1/knowledge/stats: index health
- Serves Astro dist/ via StaticFiles mount on same port (8081)
5. Frontend
Astro static site with:
- /search: live semantic search with source filters
- /upload: drag-and-drop document upload
- Shared origin with API (no CORS, no token friction)
Key Decisions
SQLite over Postgres: Single-file, zero-config, survives on a 256GB SSD. Vector search is custom cosine similarity over Ollama embeddings stored in SQLite, not pgvector. Backed up every 6 hours.
Ollama over cloud APIs: Embeddings run locally on nomic-embed-text. No API costs, no network round trips, works offline. The server has 16GB RAM, enough for the embedding model plus qwen2.5-coder.
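For reference, a local embedding call needs nothing beyond the standard library. The sketch below targets Ollama's /api/embeddings endpoint on its default port (newer Ollama releases also offer /api/embed); the helper names and the dimension check are my own illustration, not the pipeline's actual code.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # Ollama's default local port
EXPECTED_DIM = 768  # nomic-embed-text output size, matching content_embedding


def build_embed_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build the POST request for a single embedding."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embedding(raw: bytes, expected_dim: int = EXPECTED_DIM) -> list[float]:
    """Extract the vector and fail loudly if the model returned another size."""
    embedding = json.loads(raw)["embedding"]
    if len(embedding) != expected_dim:
        raise ValueError(f"expected {expected_dim} dims, got {len(embedding)}")
    return embedding


def embed(text: str, timeout: float = 30.0) -> list[float]:
    """Embed one text via the local Ollama server (requires Ollama to be running)."""
    with urllib.request.urlopen(build_embed_request(text), timeout=timeout) as resp:
        return parse_embedding(resp.read())
```

The dimension check guards against a silent model swap: vectors of a different size would otherwise end up in the same column and quietly skew similarity scores.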
Static frontend over SPA: Astro builds to static HTML. FastAPI serves it. No separate process, no reverse proxy complexity, one port to firewall.
No privacy compartments: Debt correspondence, personal memos, and technical docs share the same vector space. The index is reality — not a sanitized version of it.
Operations
Cron schedule:
- 07:00: mnemos-knowledge.py re-indexes memos, docs, streams, debt docs
- 22:45: merge_streams.py merges agent logs
- Every 6h: agent-collective-backup.sh to external SSD
Hardware: Refurbished MacBook Pro running Ubuntu 22.04 Server, iPhone USB tethering for internet, headless operation via SSH from primary M1 Max MacBook.
What I learned
- Chunking matters. Naive sentence splitting breaks context. I use regex paragraph chunks with overlap for semantic continuity.
- Embeddings need recency weighting. Older knowledge sources should rank lower unless explicitly queried.
- Single origin reduces complexity. API + static site on same port = no CORS, no token management, one ufw rule.
- SQLite scales further than people think. 185 chunks is small, but I expect the same architecture to handle 100,000+ chunks without changes.
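The chunking point is easiest to see in code. Here is a minimal sketch of paragraph chunking with overlap; the blank-line regex, the `max_chars` soft limit, and measuring overlap in whole paragraphs are assumptions, and the production chunker may differ.

```python
import re


def chunk_paragraphs(text: str, max_chars: int = 1200, overlap: int = 1) -> list[str]:
    """Pack paragraphs into chunks, repeating the last `overlap` paragraphs
    at the start of the next chunk for semantic continuity."""
    # Paragraphs are runs of text separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current: list[str] = []

    for para in paragraphs:
        # Length if this paragraph were added, counting "\n\n" joiners.
        candidate_len = sum(len(p) for p in current) + len(para) + 2 * len(current)
        if current and candidate_len > max_chars:
            chunks.append("\n\n".join(current))
            # Carry the tail forward; max_chars is a soft limit, so a chunk
            # may exceed it by the size of the carried paragraphs.
            current = current[-overlap:] if overlap > 0 else []
        current.append(para)

    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

With four short paragraphs A to D and a limit that fits two per chunk, this yields A+B, B+C, C+D: each boundary paragraph appears in both neighbours, so a query that matches text near a boundary still retrieves its surrounding context.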
Stack
- OS: Ubuntu 22.04 Server (headless)
- Backend: FastAPI + uvicorn
- Database: SQLite (stdlib + custom vector similarity)
- Embeddings: Ollama (nomic-embed-text)
- Frontend: Astro (static, deployed to Cloudflare Pages)
- Language: Python 3.11, TypeScript
- Infra: systemd services, cron, ufw, SSH
Built over 3 months of focused, full-time work. All systems operational since May 2026.