Production Multi-Agent Memory Pipeline on Ubuntu

Overview

I run a production multi-agent memory system on a headless Ubuntu 22.04 server (a refurbished MacBook Pro). Three AI agents with distinct roles write to isolated, source-tagged event streams. The streams merge nightly into a canonical knowledge base backed by SQLite and vector embeddings, which is queryable through a FastAPI backend and searchable through a static Astro frontend.

Architecture

┌─────────────┐     ┌──────────────┐     ┌────────────────┐
│   Agents    │────▶│ JSON Streams │────▶│  merge_streams │
│  (3 roles)  │     │ (1 per agent)│     │  (22:45 cron)  │
└─────────────┘     └──────────────┘     └────────────────┘
                                                   │
                          ┌────────────────────────┘
                          ▼
                ┌────────────────┐     ┌──────────┐
                │golden_knowledge│────▶│ SQLite   │
                │     (JSON)     │     │ + vector │
                └────────────────┘     └──────────┘
                                              │
                          ┌───────────────────┘
                          ▼
                   ┌──────────────┐     ┌──────────┐
                   │  mnemos_api  │────▶│  Ollama  │
                   │   (FastAPI)  │     │ (embed)  │
                   └──────────────┘     └──────────┘
                          │
                          ▼
                   ┌──────────────┐
                   │ Astro static │
                   │   (search)   │
                   └──────────────┘

Components

1. Agent Streams (Event-Driven Logging)

Each agent writes only to its own stream:

steward → streams/steward/log.json (financial analysis, debt correspondence, system ops)
bro → streams/bro/log.json (integration, knowledge curation)
human → streams/human/log.json (daily impressions via CLI day command)

Each entry carries a source, type, timestamp, and SHA-256 fingerprint. No agent writes to another agent's stream.
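A minimal sketch of how such an entry could be built and appended. The exact schema is not given above, so the "content" field, the choice of fields that feed the fingerprint, and the helper names (fingerprint, make_entry, append_entry) are assumptions for illustration. It also assumes log.json holds a single JSON list.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def fingerprint(source: str, entry_type: str, content: str) -> str:
    """SHA-256 over the fields assumed to define a duplicate (timestamp excluded)."""
    canonical = json.dumps(
        {"source": source, "type": entry_type, "content": content},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_entry(source: str, entry_type: str, content: str) -> dict:
    """Build one stream entry with the four fields named in the text plus content."""
    return {
        "source": source,
        "type": entry_type,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "content": content,  # assumed field name, not confirmed by the source
        "sha256": fingerprint(source, entry_type, content),
    }


def append_entry(streams_root: Path, entry: dict) -> Path:
    """Append to streams/<source>/log.json, so a writer only touches its own stream."""
    log_path = streams_root / entry["source"] / "log.json"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    entries = json.loads(log_path.read_text(encoding="utf-8")) if log_path.exists() else []
    entries.append(entry)
    log_path.write_text(json.dumps(entries, indent=2, ensure_ascii=False), encoding="utf-8")
    return log_path
```

Leaving the timestamp out of the fingerprint means the same content logged twice hashes identically, which is what lets the merger drop it later. If the real fingerprint covers the timestamp too, deduplication would only catch byte-identical entries.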

2. Merger (22:45 UTC daily)

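merge_streams.py reads all streams, deduplicates entries by their SHA-256 fingerprint, and appends new ones to golden_knowledge.json with a _merged_at timestamp. The file is written to a temporary path and then renamed into place, so a crash mid-write cannot corrupt it.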
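The merge step can be sketched roughly as follows. This is not the actual merge_streams.py; it assumes each stream file and the golden file are JSON lists whose entries carry a "sha256" key, and the function names are hypothetical.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def atomic_write_json(target: Path, data) -> None:
    """Write to a temp file in the same directory, fsync, then rename over the target."""
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, indent=2, ensure_ascii=False)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace is atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def merge_streams(stream_paths: list[Path], golden_path: Path) -> int:
    """Append unseen stream entries to the golden file; return how many were added."""
    golden = json.loads(golden_path.read_text(encoding="utf-8")) if golden_path.exists() else []
    seen = {entry["sha256"] for entry in golden}
    merged_at = datetime.now(timezone.utc).isoformat()
    added = 0
    for path in stream_paths:
        if not path.exists():
            continue
        for entry in json.loads(path.read_text(encoding="utf-8")):
            if entry["sha256"] in seen:
                continue  # fingerprint already present, skip the duplicate
            seen.add(entry["sha256"])
            golden.append({**entry, "_merged_at": merged_at})
            added += 1
    atomic_write_json(golden_path, golden)
    return added
```

Creating the temp file next to the target matters: a rename across filesystems is a copy, not an atomic swap. Running the merge twice over the same streams adds nothing the second time, which makes the cron job safe to re-run.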

3. SQLite + Vector Index

Table: knowledge_base with 768-dim content_embedding (Ollama nomic-embed-text)
Sources: knowledge_sources tracks file provenance
Hybrid search: Cosine similarity on embeddings + FTS5 full-text fallback
Current state: 185+ indexed chunks across memos, docs, debt correspondence, and uploaded documents
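A minimal sketch of the cosine-similarity half of the search, using only the standard library. The storage format (little-endian float32 BLOBs), the "id" and "content" columns, and the helper names are assumptions; only the knowledge_base table and content_embedding column come from the description above. The FTS5 fallback is left out here.

```python
import math
import sqlite3
import struct


def pack(vec: list[float]) -> bytes:
    """Store a vector as little-endian float32 bytes (assumed storage format)."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack(blob: bytes) -> list[float]:
    """Inverse of pack: 4 bytes per float32 component."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(conn: sqlite3.Connection, query_vec: list[float], top_k: int = 5):
    """Score every row against the query vector and return the best top_k."""
    rows = conn.execute("SELECT id, content, content_embedding FROM knowledge_base")
    scored = [(cosine(query_vec, unpack(blob)), row_id, content) for row_id, content, blob in rows]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE knowledge_base (id INTEGER PRIMARY KEY, content TEXT, content_embedding BLOB)"
)
# Toy 3-dim vectors stand in for the 768-dim nomic-embed-text embeddings.
conn.executemany(
    "INSERT INTO knowledge_base (content, content_embedding) VALUES (?, ?)",
    [("cron schedule", pack([1.0, 0.0, 0.0])), ("backup script", pack([0.0, 1.0, 0.0]))],
)
results = search(conn, [0.9, 0.1, 0.0], top_k=1)
```

Here results holds the "cron schedule" row, since its vector points almost the same way as the query. This is a full scan over every row, which is simple and fine at a few hundred chunks; cost grows linearly with the number of rows.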

4. FastAPI Backend

POST /api/v1/knowledge/search — semantic + FTS hybrid
POST /api/v1/knowledge/upload — file ingest, chunk, embed, store
GET /api/v1/knowledge/stats — index health
Serves Astro dist/ via StaticFiles mount on same port (8081)

5. Frontend

Astro static site with:

/search — live semantic search with source filters
/upload — drag-and-drop document upload
Shared origin with API (no CORS, no token friction)

Key Decisions

SQLite over Postgres: Single-file, zero-config, survives on a 256GB SSD. Vector search is custom similarity code over Ollama embeddings stored in SQLite, not pgvector. Backed up every 6 hours.

Ollama over cloud APIs: Embeddings are generated locally with nomic-embed-text. No API costs, no network round trips, works offline. The server has 16GB RAM, enough for the embedding model plus qwen2.5-coder.
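For reference, a local embedding call can be sketched with the standard library alone. This assumes Ollama is listening on its default port 11434 and uses its /api/embeddings endpoint, which takes a "prompt" and returns an "embedding" list; newer Ollama releases also offer /api/embed with a slightly different shape. The function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # Ollama's default local port


def build_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build the POST request without sending it, so it can be inspected or tested."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, timeout: float = 30.0) -> list[float]:
    """Return the embedding vector; needs a running Ollama with the model pulled."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return json.loads(resp.read())["embedding"]
```

Calling embed("some chunk of text") against a running instance should return a 768-element list for nomic-embed-text, matching the content_embedding dimension used in the index.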

Static frontend over SPA: Astro builds to static HTML. FastAPI serves it. No separate process, no reverse proxy complexity, one port to firewall.

No privacy compartments: Debt correspondence, personal memos, and technical docs share the same vector space. The index is reality — not a sanitized version of it.

Operations

Cron schedule:

07:00 — mnemos-knowledge.py re-indexes memos, docs, streams, debt docs
22:45 — merge_streams.py merges agent logs
Every 6h — agent-collective-backup.sh to external SSD
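The contents of agent-collective-backup.sh are not shown here. If a backup copies the SQLite file while the API is mid-write, the copy can end up inconsistent, so one option is to snapshot the database through SQLite's online backup API first. A sketch, with hypothetical paths and function name:

```python
import sqlite3
from pathlib import Path


def backup_sqlite(db_path: Path, dest_path: Path) -> None:
    """Copy a live SQLite database using the online backup API."""
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(dest_path)
    try:
        # Produces a consistent snapshot even if another connection is writing.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

A shell script could call this before copying files to the external SSD, for example via python3 -c or a small wrapper script.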

Hardware: Refurbished MacBook Pro running Ubuntu 22.04 Server, iPhone USB tethering for internet, headless operation via SSH from primary M1 Max MacBook.

What I learned

Chunking matters. Naive sentence splitting breaks context. I use regex paragraph chunks with overlap for semantic continuity.
Ranking needs recency weighting. Older knowledge sources should rank lower unless explicitly queried.
Single origin reduces complexity. API + static site on same port = no CORS, no token management, one ufw rule.
SQLite scales further than people think. 185 chunks is small, but I expect the same architecture to handle 100,000+ without changes.
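The first two points can be sketched as follows. Neither function is the production code: the paragraph regex, the max_chars soft limit, the one-paragraph overlap, and the exponential half-life decay are all assumptions chosen to illustrate the idea.

```python
import re


def chunk_paragraphs(text: str, max_chars: int = 1000, overlap: int = 1) -> list[str]:
    """Group paragraphs into chunks, repeating the last `overlap` paragraphs in the next chunk.

    max_chars is a soft limit: an overlap paragraph plus a long paragraph may exceed it.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for para in paragraphs:
        candidate = current + [para]
        if current and len("\n\n".join(candidate)) > max_chars:
            chunks.append("\n\n".join(current))
            # Carry the tail of the finished chunk forward for continuity.
            current = current[-overlap:] if overlap else []
            candidate = current + [para]
        current = candidate
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def recency_weighted(similarity: float, age_days: float, half_life_days: float = 180.0) -> float:
    """Halve the similarity score for every half_life_days of source age."""
    return similarity * 0.5 ** (age_days / half_life_days)
```

With overlap=1, each chunk starts with the final paragraph of the previous one, so a sentence that depends on the paragraph before it still has that context in its embedding. The decay function would be applied to the cosine score before sorting; skipping it when a query names a specific source is one way to honor the "unless explicitly queried" rule, though that part is not shown.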

Stack

OS: Ubuntu 22.04 Server (headless)
Backend: FastAPI + uvicorn
Database: SQLite (stdlib + custom vector similarity)
Embeddings: Ollama (nomic-embed-text)
Frontend: Astro (static, deployed to Cloudflare Pages)
Language: Python 3.11, TypeScript
Infra: systemd services, cron, ufw, SSH

Built over 3 months of focused, full-time work. All systems operational since May 2026.