
Production Multi-Agent Memory Pipeline on Ubuntu

How I built a production-grade multi-agent memory system with SQLite, vector search, and automated indexing on a headless Ubuntu server

#ai-infrastructure #multi-agent #sqlite #ollama #fastapi #ubuntu

Overview

I run a production multi-agent memory system on a headless Ubuntu 22.04 server (a refurbished MacBook Pro). Three AI agents with distinct roles write to isolated, source-tagged event streams. The streams merge nightly into a canonical knowledge base backed by SQLite and vector embeddings, which is queryable through a FastAPI backend and searchable through a static Astro frontend.

Architecture

┌─────────────┐     ┌──────────────┐     ┌────────────────┐
│   Agents    │────▶│ JSON Streams │────▶│  merge_streams │
│  (3 roles)  │     │ (1 per agent)│     │  (22:45 cron)  │
└─────────────┘     └──────────────┘     └────────────────┘
                                                   │
                          ┌────────────────────────┘
                          ▼
                  ┌────────────────┐   ┌──────────┐
                  │golden_knowledge│──▶│ SQLite   │
                  │     (JSON)     │   │ + vector │
                  └────────────────┘   └──────────┘
                                              │
                          ┌───────────────────┘
                          ▼
                   ┌──────────────┐     ┌──────────┐
                   │  mnemos_api  │────▶│  Ollama  │
                   │   (FastAPI)  │     │ (embed)  │
                   └──────────────┘     └──────────┘
                          │
                          ▼
                   ┌──────────────┐
                   │ Astro static │
                   │   (search)   │
                   └──────────────┘

Components

1. Agent Streams (Event-Driven Logging)

Each agent writes only to its own stream:

  • steward: streams/steward/log.json (financial analysis, debt correspondence, system ops)
  • bro: streams/bro/log.json (integration, knowledge curation)
  • human: streams/human/log.json (daily impressions via CLI day command)

Each entry carries a source, type, timestamp, and SHA-256 fingerprint. No agent writes to another agent's stream.
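To make the entry shape concrete, here is a minimal sketch of how one stream entry could be built. The `content` field, the `make_entry` name, and the choice to hash source, type, and content (but not the timestamp) are assumptions for illustration; the post only names the four fields.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_entry(source: str, entry_type: str, content: str) -> dict:
    """Build one stream entry (hypothetical helper, field layout assumed)."""
    timestamp = datetime.now(timezone.utc).isoformat()
    # Assumption: the fingerprint covers source, type and content but not
    # the timestamp, so logging the same event twice yields the same hash.
    payload = json.dumps(
        {"source": source, "type": entry_type, "content": content},
        sort_keys=True,
        ensure_ascii=False,
    )
    return {
        "source": source,
        "type": entry_type,
        "timestamp": timestamp,
        "content": content,
        "sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }
```

Leaving the timestamp out of the hash is what lets a later deduplication step recognise a repeated event; hashing the timestamp too would make every entry unique.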

2. Merger (22:45 UTC daily)

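merge_streams.py reads all three streams, drops entries whose SHA-256 fingerprint is already present, and appends the rest to golden_knowledge.json with a _merged_at timestamp. The file is written to a temporary path and then renamed, so a crash mid-write cannot leave a half-written knowledge base.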
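The merge step can be sketched roughly as follows. This is not the actual merge_streams.py. It assumes each stream and the golden file hold a JSON array of entries with a `sha256` key, and the helper names are hypothetical.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def load_json_list(path: Path) -> list:
    """Return the JSON array stored at path, or [] if the file is missing."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def merge_streams(stream_paths, golden_path: Path) -> int:
    """Append unseen entries to the golden file; return how many were added."""
    golden = load_json_list(golden_path)
    seen = {entry["sha256"] for entry in golden}
    merged_at = datetime.now(timezone.utc).isoformat()
    added = 0
    for path in stream_paths:
        for entry in load_json_list(Path(path)):
            if entry["sha256"] in seen:
                continue  # duplicate fingerprint, already merged
            seen.add(entry["sha256"])
            golden.append({**entry, "_merged_at": merged_at})
            added += 1
    # Write to a temp file in the same directory, then rename over the
    # target. os.replace is atomic on POSIX within one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=golden_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(golden, fh, ensure_ascii=False, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, golden_path)
    except BaseException:
        os.unlink(tmp_name)  # do not leave a stray temp file behind
        raise
    return added
```

Creating the temporary file next to the target matters: a rename across filesystems is a copy, not an atomic swap.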

3. SQLite + Vector Index

  • Table: knowledge_base with 768-dim content_embedding (Ollama nomic-embed-text)
  • Sources: knowledge_sources tracks file provenance
  • Hybrid search: Cosine similarity on embeddings + FTS5 full-text fallback
  • Current state: 185+ indexed chunks across memos, docs, debt correspondence, and uploaded documents
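As an illustration of the hybrid search described in this list, the sketch below stores embeddings as float32 BLOBs, ranks by cosine similarity in Python, and falls back to FTS5 when no vector hit clears a threshold. Only knowledge_base and content_embedding come from the post. The `knowledge_fts` table, the other column names, the `min_score` cutoff, and the exact fallback rule are assumptions, and it needs an SQLite build with FTS5 enabled.

```python
import math
import sqlite3
from array import array


def pack(vec):
    """Serialize a float vector to a float32 BLOB."""
    return array("f", vec).tobytes()


def unpack(blob):
    """Inverse of pack(): BLOB back to an array of floats."""
    out = array("f")
    out.frombytes(blob)
    return out


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def init_db(conn):
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS knowledge_base (
            id INTEGER PRIMARY KEY,
            source TEXT NOT NULL,
            content TEXT NOT NULL,
            content_embedding BLOB
        );
        CREATE VIRTUAL TABLE IF NOT EXISTS knowledge_fts USING fts5(content);
    """)


def add_chunk(conn, source, content, embedding):
    cur = conn.execute(
        "INSERT INTO knowledge_base (source, content, content_embedding) "
        "VALUES (?, ?, ?)",
        (source, content, pack(embedding)),
    )
    # Keep the full-text index in step with the main table via rowid.
    conn.execute(
        "INSERT INTO knowledge_fts (rowid, content) VALUES (?, ?)",
        (cur.lastrowid, content),
    )
    return cur.lastrowid


def search(conn, query_text, query_vec, k=5, min_score=0.3):
    """Return up to k (id, content) pairs: vector hits first, FTS5 fallback."""
    rows = conn.execute(
        "SELECT id, content, content_embedding FROM knowledge_base "
        "WHERE content_embedding IS NOT NULL"
    )
    scored = [(cosine(query_vec, unpack(blob)), rid, text)
              for rid, text, blob in rows]
    hits = sorted((s for s in scored if s[0] >= min_score), reverse=True)[:k]
    if hits:
        return [(rid, text) for _, rid, text in hits]
    if not query_text.strip():
        return []
    # Quote the query so FTS5 treats it as a literal phrase.
    phrase = '"' + query_text.replace('"', '""') + '"'
    return conn.execute(
        "SELECT rowid, content FROM knowledge_fts "
        "WHERE knowledge_fts MATCH ? ORDER BY rank LIMIT ?",
        (phrase, k),
    ).fetchall()
```

A full scan like this is linear in the number of chunks, which is fine at a few hundred rows. The vectors here would be 768-dimensional in practice; any length works as long as query and stored vectors match.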

4. FastAPI Backend

  • POST /api/v1/knowledge/search — semantic + FTS hybrid
  • POST /api/v1/knowledge/upload — file ingest, chunk, embed, store
  • GET /api/v1/knowledge/stats — index health
  • Serves Astro dist/ via StaticFiles mount on same port (8081)

5. Frontend

Astro static site with:

  • /search — live semantic search with source filters
  • /upload — drag-and-drop document upload
  • Shared origin with API (no CORS, no token friction)

Key Decisions

SQLite over Postgres: Single-file, zero-config, and small enough to live comfortably on a 256GB SSD. Vectors come from Ollama embeddings and are compared with custom similarity code instead of pgvector. Backed up every 6 hours.

Ollama over cloud APIs: Embeddings run locally with nomic-embed-text. No API costs, no network round-trips, works offline. Server has 16GB RAM, enough for the embedding model plus qwen2.5-coder.
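For reference, a local embedding call can be as small as the sketch below. It assumes Ollama is listening on its default port 11434 and uses the `/api/embeddings` endpoint, which takes a `model` and a `prompt` and returns an `embedding` array. The function names are hypothetical, and the project's own client code may differ.

```python
import json
import urllib.request

# Assumption: default local Ollama address; adjust if the server is remote.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def build_embed_request(text, model="nomic-embed-text"):
    """Build the POST request for one embedding (no network access yet)."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, model="nomic-embed-text", timeout=30.0):
    """Send the request and return the embedding as a list of floats."""
    request = build_embed_request(text, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["embedding"]
```

Splitting request construction from the network call keeps the payload easy to inspect without a running Ollama instance.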

Static frontend over SPA: Astro builds to static HTML. FastAPI serves it. No separate process, no reverse proxy complexity, one port to firewall.

No privacy compartments: Debt correspondence, personal memos, and technical docs share the same vector space. The index is reality — not a sanitized version of it.

Operations

Cron schedule:

  • 07:00 — mnemos-knowledge.py re-indexes memos, docs, streams, debt docs
  • 22:45 — merge_streams.py merges agent logs
  • Every 6h — agent-collective-backup.sh to external SSD

Hardware: Refurbished MacBook Pro running Ubuntu 22.04 Server, iPhone USB tethering for internet, headless operation via SSH from primary M1 Max MacBook.

What I learned

  • Chunking matters. Naive sentence splitting breaks context. I use regex paragraph chunks with overlap for semantic continuity.
  • Embeddings need recency weighting. Older knowledge sources should rank lower unless explicitly queried.
  • Single origin reduces complexity. API + static site on same port = no CORS, no token management, one ufw rule.
  • SQLite scales further than people think. 185 chunks is small, but I expect the same architecture to handle 100,000+ without changes.
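The first two lessons can be sketched in a few lines. The chunker splits on blank lines and repeats the last paragraph of each chunk at the start of the next. The recency weight is an exponential half-life decay blended into the similarity score. All names, the `max_chars` limit, the 90-day half-life, and the 0.2 blend factor are illustrative assumptions, not the values used in this system.

```python
import re


def chunk_paragraphs(text, max_chars=1200, overlap=1):
    """Group paragraphs into chunks of roughly max_chars, repeating the
    last `overlap` paragraphs of a chunk at the start of the next one."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        if current and len("\n\n".join(current + [para])) > max_chars:
            chunks.append("\n\n".join(current))
            # Carry the tail of the finished chunk forward as context.
            current = current[-overlap:] if overlap > 0 else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def recency_weight(age_days, half_life_days=90.0):
    """1.0 for a brand-new source, halving every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def rerank_score(similarity, age_days, recency_share=0.2):
    """Blend cosine similarity with recency; similarity still dominates."""
    return ((1 - recency_share) * similarity
            + recency_share * recency_weight(age_days))
```

The size limit is soft: a single paragraph longer than max_chars still becomes its own chunk instead of being cut mid-sentence. Setting recency_share to 0 turns the recency weighting off, which is one way to handle queries that explicitly ask for older material.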

Stack

  • OS: Ubuntu 22.04 Server (headless)
  • Backend: FastAPI + uvicorn
  • Database: SQLite (stdlib + custom vector similarity)
  • Embeddings: Ollama (nomic-embed-text)
  • Frontend: Astro (static, deployed to Cloudflare Pages)
  • Language: Python 3.11, TypeScript
  • Infra: systemd services, cron, ufw, SSH

Built over 3 months of focused, full-time work. All systems operational since May 2026.