Synapse — My Local-First Personal AI
Synapse is a personal AI assistant I run on my home PC. It’s been quietly compounding for a while now, and it deserves its own post because it’s grown into something I rely on every day.
The pitch is straightforward: I wanted an AI that doesn’t depend on a cloud subscription, doesn’t send my data to anyone, owns its own memory, and learns about me over time. Local-first, second-brain shaped. Not a chatbot — a memory service that I can talk to, that remembers things across conversations, and that ties into the rest of my digital life.
It runs on Ollama for the language model layer (currently Gemma 4 for chat and Gemma 2 2B for background fact extraction, more on that in a second), ChromaDB for vector search, SQLite for structured facts and conversation logs, and a FastAPI backend serving everything on port 8321. There’s a Tauri desktop app for chat on my computer, a React Native app on my phone, and an MCP server that exposes everything to Claude Code, so I can ask Claude to consult my Synapse memory in the middle of a development session. Google Drive, Gmail, and Calendar all flow into the memory store automatically, and location data from my phone gets geofenced and turned into context.
The whole thing is designed around one core idea: the LLM is a router and a formatter, not a generator. Synapse uses it to figure out what you mean, decide which tool to call (memory search, Gmail lookup, calendar query, file read), and pretty up the result. The heavy lifting, like long-form generation, gets exported as a markdown brief that I take to Claude or another, bigger model. This split keeps Synapse fast on modest hardware and lets each tool do what it’s best at.
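The router idea can be sketched in a few lines. This is a minimal illustration, not Synapse's actual code: the tool names, handler signatures, and JSON shape are my assumptions. The model is only asked to emit a tiny JSON tool choice; plain Python does the dispatch.

```python
import json

# Hypothetical tool registry; the real Synapse tool set and handlers differ.
TOOLS = {
    "memory_search": lambda args: f"memory hits for {args['query']!r}",
    "gmail_lookup": lambda args: f"emails matching {args['query']!r}",
    "calendar_query": lambda args: f"events on {args['date']!r}",
}

def dispatch(llm_output: str) -> str:
    """Parse the model's JSON tool choice and run the matching handler.

    The chat model only has to produce something like
    {"tool": "memory_search", "args": {"query": "wifi password"}} --
    it routes and formats; the tools do the actual work.
    """
    try:
        call = json.loads(llm_output)
        handler = TOOLS[call["tool"]]
        return handler(call.get("args", {}))
    except (json.JSONDecodeError, KeyError) as exc:
        return f"could not route request: {exc}"
```

The payoff of this shape is that a small local model only needs to get a short, structured decision right, which is a much easier job than free-form generation.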
The most recent change is something I’m pretty proud of. Until last night Synapse used a single model for everything: chat queries and background memory extraction ran on the same weights. That caused two problems. The heavy chat model was being wasted on the trivial “parse this conversation into facts” job, AND the extraction happened synchronously, delaying every chat response by a few seconds. Both were dumb in retrospect.
So I split it. There are now two model roles: chat (Gemma 4, 4B effective parameters, 128K context) and extractor (Gemma 2 2B, fast and small), each configured independently via env vars. The extraction step is now fire-and-forget: the chat response returns to the user immediately, and fact extraction runs as a background asyncio task that quietly writes to memory when it finishes. The user never waits on it. There’s even a bounded graceful shutdown that gives in-flight extraction tasks a five-second window to finish before the FastAPI lifespan tears down. 374 tests pass across the suite.
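The fire-and-forget pattern plus the bounded shutdown fits in one small module. A sketch under stated assumptions: the env var names are illustrative (not Synapse's actual keys), and the extractor call is stubbed with a sleep standing in for the small-model inference.

```python
import asyncio
import os

# Illustrative env-var names for the two model roles; Synapse's real keys may differ.
CHAT_MODEL = os.getenv("SYNAPSE_CHAT_MODEL", "gemma-chat")
EXTRACTOR_MODEL = os.getenv("SYNAPSE_EXTRACTOR_MODEL", "gemma2:2b")

# Strong references to in-flight tasks, so the event loop can't garbage-collect them.
_background: set[asyncio.Task] = set()
memory_store: list[str] = []

async def extract_facts(conversation: str) -> None:
    """Stand-in for the small extractor model; writes to memory when it finishes."""
    await asyncio.sleep(0.05)  # simulated extractor latency
    memory_store.append(f"facts from: {conversation[:30]}")

def fire_and_forget(coro) -> asyncio.Task:
    """Schedule a background task and drop the reference once it completes."""
    task = asyncio.create_task(coro)
    _background.add(task)
    task.add_done_callback(_background.discard)
    return task

async def chat_endpoint(message: str) -> str:
    reply = f"[{CHAT_MODEL}] reply to: {message}"  # chat model answers first
    fire_and_forget(extract_facts(message))        # extraction never blocks the reply
    return reply

async def shutdown(grace: float = 5.0) -> None:
    """Bounded graceful shutdown: give in-flight tasks up to `grace` seconds."""
    if _background:
        await asyncio.wait(_background, timeout=grace)
```

In the real app, `shutdown()` would be called from the teardown half of the FastAPI lifespan handler. The done-callback detail matters: `asyncio.create_task` alone holds only a weak reference in the loop, so keeping tasks in a set is what guarantees the extraction isn't silently dropped mid-run.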
Progress so far
- ✅ Backend with 50+ FastAPI endpoints, all functional
- ✅ Memory system: vector + structured + BM25 + RRF fusion
- ✅ Google integrations: Gmail, Calendar, Drive, Docs, Sheets
- ✅ Location engine with geofencing and reminders
- ✅ Workflow engine with trust-level promotion
- ✅ MCP server exposing 20+ tools to Claude Code
- ✅ React Native mobile app with location tracking and push notifications
- ✅ Tauri desktop app with dark navy + cyan chat UI
- ✅ Multi-role model routing — chat on Gemma 4, extractor on Gemma 2 2B
- ✅ Async background extraction so chat responses no longer wait on memory writes
- ✅ Bounded graceful shutdown for in-flight background tasks
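The “vector + structured + BM25 + RRF fusion” line deserves a note, since Reciprocal Rank Fusion is the glue that lets vector and keyword search combine without comparable scores. A minimal sketch of the standard RRF formula (function name and shapes are mine, not Synapse's):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score each doc by the sum of 1/(k + rank).

    `rankings` holds each retriever's result list, best first. Only rank
    positions matter, never raw scores, so BM25 and cosine similarity can be
    merged directly. k=60 is the conventional constant; it damps how much any
    single retriever's top hit dominates the fused order.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)
```

A document that places mid-pack in both retrievers typically beats one that tops a single list, which is exactly the behavior you want when vector recall and keyword precision disagree.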
What’s next
A few directions are open. I’d like to upgrade the memory extractor itself from a single LLM call into a multi-step agent that has its own tools — able to look up existing facts before storing new ones (deduplication), reason about which category a fact belongs in, and ask the user for clarification when something is ambiguous. That’s a meaningful intelligence boost but it adds complexity, so it’s queued as a separate project rather than something to bolt onto the current refactor.
Other things in the queue: PDF and DOCX ingestion for the memory extractor (drop a document into a watched folder and Synapse files it away), a permissioned file sandboxing pattern for any future tools that want to read or write to disk, and a small Tauri-based desktop tool I’m calling Agent Watcher that gives me a live view of background AI agents running on my machine. That last one is its own future post.
The Synapse blueprint is held in a private skill file that future sessions auto-load, so the project state survives across conversations. That’s been the single most useful pattern in keeping a long-running personal project actually moving forward instead of restarting from scratch every few weeks.