BrainLayer
Persistent memory and knowledge graph for AI agents — 9 MCP tools, real-time indexing hooks, and a native macOS daemon for always-on recall across every conversation.
224,000+ chunks indexed · 1,002 Python + 28 Swift tests · Real-time indexing hooks · 9 MCP tools · BrainBar daemon (209KB) · Zero cloud dependencies
Your AI agent forgets everything between sessions. Every architecture decision, every debugging session, every preference you've expressed — gone. You repeat yourself constantly.
BrainLayer fixes this. It's a local-first memory layer that gives any MCP-compatible AI agent the ability to remember, think, and recall across conversations. Includes BrainBar — a 209KB native macOS daemon that provides always-on memory access.
- "What approach did I use for auth last month?" → `brain_search`
- "Show me everything about this file's history" → `brain_recall`
- "What was I working on yesterday?" → `brain_recall`
- "Remember this decision for later" → `brain_store`
- "Ingest this meeting transcript" → `brain_digest`
- "What do we know about this person?" → `brain_get_person`
- "Look up the Domica project entity" → `brain_entity`
Quick Start
```bash
pip install brainlayer

brainlayer init    # Interactive setup wizard
brainlayer index   # Index your Claude Code conversations
```
Then add to your editor's MCP config:
Claude Code (`~/.claude.json`):

```json
{
  "mcpServers": {
    "brainlayer": {
      "command": "brainlayer-mcp"
    }
  }
}
```
Other editors (Cursor, Zed, VS Code)

Cursor (MCP settings):

```json
{
  "mcpServers": {
    "brainlayer": {
      "command": "brainlayer-mcp"
    }
  }
}
```

Zed (`settings.json`):

```json
{
  "context_servers": {
    "brainlayer": {
      "command": { "path": "brainlayer-mcp" }
    }
  }
}
```

VS Code (`.vscode/mcp.json`):

```json
{
  "servers": {
    "brainlayer": {
      "command": "brainlayer-mcp"
    }
  }
}
```
That's it. Your agent now has persistent memory across every conversation.
Architecture
```mermaid
graph LR
    A["Claude Code / Cursor / Zed"] -->|MCP| B["BrainLayer MCP Server<br/>9 tools"]
    B --> C["Hybrid Search<br/>semantic + keyword (RRF)"]
    C --> D["SQLite + sqlite-vec<br/>single .db file"]
    B --> KG["Knowledge Graph<br/>entities + relations"]
    KG --> D
    E["Claude Code JSONL<br/>conversations"] --> F["Pipeline"]
    F -->|extract → classify → chunk → embed| D
    G["Local LLM<br/>Ollama / MLX"] -->|enrich| D
    H["Real-time Hooks"] -->|live per-message| D
    I["BrainBar<br/>macOS daemon"] -->|Unix socket MCP| B
```
Everything runs locally. No cloud accounts, no API keys, no Docker, no database servers.
| Component | Implementation |
|---|---|
| Storage | SQLite + sqlite-vec (single .db file, WAL mode) |
| Embeddings | bge-large-en-v1.5 via sentence-transformers (1024 dims, runs on CPU/MPS) |
| Search | Hybrid: vector similarity + FTS5 keyword, merged with Reciprocal Rank Fusion |
| Enrichment | Local LLM via Ollama or MLX — 10-field metadata per chunk |
| MCP Server | stdio-based, MCP SDK v1.26+, compatible with any MCP client |
| Clustering | Leiden + UMAP for brain graph visualization (optional) |
| BrainBar | Native macOS daemon (209KB Swift binary) — always-on MCP over Unix socket |
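The search layer merges vector and keyword rankings with Reciprocal Rank Fusion: each document scores `1 / (k + rank)` per ranked list, summed across lists. A minimal sketch of RRF (the function name and `k = 60` constant are illustrative, not BrainLayer's internals):

```python
from collections import defaultdict


def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)


# A chunk ranked well by BOTH semantic and keyword search wins overall.
semantic = ["c3", "c1", "c7"]   # vector-similarity order
keyword = ["c1", "c9", "c3"]    # FTS5 keyword order
merged = rrf_merge([semantic, keyword])  # → ["c1", "c3", "c9", "c7"]
```

Because RRF only uses ranks, it needs no score normalization between the cosine-similarity and BM25-style scales.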
MCP Tools (9)
Core (4)
| Tool | Description |
|---|---|
| `brain_search` | Semantic search — unified search across query, file_path, chunk_id, filters. |
| `brain_store` | Persist memories — ideas, decisions, learnings, mistakes. Auto-type/auto-importance. |
| `brain_recall` | Proactive retrieval — current context, sessions, session summaries. |
| `brain_tags` | Browse and filter by tag — discover what's in memory without a search query. |
Knowledge Graph (5)
| Tool | Description |
|---|---|
| `brain_digest` | Ingest raw content — entity extraction, relations, sentiment, action items. |
| `brain_entity` | Look up entities in the knowledge graph — type, relations, evidence. |
| `brain_expand` | Expand a chunk_id with N surrounding chunks for full context. |
| `brain_update` | Update, archive, or merge existing memories. |
| `brain_get_person` | Person lookup — entity details, interactions, preferences (~200-500ms). |
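`brain_expand`'s windowing can be pictured as slicing the ordered chunk list around a hit. A hypothetical sketch (the `expand` helper below is illustrative, not BrainLayer's implementation):

```python
def expand(chunks: list[str], idx: int, n: int = 2) -> list[str]:
    """Return the chunk at idx plus up to n neighbors on each side,
    clamped at the start and end of the conversation."""
    lo = max(0, idx - n)
    return chunks[lo : idx + n + 1]


chunks = ["c0", "c1", "c2", "c3", "c4", "c5"]
window = expand(chunks, idx=3, n=2)  # → ["c1", "c2", "c3", "c4", "c5"]
```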
Backward Compatibility
All 14 legacy `brainlayer_*` tool names still work as aliases.
Enrichment
BrainLayer enriches each chunk with 10 structured metadata fields using a local LLM:
| Field | Example |
|---|---|
| `summary` | "Debugging Telegram bot message drops under load" |
| `tags` | "telegram, debugging, performance" |
| `importance` | 8 (architectural decision) vs 2 (directory listing) |
| `intent` | debugging, designing, implementing, configuring, deciding, reviewing |
| `primary_symbols` | "TelegramBot, handleMessage, grammy" |
| `resolved_query` | "How does the Telegram bot handle rate limiting?" |
| `epistemic_level` | hypothesis, substantiated, validated |
| `version_scope` | "grammy 1.32, Node 22" |
| `debt_impact` | introduction, resolution, none |
| `external_deps` | "grammy, Supabase, Railway" |
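Put together, one enriched chunk carries a record like the following. This is a hypothetical example assembled from the table above; the exact storage shape (JSON vs. columns) is an assumption:

```python
import json

# Illustrative enrichment record — values mirror the field table above.
enrichment = {
    "summary": "Debugging Telegram bot message drops under load",
    "tags": ["telegram", "debugging", "performance"],
    "importance": 8,
    "intent": "debugging",
    "primary_symbols": ["TelegramBot", "handleMessage", "grammy"],
    "resolved_query": "How does the Telegram bot handle rate limiting?",
    "epistemic_level": "substantiated",
    "version_scope": "grammy 1.32, Node 22",
    "debt_impact": "none",
    "external_deps": ["grammy", "Supabase", "Railway"],
}
payload = json.dumps(enrichment)  # ten fields, serializable as one JSON object
```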
Three enrichment backends, auto-detected in order (MLX → Ollama → Groq; override via `BRAINLAYER_ENRICH_BACKEND`):
| Backend | Best for | Speed |
|---|---|---|
| Groq (cloud) | When local LLMs are unavailable | ~1-2s/chunk |
| MLX (Apple Silicon) | M1/M2/M3 Macs (preferred) | 21-87% faster than Ollama |
| Ollama | Any platform | ~1s/chunk (short), ~13s (long) |
```bash
brainlayer enrich   # Default backend (auto-detects)
BRAINLAYER_ENRICH_BACKEND=groq brainlayer enrich --batch-size=100
```
Why BrainLayer?
| | BrainLayer | Mem0 | Zep/Graphiti | Letta | LangChain Memory |
|---|---|---|---|---|---|
| MCP native | 9 tools | 1 server | 1 server | No | No |
| Think / Recall | Yes | No | No | No | No |
| Local-first | SQLite | Cloud-first | Cloud-only | Docker+PG | Framework |
| Zero infra | `pip install` | API key | API key | Docker | Multiple deps |
| Multi-source | 7 sources | API only | API only | API only | API only |
| Enrichment | 10 fields | Basic | Temporal | Self-write | None |
| Session analysis | Yes | No | No | No | No |
| Real-time | Per-message hooks | No | No | No | No |
| Open source | Apache 2.0 | Apache 2.0 | Source-available | Apache 2.0 | MIT |
BrainLayer is the only memory layer that:
- Thinks before answering — categorizes past knowledge by intent (decisions, bugs, patterns) instead of raw search results
- Runs on a single file — no database servers, no Docker, no cloud accounts
- Works with every MCP client — 9 tools, instant integration, zero SDK
- Ships a knowledge graph — entities, relations, and person lookup across all indexed data
CLI Reference
```bash
brainlayer init              # Interactive setup wizard
brainlayer index             # Index new conversations
brainlayer search "query"    # Semantic + keyword search
brainlayer enrich            # Run LLM enrichment on new chunks
brainlayer enrich-sessions   # Session-level analysis (decisions, learnings)
brainlayer stats             # Database statistics
brainlayer brain-export      # Generate brain graph JSON
brainlayer export-obsidian   # Export to Obsidian vault
brainlayer dashboard         # Interactive TUI dashboard
```
Configuration
All configuration is via environment variables:
| Variable | Default | Description |
|---|---|---|
| `BRAINLAYER_DB` | `~/.local/share/brainlayer/brainlayer.db` | Database file path |
| `BRAINLAYER_ENRICH_BACKEND` | auto-detect (MLX → Ollama → Groq) | Enrichment LLM backend (`mlx`, `ollama`, or `groq`) |
| `BRAINLAYER_ENRICH_MODEL` | `glm-4.7-flash` | Ollama model name |
| `BRAINLAYER_MLX_MODEL` | `mlx-community/Qwen2.5-Coder-14B-Instruct-4bit` | MLX model identifier |
| `BRAINLAYER_OLLAMA_URL` | `http://127.0.0.1:11434/api/generate` | Ollama API endpoint |
| `BRAINLAYER_MLX_URL` | `http://127.0.0.1:8080/v1/chat/completions` | MLX server endpoint |
| `BRAINLAYER_STALL_TIMEOUT` | `300` | Seconds before killing a stuck enrichment chunk |
| `BRAINLAYER_HEARTBEAT_INTERVAL` | `25` | Log progress every N chunks during enrichment |
| `BRAINLAYER_SANITIZE_EXTRA_NAMES` | (empty) | Comma-separated names to redact from indexed content |
| `BRAINLAYER_SANITIZE_USE_SPACY` | `true` | Use spaCy NER for PII detection |
| `GROQ_API_KEY` | (unset) | Groq API key for cloud enrichment backend |
| `BRAINLAYER_GROQ_URL` | `https://api.groq.com/openai/v1/chat/completions` | Groq API endpoint |
| `BRAINLAYER_GROQ_MODEL` | `llama-3.3-70b-versatile` | Groq model for enrichment |
Optional Extras
```bash
pip install "brainlayer[brain]"     # Brain graph visualization (Leiden + UMAP) + FAISS
pip install "brainlayer[cloud]"     # Cloud backfill (Gemini Batch API)
pip install "brainlayer[youtube]"   # YouTube transcript indexing
pip install "brainlayer[ast]"       # AST-aware code chunking (tree-sitter)
pip install "brainlayer[kg]"        # GLiNER entity extraction (209M params, EN+HE)
pip install "brainlayer[style]"     # ChromaDB vector store (alternative backend)
pip install "brainlayer[dev]"       # Development: pytest, ruff
```
Data Sources
BrainLayer can index conversations from multiple sources:
| Source | Format | Indexer |
|---|---|---|
| Claude Code | JSONL (`~/.claude/projects/`) | `brainlayer index` |
| Claude Desktop | JSON export | `brainlayer index --source desktop` |
| WhatsApp | Exported `.txt` chat | `brainlayer index --source whatsapp` |
| YouTube | Transcripts via yt-dlp | `brainlayer index --source youtube` |
| Codex CLI | JSONL (`~/.codex/sessions`) | `brainlayer ingest-codex` |
| Markdown | Any `.md` files | `brainlayer index --source markdown` |
| Manual | Via MCP tool | `brain_store` |
| Real-time | Claude Code hooks | Live per-message indexing (zero-lag) |
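JSONL sources store one JSON object per line, which keeps incremental indexing cheap. A minimal reader sketch (the `role`/`content` keys here are assumed for illustration, not a documented transcript schema):

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory


def read_jsonl(path: Path) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line)
            for line in path.read_text().splitlines()
            if line.strip()]


with TemporaryDirectory() as tmp:
    p = Path(tmp) / "session.jsonl"
    p.write_text('{"role": "user", "content": "fix the bug"}\n'
                 '{"role": "assistant", "content": "done"}\n')
    messages = read_jsonl(p)  # two message dicts
```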
Testing
```bash
pip install -e ".[dev]"

pytest tests/                        # Full suite (1,002 Python tests)
pytest tests/ -m "not integration"   # Unit tests only (fast)
ruff check src/                      # Linting

# BrainBar (Swift): 28 tests via Xcode
```
Roadmap
See `docs/roadmap.md` for planned features including boot context loading, compact search, pinned memories, and MCP Registry listing.
Contributing
Contributions welcome! See CONTRIBUTING.md for dev setup, testing, and PR guidelines.
License
Apache 2.0 — see LICENSE.
Origin
BrainLayer was originally developed as "Zikaron" (Hebrew: memory) inside a personal AI agent ecosystem. It was extracted into a standalone project because every developer deserves persistent AI memory — not just the ones building their own agent systems.