retavyn
Persistent memory for Claude across sessions.
Every Claude session starts cold — no memory of what you worked on yesterday, what decisions you made, what you learned. Retavyn fixes that. It stores what matters and injects it back into Claude's context automatically at the start of every session.
You talk to Claude normally. It remembers.
Features
- Automatic context injection — a
SessionStarthook dumps all memories to a local cache and injects them into context before the first message - Hybrid search — full-text (
tsvector/tsquery) and semantic similarity (pgvector) combined for recall that works on exact words or general concepts - Two transport modes — stdio for Claude Code, HTTP/SSE for claude.ai remote access
- Category tagging — store memories with categories (
ci-cd,journal,project, etc.) for filtered recall - Bulk ingestion —
ingest_pathwalks a file or directory tree and stores each file as a memory, with automatic embedding backfill - Live cache refresh — a
PostToolUsehook refreshes the local cache immediately after everyremembercall - OAuth-secured remote access — custom OAuth 2.0 + JWT flow required by the MCP spec for HTTP transport, served behind a Cloudflare Tunnel
How it works
Retavyn runs as an MCP server alongside Claude. When a session starts, a hook fires automatically — it dumps all stored memories to a local cache file and injects them into Claude's context before the first message. A second hook refreshes that cache after every remember call, so new memories are available in the next session immediately.
Search is hybrid: full-text (tsvector/tsquery) for exact matches and semantic similarity (pgvector cosine distance) for concept-level recall. Results from both passes are merged and ranked.
The server supports two transports. In stdio mode, Claude Code spawns it as a local subprocess — zero network exposure. In HTTP/SSE mode, it runs on a server behind a Cloudflare Tunnel with OAuth 2.0 + JWT auth, and claude.ai connects to it as a remote MCP server. That same HTTP endpoint is also what lets multiple machines share one memory pool — every Claude Code install can point its MCP config at the remote database, so your memories follow you across machines.
Architecture
Claude Code (local, stdio)
┌──────────────────────────────────────────────────────────┐
│ Claude Code │
│ SessionStart hook ──► inject retavyn-cache.md │
│ PostToolUse hook ──► refresh cache after remember │
└───────────────────────────┬──────────────────────────────┘
│ stdio (MCP protocol)
┌────────▼────────┐
│ retavyn │ Python + FastMCP
│ MCP server │
└────────┬────────┘
│
┌────────▼────────┐
│ PostgreSQL 18 │ Docker · port 5433
│ + pgvector │ tsvector + pgvector
└─────────────────┘
claude.ai (remote, HTTP/SSE)
claude.ai → https://mcp.retavyn.com → Cloudflare edge (TLS)
→ cloudflared tunnel → retavyn :8765 → PostgreSQL :5433
OAuth flow: claude.ai opens /authorize, user authenticates, server issues a JWT, claude.ai uses it as a Bearer token on all subsequent MCP calls.
Search internals
When you call recall("billing pipeline"), retavyn runs two passes and merges the results:
- Full-text search —
tsvector @@ to_tsquery('billing & pipeline'), ranked byts_rank - Semantic search — cosine distance between the query embedding and stored embeddings via pgvector (
embedding <=> $1 < threshold) - Results are deduplicated and returned ranked by combined score
Embeddings are generated via OpenAI text-embedding-3-small or Cohere embed-english-v3.0 (configurable). Memories without embeddings fall back to full-text only.
MCP tools
| Tool | Description |
|---|---|
remember |
Store a memory with optional category tag |
recall |
Hybrid full-text + semantic search across memories |
update_memory |
Edit an existing memory by ID |
forget |
Delete a memory by ID |
forget_path |
Delete all memories ingested from a file or directory path |
ingest_path |
Bulk-import a file or directory tree as memories |
backfill_embeddings |
Generate embeddings for memories that don't have them |
ask_infra |
Ask a DevOps question — runs a full agent loop (memory search + live gcloud) and returns a synthesized answer |
ask_infra
ask_infra is an agent embedded inside retavyn. When called, it spins up its own Claude tool-use loop with two tools — recall_memory (hybrid search over your retavyn memories) and run_gcloud (read-only live GCP queries) — iterates until it has a complete answer, then returns it as a single response.
From Claude Code's perspective it's one tool call. Under the hood it's a full agent making multiple passes across memory and live infrastructure state before synthesizing an answer.
Example questions:
"What load balancer setup do we use for Cloud Run services?"
"Which GKE clusters are running in prod right now?"
"How do we handle Cloud SQL private service connect?"
The agent is also available as a standalone CLI — see infra-agent/README.md.
Setup
| Guide | What it covers |
|---|---|
| INSTALL.md | Local setup — run retavyn on your machine with Claude Code |
| SERVER.md | Remote server — deploy to a VM for claude.ai and cross-machine access |
Environment variables
| Variable | Default | Description |
|---|---|---|
MEMORY_DB_HOST |
localhost |
PostgreSQL host |
MEMORY_DB_PORT |
5433 |
PostgreSQL port |
MEMORY_DB_NAME |
retavyn |
Database name |
MEMORY_DB_USER |
claude |
Database user |
MEMORY_DB_PASSWORD |
claude |
Database password |
MEMORY_TRANSPORT |
stdio |
stdio or streamable-http |
MEMORY_HOST |
0.0.0.0 |
Bind address (HTTP mode) |
MEMORY_PORT |
8765 |
Port (HTTP mode) |
OAUTH_SECRET |
— | JWT signing secret (HTTP mode) |
OAUTH_PASSWORD |
— | Auth password for browser flow (HTTP mode) |
OPENAI_API_KEY |
— | For OpenAI embeddings (optional) |
COHERE_API_KEY |
— | For Cohere embeddings (optional) |
Documentation
| File | Contents |
|---|---|
| INSTALL.md | Local install: setup.sh, MCP config, hooks |
| SERVER.md | Remote deploy: GCE VM, Cloudflare Tunnel, OAuth, claude.ai |
| TUTORIAL.md | First memory → first recall → journaling |
| API.md | Complete tool reference, search internals, advanced usage |
CLI commands
python main.py # start MCP server (stdio)
python main.py dump # export all memories to ~/.claude/retavyn-cache.md
python main.py remember <content> [category] # store a memory from the CLI
python main.py health # check DB connection and memory count
python main.py ingest <path> [category] # bulk ingest a file or directory
© 2026 Matt Bucknam — MIT License