Personal MCP memory server — persistent memory for Claude across sessions, backed by PostgreSQL with hybrid FTS+vector search

retavyn

Persistent memory for Claude across sessions.

Every Claude session starts cold — no memory of what you worked on yesterday, what decisions you made, what you learned. Retavyn fixes that. It stores what matters and injects it back into Claude's context automatically at the start of every session.

You talk to Claude normally. It remembers.

Features

  • Automatic context injection — a SessionStart hook dumps all memories to a local cache and injects them into context before the first message
  • Hybrid search — full-text (tsvector/tsquery) and semantic similarity (pgvector) combined for recall that works on exact words or general concepts
  • Two transport modes — stdio for Claude Code, HTTP/SSE for claude.ai remote access
  • Category tagging — store memories with categories (ci-cd, journal, project, etc.) for filtered recall
  • Bulk ingestion — ingest_path walks a file or directory tree and stores each file as a memory, with automatic embedding backfill
  • Live cache refresh — a PostToolUse hook refreshes the local cache immediately after every remember call
  • OAuth-secured remote access — custom OAuth 2.0 + JWT flow required by the MCP spec for HTTP transport, served behind a Cloudflare Tunnel

How it works

Retavyn runs as an MCP server alongside Claude. When a session starts, a hook fires automatically — it dumps all stored memories to a local cache file and injects them into Claude's context before the first message. A second hook refreshes that cache after every remember call, so new memories are available in the next session immediately.
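The cache dump step described above can be sketched roughly as follows. This is a minimal illustration, not retavyn's actual code: the `Memory` record shape, the `render_cache` and `write_cache` helpers, and the Markdown layout are all assumptions made for the example.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Memory:
    # Hypothetical record shape; the real schema is not shown in this README.
    id: int
    category: str
    content: str


def render_cache(memories: list[Memory]) -> str:
    """Render memories as Markdown, grouped by category."""
    by_category: dict[str, list[Memory]] = {}
    for m in memories:
        by_category.setdefault(m.category or "uncategorized", []).append(m)

    lines = ["# retavyn memories", ""]
    for category in sorted(by_category):
        lines.append(f"## {category}")
        for m in by_category[category]:
            lines.append(f"- [{m.id}] {m.content}")
        lines.append("")
    return "\n".join(lines)


def write_cache(memories: list[Memory], path: Path) -> None:
    """Write to a temp file, then rename, so a reader never sees a partial file."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(render_cache(memories), encoding="utf-8")
    tmp.replace(path)
```

Writing to a temporary file and renaming it is one way to keep the SessionStart hook from reading a half-written cache while the PostToolUse hook is refreshing it.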

Search is hybrid: full-text (tsvector/tsquery) for exact matches and semantic similarity (pgvector cosine distance) for concept-level recall. Results from both passes are merged and ranked.
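For readers unfamiliar with the metric, cosine distance is one minus cosine similarity, which is what pgvector's `<=>` operator computes. The pure-Python sketch below only illustrates the formula; in retavyn the computation happens inside PostgreSQL, and the function name here is made up for the example.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity: 0 for same direction, 2 for opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)
```

Because the metric ignores vector length, two texts with similar meaning can score as close even when they share no exact words, which is why it complements the full-text pass.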

The server supports two transports. In stdio mode, Claude Code spawns it as a local subprocess with zero network exposure. In HTTP/SSE mode, it runs on a server behind a Cloudflare Tunnel with OAuth 2.0 + JWT auth, and claude.ai connects to it as a remote MCP server. That same HTTP endpoint is also what lets multiple machines share one memory pool: every Claude Code install can point its MCP config at the remote server, so your memories follow you across machines.

Architecture

Claude Code (local, stdio)

┌──────────────────────────────────────────────────────────┐
│                      Claude Code                          │
│   SessionStart hook ──► inject retavyn-cache.md          │
│   PostToolUse hook  ──► refresh cache after remember      │
└───────────────────────────┬──────────────────────────────┘
                            │ stdio  (MCP protocol)
                   ┌────────▼────────┐
                   │    retavyn      │  Python + FastMCP
                   │   MCP server   │
                   └────────┬────────┘
                            │
                   ┌────────▼────────┐
                   │  PostgreSQL 18  │  Docker · port 5433
                   │  + pgvector     │  tsvector + pgvector
                   └─────────────────┘

claude.ai (remote, HTTP/SSE)

claude.ai  →  https://mcp.retavyn.com  →  Cloudflare edge (TLS)
           →  cloudflared tunnel  →  retavyn :8765  →  PostgreSQL :5433

OAuth flow: claude.ai opens /authorize, user authenticates, server issues a JWT, claude.ai uses it as a Bearer token on all subsequent MCP calls.
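The token step of this flow can be illustrated with a standard-library sketch. The README does not say which signing algorithm or claims retavyn uses, so HS256, the `sub`/`iat`/`exp` claims, and the function names `issue_token` and `verify_token` are assumptions for the example; a real deployment would more likely use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: str, signing_input: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(secret: str, subject: str, ttl_seconds: int = 3600,
                now: Optional[float] = None) -> str:
    """Build an HS256-signed JWT (assumed algorithm) with an expiry claim."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued, "exp": issued + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(secret, signing_input))}"


def verify_token(secret: str, token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if the signature and expiry check out, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        expected = _sign(secret, f"{header_b64}.{payload_b64}")
        # Constant-time comparison avoids leaking signature bytes via timing.
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(payload_b64))
    except (ValueError, UnicodeError):
        return None
    if header.get("alg") != "HS256":
        return None
    current = time.time() if now is None else now
    if payload.get("exp", 0) <= current:
        return None
    return payload
```

In this sketch the server would call `verify_token` on the value following `Bearer ` in the `Authorization` header of each MCP request and reject the call when it returns `None`.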

Search internals

When you call recall("billing pipeline"), retavyn runs two passes and merges the results:

  1. Full-text search — tsvector @@ to_tsquery('billing & pipeline'), ranked by ts_rank
  2. Semantic search — cosine distance between the query embedding and stored embeddings via pgvector (embedding <=> $1 < threshold)
  3. Results are deduplicated and returned ranked by combined score

Embeddings are generated via OpenAI text-embedding-3-small or Cohere embed-english-v3.0 (configurable). Memories without embeddings fall back to full-text only.
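The merge step might look something like the sketch below. The README does not give the scoring formula, so the weighted sum of a normalized ts_rank and a cosine similarity, the `fts_weight` parameter, and the function name `merge_results` are all assumptions chosen for illustration.

```python
def merge_results(
    fts_hits: dict[int, float],       # memory id -> ts_rank (higher is better)
    semantic_hits: dict[int, float],  # memory id -> cosine distance (lower is better)
    fts_weight: float = 0.5,
) -> list[tuple[int, float]]:
    """Combine both passes into one deduplicated list, best score first."""
    max_rank = max(fts_hits.values(), default=0.0)
    scores: dict[int, float] = {}

    for mem_id, rank in fts_hits.items():
        # Normalize ts_rank to [0, 1] so it is comparable with similarity.
        normalized = rank / max_rank if max_rank > 0 else 0.0
        scores[mem_id] = fts_weight * normalized

    for mem_id, distance in semantic_hits.items():
        similarity = 1.0 - distance
        # Keying on the memory id deduplicates hits found by both passes.
        scores[mem_id] = scores.get(mem_id, 0.0) + (1.0 - fts_weight) * similarity

    # Sort by score descending, then by id for a stable order on ties.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A memory without an embedding simply never appears in `semantic_hits`, so it is scored on its full-text rank alone, which matches the fallback behavior described above.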

MCP tools

Tool                 Description
remember             Store a memory with optional category tag
recall               Hybrid full-text + semantic search across memories
update_memory        Edit an existing memory by ID
forget               Delete a memory by ID
forget_path          Delete all memories ingested from a file or directory path
ingest_path          Bulk-import a file or directory tree as memories
backfill_embeddings  Generate embeddings for memories that don't have them
ask_infra            Ask a DevOps question; runs a full agent loop (memory search + live gcloud) and returns a synthesized answer
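To make the ingest_path behavior concrete, here is a rough sketch of walking a file or directory tree and yielding one text per file. The helper name `iter_ingestable`, the size cap, and the choice to skip empty and non-UTF-8 files are assumptions for the example, not documented retavyn behavior.

```python
from pathlib import Path
from typing import Iterator


def iter_ingestable(root: Path, max_bytes: int = 1_000_000) -> Iterator[tuple[Path, str]]:
    """Yield (path, text) for a single file, or for each text file under a directory."""
    if root.is_file():
        candidates = [root]
    else:
        # Sorted so repeated ingests visit files in a stable order.
        candidates = sorted(p for p in root.rglob("*") if p.is_file())

    for path in candidates:
        if path.stat().st_size > max_bytes:
            continue  # assumed cap: skip very large files
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary or non-UTF-8 files
        if text.strip():
            yield path, text
```

Each yielded pair would then be stored as one memory, with the path recorded so that forget_path can later delete everything ingested from it.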

ask_infra

ask_infra is an agent embedded inside retavyn. When called, it spins up its own Claude tool-use loop with two tools — recall_memory (hybrid search over your retavyn memories) and run_gcloud (read-only live GCP queries) — iterates until it has a complete answer, then returns it as a single response.

From Claude Code's perspective it's one tool call. Under the hood it's a full agent making multiple passes across memory and live infrastructure state before synthesizing an answer.
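The loop shape can be sketched without any model or cloud dependency. In the example below, `decide` stands in for the Claude call that chooses the next action, and `ToolCall`, `FinalAnswer`, `run_agent`, and the step budget are hypothetical names and choices for illustration, not the actual ask_infra implementation.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class ToolCall:
    name: str
    argument: str


@dataclass
class FinalAnswer:
    text: str


Transcript = list[tuple[str, str]]
Decision = Union[ToolCall, FinalAnswer]


def run_agent(
    question: str,
    decide: Callable[[Transcript], Decision],  # stand-in for the model call
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> str:
    """Iterate tool calls until the model produces a final answer."""
    transcript: Transcript = [("user", question)]
    for _ in range(max_steps):
        decision = decide(transcript)
        if isinstance(decision, FinalAnswer):
            return decision.text
        tool = tools.get(decision.name)
        result = tool(decision.argument) if tool else f"unknown tool: {decision.name}"
        # Feed the tool output back so the next decision can build on it.
        transcript.append((f"tool:{decision.name}", result))
    return "No answer within the step budget."
```

With `tools` mapping `recall_memory` and `run_gcloud` to real implementations, the caller still sees a single function call that returns one string, which is the property the section describes.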

Example questions:

"What load balancer setup do we use for Cloud Run services?"

"Which GKE clusters are running in prod right now?"

"How do we handle Cloud SQL private service connect?"

The agent is also available as a standalone CLI — see infra-agent/README.md.

Setup

Guide       What it covers
INSTALL.md  Local setup: run retavyn on your machine with Claude Code
SERVER.md   Remote server: deploy to a VM for claude.ai and cross-machine access

Environment variables

Variable            Default    Description
MEMORY_DB_HOST      localhost  PostgreSQL host
MEMORY_DB_PORT      5433       PostgreSQL port
MEMORY_DB_NAME      retavyn    Database name
MEMORY_DB_USER      claude     Database user
MEMORY_DB_PASSWORD  claude     Database password
MEMORY_TRANSPORT    stdio      stdio or streamable-http
MEMORY_HOST         0.0.0.0    Bind address (HTTP mode)
MEMORY_PORT         8765       Port (HTTP mode)
OAUTH_SECRET        (none)     JWT signing secret (HTTP mode)
OAUTH_PASSWORD      (none)     Auth password for browser flow (HTTP mode)
OPENAI_API_KEY      (none)     For OpenAI embeddings (optional)
COHERE_API_KEY      (none)     For Cohere embeddings (optional)
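As an illustration of how these variables and defaults fit together, the sketch below loads them into a typed settings object. The variable names and defaults come from the table above; the `Settings` class, the `load_settings` function, and the rule that HTTP mode refuses to start without OAUTH_SECRET are assumptions for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_host: str
    db_port: int
    db_name: str
    db_user: str
    db_password: str
    transport: str
    host: str
    port: int
    oauth_secret: Optional[str]
    oauth_password: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, applying the documented defaults."""
    transport = env.get("MEMORY_TRANSPORT", "stdio")
    if transport not in ("stdio", "streamable-http"):
        raise ValueError(f"unsupported MEMORY_TRANSPORT: {transport!r}")

    settings = Settings(
        db_host=env.get("MEMORY_DB_HOST", "localhost"),
        db_port=int(env.get("MEMORY_DB_PORT", "5433")),
        db_name=env.get("MEMORY_DB_NAME", "retavyn"),
        db_user=env.get("MEMORY_DB_USER", "claude"),
        db_password=env.get("MEMORY_DB_PASSWORD", "claude"),
        transport=transport,
        host=env.get("MEMORY_HOST", "0.0.0.0"),
        port=int(env.get("MEMORY_PORT", "8765")),
        oauth_secret=env.get("OAUTH_SECRET") or None,
        oauth_password=env.get("OAUTH_PASSWORD") or None,
    )
    # Assumption: an HTTP server without a signing secret should fail fast.
    if transport == "streamable-http" and settings.oauth_secret is None:
        raise ValueError("OAUTH_SECRET is required when MEMORY_TRANSPORT=streamable-http")
    return settings
```

Passing the environment as a parameter keeps the function easy to test: `load_settings({})` returns the stdio defaults without touching the real process environment.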

Documentation

File         Contents
INSTALL.md   Local install: setup.sh, MCP config, hooks
SERVER.md    Remote deploy: GCE VM, Cloudflare Tunnel, OAuth, claude.ai
TUTORIAL.md  First memory → first recall → journaling
API.md       Complete tool reference, search internals, advanced usage

CLI commands

python main.py          # start MCP server (stdio)
python main.py dump     # export all memories to ~/.claude/retavyn-cache.md
python main.py remember <content> [category]  # store a memory from the CLI
python main.py health   # check DB connection and memory count
python main.py ingest <path> [category]  # bulk ingest a file or directory

© 2026 Matt Bucknam — MIT License
