engram
Persistent memory for AI agents — organized by time and space. Important memories get promoted, noise decays naturally, and related knowledge clusters into a browsable topic tree. Fully automatic.
Quick Start
# Install and start
curl -fsSL https://raw.githubusercontent.com/kael-bit/engram-rs/main/install.sh | bash
# Store a memory
curl -X POST http://localhost:3917/memories \
-d '{"content": "Always run tests before deploying", "tags": ["deploy"]}'
# Recall by meaning
curl -X POST http://localhost:3917/recall \
-d '{"query": "deployment checklist"}'
# Restore full context (session start)
curl http://localhost:3917/resume
Core Features
Most agent memory tools provide a vector store with search. engram adds a lifecycle — memories are not just stored, they are continuously managed.
LLM Quality Gate
New memories enter the Buffer layer. Promotion to Working or Core requires passing an LLM quality gate — the LLM evaluates each memory in context and determines whether it warrants long-term retention.
Buffer → [LLM gate: "Is this a decision, lesson, or preference?"] → Working
Working → [sustained access + LLM gate] → Core
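The gate criteria favor durable content. For illustration, the first store below is a strong promotion candidate while the second is transient noise that should stay in Buffer and decay (the payloads are invented for this example):
# Explicit decision: likely to pass the gate
curl -X POST http://localhost:3917/memories \
  -d '{"content": "DECISION: pin the embedding model version in CI", "tags": ["ci"]}'
# Transient chatter: likely to be evicted from Buffer
curl -X POST http://localhost:3917/memories \
  -d '{"content": "ran the test suite at 14:32"}'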
Semantic Dedup & Merge
When two memories express the same concept in different words, engram detects and merges them:
Memory A: "use PostgreSQL for auth"
Memory B: "auth service runs on Postgres"
→ After consolidation: single merged memory preserving both contexts
Merging is LLM-powered — based on semantic understanding, not string similarity.
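You can observe a merge end to end using only the documented endpoints; the merge itself happens in the background (see Background Maintenance below):
# Store two paraphrases of the same fact
curl -X POST http://localhost:3917/memories \
  -d '{"content": "use PostgreSQL for auth"}'
curl -X POST http://localhost:3917/memories \
  -d '{"content": "auth service runs on Postgres"}'
# After the next consolidation cycle, recall returns a single merged memory
curl -X POST http://localhost:3917/recall \
  -d '{"query": "auth database"}'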
Automatic Decay
Decay is epoch-based — it only occurs during active consolidation cycles, not by wall-clock time. If the agent is idle for a week, memories remain intact.
| Kind | Decay rate | Use case |
|---|---|---|
| episodic | Fast | Events, experiences, time-bound context |
| semantic | Slow | Knowledge, preferences, lessons (default) |
| procedural | Slowest | Workflows, instructions, how-to |
Working and Core memories are never deleted. In the Working layer, importance decreases gradually but memories remain searchable. Buffer serves as a temporary staging area where all kinds may be evicted.
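The kind is chosen at write time. A minimal sketch, assuming the store endpoint accepts a kind field alongside content and tags (check the setup guide for the exact schema):
# "kind" is assumed here: one of episodic / semantic / procedural
curl -X POST http://localhost:3917/memories \
  -d '{"content": "deploy: test → build → stop → start", "kind": "procedural"}'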
Self-Organizing Topic Tree
Vector clustering automatically groups related memories. The tree is hierarchical, with LLM-generated names:
Memory Architecture
├── Three-layer lifecycle [4]
├── Embedding pipeline [3]
└── Consolidation logic [5]
Deploy & Ops
├── CI/CD procedures [3]
└── Production incidents [2]
User Preferences [6]
The tree rebuilds automatically when memories change. At session start, the agent receives a topic index as a table of contents. Use POST /topic {"ids": ["kb3"]} to retrieve all memories within a specific cluster.
Triggers
Tag a memory with trigger:deploy, and it surfaces automatically when the agent queries /triggers/deploy before executing a deployment.
# Store a lesson
curl -X POST http://localhost:3917/memories \
-d '{"content": "LESSON: always backup DB before migration", "tags": ["trigger:deploy", "lesson"]}'
# Pre-deployment check
curl http://localhost:3917/triggers/deploy
# → returns all memories tagged trigger:deploy, ranked by access count
Architecture
Memory is organized along two dimensions — time and space:
Time (lifecycle) Space (topic tree)
┌─────────────────────────────┐ ┌──────────────────────────────┐
│ │ │ Auth Architecture │
│ Buffer → Working → Core │ │ ├── OAuth2 migration [3] │
│ ↓ ↓ ↑ │ │ └── Token handling [2] │
│ evict decay gate │ │ Deploy & Ops │
│ │ │ ├── CI/CD procedures [3] │
│ │ │ └── Rollback lessons [2] │
└─────────────────────────────┘ │ User Preferences [6] │
└──────────────────────────────┘
Time — a three-layer lifecycle inspired by the Atkinson–Shiffrin memory model:
| Layer | Role | Behavior |
|---|---|---|
| Buffer | Short-term staging | All new memories enter here. Evicted when they fall below threshold |
| Working | Active knowledge | Promoted by consolidation. Never deleted — importance decays at different rates by kind |
| Core | Long-term identity | Promoted through LLM quality gate. Never deleted |
Space — a self-organizing topic tree built from embedding vectors. Related memories cluster by semantic similarity, with LLM-generated names for each cluster:
| Mechanism | Description |
|---|---|
| Vector clustering | Groups semantically similar memories into topics via cosine similarity |
| Hierarchy | Related topics nest under shared parent nodes, forming a multi-level tree |
| LLM naming | Generates human-readable names for each cluster automatically |
| Auto-rebuild | Tree updates when memories change — no manual maintenance required |
Topic trees address a fundamental limitation of vector search: it requires the right query to find the right memory. Topic trees allow the agent to browse by subject — scan the directory, then drill into the relevant branch.
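The two modes compose: query when you know the words, browse when you only know the subject.
# Query-based: needs the right phrasing
curl -X POST http://localhost:3917/recall \
  -d '{"query": "what did we learn about rollbacks"}'
# Browse-based: open a whole branch from the topic index
curl -X POST http://localhost:3917/topic \
  -d '{"ids": ["kb1"]}'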
Session Recovery
A single call restores full context, intended for session start or post-compaction recovery:
GET /resume →
=== Core (24) ===
deploy: test → build → stop → start (procedural)
LESSON: never force-push to main
...
=== Recent ===
[02-27 14:15] switched auth to OAuth2
[02-27 11:01] published API docs
=== Topics (Core: 24, Working: 57, Buffer: 7) ===
kb1: "Deploy Procedures" [5]
kb2: "Auth Architecture" [3]
kb3: "Memory Design" [8]
...
=== Triggers ===
deploy, git-push, database-migration
Four sections, each serving a distinct purpose:
| Section | Content | Budget |
|---|---|---|
| Core | Full text of permanent rules and identity — never truncated | ~2k tokens |
| Recent | Memories changed within the last consolidation window, for short-term continuity | ~1k tokens |
| Topics | Named topic index — structured directory of all memories | Leaf list |
| Triggers | Pre-action safety tags for automatic lesson recall | Tag list |
The agent reads the topic index, identifies relevant topics, and drills in via POST /topic on demand. This avoids loading the entire memory store into context.
Search & Retrieval
Hybrid retrieval combining semantic embeddings and BM25 keyword search (with jieba for CJK tokenization). Results are ranked by relevance, memory importance, and recency.
# Semantic search with budget control
curl -X POST http://localhost:3917/recall \
-d '{"query": "how do we handle auth", "budget_tokens": 2000}'
# Pre-action safety check
curl http://localhost:3917/triggers/deploy
# Topic drill-down
curl -X POST http://localhost:3917/topic \
-d '{"ids": ["kb3"]}'
Background Maintenance
Fully autonomous and activity-driven — cycles are skipped when there has been no write activity:
Consolidation (every 30 minutes)
Each cycle executes the following steps in order:
- Decay — reduce importance of unaccessed memories
- Dedup — detect and merge near-identical memories (cosine > 0.78)
- Triage — LLM categorizes new Buffer memories for promotion
- Gate — LLM evaluates promotion candidates (batched, single call)
- Reconcile — LLM resolves ambiguous similar pairs; results are cached to avoid redundant calls
- Topic tree rebuild — re-cluster and name new or changed topics
Topic Distillation
When a topic cluster grows too large (10+ memories), engram condenses overlapping memories into fewer, richer entries — preserving all specific details while reducing redundancy. Up to 2 topics are distilled per consolidation cycle.
Multi-Agent & Namespace Isolation
A single engram instance can serve multiple agents concurrently. SQLite WAL mode, a connection pool, and an RwLock-protected vector index make concurrent reads and writes safe out of the box.
Use the X-Namespace header to give each agent (or project) its own isolated memory space:
# Project-specific memories
curl -X POST http://localhost:3917/memories \
-H "X-Namespace: my-project" \
-d '{"content": "API uses OAuth2 bearer tokens"}'
# Cross-project knowledge in the default namespace
curl -X POST http://localhost:3917/memories \
-d '{"content": "Always use UTC for timestamps"}'
Install
# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/kael-bit/engram-rs/main/install.sh | bash
# Windows (PowerShell)
irm https://raw.githubusercontent.com/kael-bit/engram-rs/main/install.ps1 | iex
Docker
docker run -d --name engram \
-p 3917:3917 \
-v engram-data:/data \
-e ENGRAM_LLM_URL=https://api.openai.com/v1 \
-e ENGRAM_LLM_KEY=sk-... \
ghcr.io/kael-bit/engram-rs:latest
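Once the container is up, any documented endpoint confirms it is serving, e.g. the session-resume call:
curl http://localhost:3917/resume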
LLM Configuration
engram works without an LLM, falling back to keyword search and a rule-based lifecycle. Adding an LLM enables the intelligent features (triage, gate, merge, topic naming, audit):
export ENGRAM_LLM_URL=https://api.openai.com/v1
export ENGRAM_LLM_KEY=sk-...
# Any OpenAI-compatible API (Ollama, LM Studio, etc.)
export ENGRAM_LLM_URL=http://localhost:11434/v1
# Anthropic native
export ENGRAM_LLM_PROVIDER=anthropic
export ENGRAM_LLM_KEY=sk-ant-...
Two-tier model configuration — strong model for judgment tasks, lightweight model for text processing:
ENGRAM_GATE_MODEL=gpt-4o # quality decisions
ENGRAM_LLM_MODEL=gpt-4o-mini # text processing
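For a fully local setup, both tiers can point at the same Ollama server. The model names below are illustrative, not defaults:
export ENGRAM_LLM_URL=http://localhost:11434/v1
ENGRAM_GATE_MODEL=qwen2.5:14b    # judgment tasks
ENGRAM_LLM_MODEL=qwen2.5:3b      # text processing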
For AI Agents
Add this to your agent's system prompt or session:
Set up engram (persistent memory) by following the guide at:
https://raw.githubusercontent.com/kael-bit/engram-rs/main/docs/SETUP.md
Integration
Compatible with Claude Code, Cursor, Windsurf, OpenClaw, and any MCP-compatible tool.
17 MCP tools — see MCP docs. Full HTTP API — see Setup guide.
# MCP (Claude Code)
npx engram-rs-mcp
# MCP (Cursor / Windsurf / generic)
# Add to your MCP config:
{"mcpServers": {"engram": {"command": "npx", "args": ["-y", "engram-rs-mcp"]}}}
Web Dashboard
Built-in web UI at http://localhost:3917 for browsing memories, viewing the topic tree, monitoring LLM usage, and inspecting consolidation history.
Specs
| Spec | Value |
|---|---|
| Binary size | ~10 MB |
| Memory usage | ~100 MB RSS in production |
| Storage | SQLite, no external database |
| Language | Rust |
| Platforms | Linux, macOS, Windows (x86_64 + aarch64) |
| License | MIT |
License
MIT