vMem

Zero context compaction for Claude Code & LLM agents. Persistent memory powered by OS primitives (demand paging, kswapd eviction, mlock pinning). Single SQLite file, MCP-native, multi-agent shared.

Virtual memory for LLM context. For Claude Code and every AI agent.

Your AI never forgets — no more "context compacted" interruptions.

One-line install via Claude Code:

/install-plugin github:soolaugust/vMem

The problem: context compaction kills your flow

If you use Claude Code, you know this pain:

⚠️ Auto-compact: conversation is approaching context limit...

Every time this happens, your AI loses track of decisions, constraints, and hard-won context. You re-explain. It re-learns. Hours of accumulated understanding — gone in one compaction event.

And if you run multiple agents? They can't share what they've learned. Each one starts from zero.

This isn't a model limitation. It's a missing infrastructure layer.

The solution: persistent context that survives compaction

vMem gives your AI agents persistent, retrievable context managed like virtual memory: the context window is the hot working set, and durable knowledge lives outside it until demand-paged back in.

The result: OS-managed context continuity. Your AI retains every decision, constraint, and lesson across sessions, across compactions, across agents.

How it works

You speak
  → vMem retrieves relevant memories → injects into context
  → AI responds with full context
  → Session ends → decisions and insights auto-extracted → persisted
  → Compaction happens? No problem — memories survive outside the window
  → Next session starts → working set restored automatically

The whole pipeline runs inside Claude Code hooks. There is no manual memory management.
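
The retrieve-and-inject step can be pictured with a minimal sketch of a UserPromptSubmit hook script. It assumes the hook receives a JSON payload on stdin with a `prompt` field and that whatever it prints is added to the context. The `MEMORIES` list, `retrieve`, `build_injection`, and the word-overlap scoring are hypothetical stand-ins for illustration, not vMem's actual retriever.

```python
import io
import json

# Hypothetical stand-in for the persistent store (vMem keeps this in SQLite).
MEMORIES = [
    "Decision: use WAL mode for the SQLite store",
    "Constraint: never log user prompts verbatim",
    "Lesson: run the stable test subset before every PR",
]


def retrieve(prompt, memories, limit=2):
    """Rank memories by how many lowercase words they share with the prompt."""
    words = set(prompt.lower().split())
    scored = [(len(words & set(m.lower().split())), m) for m in memories]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
    return [m for _, m in ranked[:limit]]


def build_injection(payload):
    """Turn a hook payload into the text that would be injected into context."""
    hits = retrieve(payload.get("prompt", ""), MEMORIES)
    if not hits:
        return ""
    return "Relevant memories:\n" + "\n".join(f"- {m}" for m in hits)


def main(stream_in, stream_out=None):
    """Assumed hook contract: JSON payload in, injected text out."""
    text = build_injection(json.load(stream_in))
    if text:
        print(text, file=stream_out)


# Demo with an in-memory payload instead of a real hook invocation.
fake_stdin = io.StringIO(
    json.dumps({"prompt": "which mode does our sqlite store use"})
)
main(fake_stdin)
```

In a real hook the script would call `main(sys.stdin)`; the demo feeds it an in-memory payload so the sketch runs on its own.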

Why "vMem"?

vMem is virtual memory for LLM context: instead of treating the context window as the whole world, it manages a working set with OS primitives.

| What others see | What vMem does |
| --- | --- |
| "Context compacted" | Durable knowledge already lives outside the window |
| New session starts | Working set auto-restored in < 100 ms |
| Multiple agents running | All share one managed context substrate |
| Constraint decided 3 weeks ago | Pinned with mlock-style semantics |

OS-managed context. Durable working sets. No repeated explanation.

Under the hood: OS context management for AI

The secret sauce? We didn't invent new algorithms. We borrowed what operating-system kernels such as Linux have been doing for decades:

| OS concept | vMem equivalent |
| --- | --- |
| RAM (working space) | Context window: what the AI sees right now |
| Disk (persistent storage) | Knowledge base: facts that survive across sessions |
| Demand paging | On-demand retrieval: fetch relevant memories at the right moment |
| mlock | Hard / soft pinning: guarantee a constraint is never evicted |
| kswapd watermarks | Capacity-aware eviction under pressure |
| CRIU checkpoint / restore | Session snapshots: pause and resume seamlessly |
| Process scheduling | Multi-agent coordination: many agents, one knowledge base |
| kworker thread pool | Async extraction: I/O off the critical path |
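
To make the pinning and watermark rows concrete, here is a minimal sketch of a reclaim pass with kswapd-style hysteresis, expressed in terms of used capacity rather than free pages. The `Entry` fields, the `evict` function, and the least-recently-used ordering are assumptions made for illustration; vMem's real eviction policy may differ.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    key: str
    tokens: int           # cost of keeping this entry in the working set
    last_used: int        # logical clock; larger means more recent
    pinned: bool = False  # mlock-style: never eligible for eviction


def evict(entries, high_watermark, low_watermark):
    """Return (kept, evicted) after one reclaim pass.

    Nothing happens until total usage exceeds high_watermark. Once it does,
    the least recently used unpinned entries are dropped until usage falls
    to low_watermark or only pinned entries remain.
    """
    used = sum(e.tokens for e in entries)
    if used <= high_watermark:
        return list(entries), []

    evicted = []
    # Oldest first; pinned entries are never considered.
    candidates = sorted(
        (e for e in entries if not e.pinned), key=lambda e: e.last_used
    )
    for victim in candidates:
        if used <= low_watermark:
            break
        evicted.append(victim)
        used -= victim.tokens

    gone = {e.key for e in evicted}
    kept = [e for e in entries if e.key not in gone]
    return kept, evicted


working_set = [
    Entry("api-style", 40, last_used=1),
    Entry("no-force-push", 30, last_used=0, pinned=True),
    Entry("db-schema", 50, last_used=2),
    Entry("todo-list", 20, last_used=3),
]
kept, evicted = evict(working_set, high_watermark=120, low_watermark=80)
print([e.key for e in kept])     # ['no-force-push', 'todo-list']
print([e.key for e in evicted])  # ['api-style', 'db-schema']
```

The pinned entry is the oldest in the set, yet it survives. The gap between the two watermarks keeps the pass from firing again on every small addition.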

How is this different from mem0 / Letta / Zep?

vMem mem0 Letta (MemGPT) Zep
Design metaphor OS-managed context Vector store Agent runtime Temporal graph
Context continuity ✅ pinned knowledge survives
Multi-agent shared ✅ native, single store ⚠️ via API
MCP-native ✅ first-class
Single-file deploy ✅ SQLite, no service ❌ needs server ❌ needs server ❌ needs server
Demand-paging retrieval ✅ explicit implicit implicit implicit
Eviction policy ✅ kswapd + DAMON TTL only recency recency + decay
Pin / mlock semantics

TL;DR. If you're tired of context compaction wiping your AI's memory, and you want something that installs with pip, runs as a sidecar on a laptop, is shared between several Claude Code / Cursor / custom agents, and never loses a pinned constraint, vMem is built for that.

Performance at a glance

| Metric | Value |
| --- | --- |
| Retrieval latency (P50, hot path) | ~0.1 ms (540x faster than the 54 ms subprocess baseline) |
| Recall@3 vs baseline | +147% |
| Cross-session recall | 94.2% |
| Token cost per call | ~44 tokens injected, +256 tokens net ROI (avoided re-explanation) |
| Test suite | 3,500+ tests across retrieval, eviction, MCP, privacy filter |

Quick start

One-line install (recommended).

/install-plugin github:soolaugust/vMem

Manual install.

git clone https://github.com/soolaugust/vMem
cd vMem
pip install -e .
mkdir -p ~/.claude/memory-os

Detailed Claude Code hook configuration, daemon management, and troubleshooting live in docs/SETUP.md.

Architecture

Three layers:

  1. Hooks — sit at the Claude Code syscall boundary (SessionStart, UserPromptSubmit, Stop, PostToolUse) and call into the store.
  2. Store — single SQLite file (WAL mode) with FTS5 full-text index, behind a unified VFS interface (store.py / store_vfs.py / store_criu.py).
  3. Daemons & IPC — persistent retriever daemon (Unix socket), async extractor pool (kworker-style), cross-agent notify bus.
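
The store layer can be illustrated with a minimal sketch of a single SQLite file in WAL mode with an FTS5 index, using only Python's built-in `sqlite3` module. The table name `memories`, its `kind` and `body` columns, and the helper functions are hypothetical; the actual on-disk schema is described in docs/ARCHITECTURE.md. The sketch also assumes the local SQLite build includes FTS5, which is common but not guaranteed.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open (or create) a single-file store with WAL journaling and FTS5."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers do not block the writer
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS memories USING fts5(kind, body)"
    )
    return conn


def remember(conn, kind, body):
    """Persist one memory and commit so other connections can see it."""
    conn.execute("INSERT INTO memories (kind, body) VALUES (?, ?)", (kind, body))
    conn.commit()


def recall(conn, query, limit=3):
    """Full-text search; FTS5 exposes relevance through its rank column."""
    rows = conn.execute(
        "SELECT kind, body FROM memories WHERE memories MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


store_path = os.path.join(tempfile.mkdtemp(), "store.db")
writer = open_store(store_path)
remember(writer, "decision", "Use WAL mode so several agents can read concurrently")
remember(writer, "constraint", "Never commit secrets to the repository")

# A second connection stands in for another agent sharing the same file.
reader = open_store(store_path)
print(recall(reader, "agents"))
```

Because every agent opens the same file, sharing needs no server process. WAL mode lets readers proceed while one writer commits.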

For the full layered diagram, on-disk schema, and the rationale behind each subsystem, see docs/ARCHITECTURE.md. For the comprehensive OS-and-cognitive-science primitive mapping, see docs/DESIGN_PHILOSOPHY.md.

Roadmap

  • Distributed vMem — cgroup-style multi-agent quotas, network-replicated stores
  • Adaptive watermarks — eviction tuning that follows observed agent behavior
  • arXiv preprint — formal evaluation against mem0 / Letta / Zep
  • Per-chunk embedding routing — different models for code vs prose

What landed already (1,051+ tuning iterations, eight major capability rounds) is summarized in CHANGELOG.md. Pain points it has resolved along the way are in docs/PROBLEMS_SOLVED.md.

Testing

# stable test subset
python3 -m pytest tests/test_agent_team.py tests/test_chaos.py -q

Coverage: per-session DB isolation, concurrent-write safety, cross-agent IPC delivery, extractor-pool queue semantics, CRIU checkpoint validation, goals-progress idempotency.

Dependencies

No GPU. No external API. Everything runs locally.

| Dependency | Purpose |
| --- | --- |
| Python 3.12+ | Core runtime |
| SQLite (built-in) | Store + FTS5 full-text index |
| nc, flock | Daemon socket + single-instance startup |
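
For readers unfamiliar with the single-instance pattern, here is a minimal sketch of the same idea using Python's `fcntl.flock` instead of the `flock` command-line tool. It is Unix-only, and the lock file name and the `acquire_single_instance` helper are hypothetical; how vMem's daemon scripts actually invoke `flock` is covered in docs/SETUP.md.

```python
import fcntl
import os
import tempfile


def acquire_single_instance(lock_path):
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open file object on success (keep it open for the life of
    the daemon) or None if another holder already has the lock.
    """
    handle = open(lock_path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


lock_path = os.path.join(tempfile.mkdtemp(), "retriever.lock")
first = acquire_single_instance(lock_path)
second = acquire_single_instance(lock_path)  # simulates a second daemon start
print(first is not None, second is None)
```

The lock is released when the file is closed or the process exits, so a crashed daemon does not leave a stale lock behind.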

Paper

📄 Beyond Eviction: Full OS Context-Management Semantics for LLM Agent Persistence (PDF, 8 pages)

Technical paper describing the complete OS→agent-context mapping: demand paging, kswapd, DAMON, mlock, CRIU, kworker, and shared memory.

Citation

@software{su2026compactmem,
  title = {vMem: Full OS Memory Semantics for LLM Agent Persistence},
  author = {Su, Zhidao},
  year = {2026},
  url = {https://github.com/soolaugust/vMem}
}

Contributing

Each subsystem hides behind a clean VFS interface, so components are testable in isolation. Issues, design proposals, and pull requests are welcome — see the Discussions tab for design questions, and please run the test subset above before submitting a PR.

Context compaction is the #1 productivity killer in Claude Code. vMem makes it a non-event.
