achreftlili

code-index

A local, SQLite-backed code index for Claude Code, exposed over MCP. It replaces blind Read / Grep / Glob exploration with targeted retrieval: queries such as "where is parseAuthToken defined", "what calls Indexer.reindex_all", or "find the rate-limiting code" are answered in milliseconds against an offline index.

No API keys. No external services. The embedder runs locally on your machine.

How it works (30-second tour)

  1. Parse your repo with tree-sitter (Python, TypeScript/JavaScript, Go, Rust).
  2. Chunk code per symbol and expand identifiers (getUserAuthToken → get user auth token) so search matches both styles.
  3. Embed each chunk locally with jina-embeddings-v2-base-code (768-dim) via sentence-transformers.
  4. Store symbols, chunks, vectors, and call/import edges in .claude/index.db (SQLite + sqlite-vec + FTS5).
  5. Serve 14 retrieval tools + 1 admin tool over MCP (see Tools).
  6. Stay fresh via an optional PostToolUse hook that incrementally re-indexes touched files.
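The identifier expansion in step 2 can be pictured with a small sketch. This is not the project's chunker; the function name `expand_identifier` and the exact splitting rules are assumptions made for illustration, using only the standard library.

```python
import re

# Split points inside one word: a lowercase letter or digit followed by an
# uppercase letter (userAuth), and an acronym followed by a capitalized
# word (HTTPResponse -> HTTP, Response).
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def expand_identifier(name: str) -> str:
    """Return a lowercased, space-separated word form of an identifier.

    Hypothetical helper: handles camelCase, PascalCase, snake_case and
    dotted names such as Indexer.reindex_all.
    """
    words = []
    # First split on underscores and any non-word characters (dots, dashes).
    for part in re.split(r"[_\W]+", name):
        if part:
            # Then split each remaining piece at camel-case boundaries.
            words.extend(_CAMEL_BOUNDARY.split(part))
    return " ".join(word.lower() for word in words if word)


if __name__ == "__main__":
    print(expand_identifier("getUserAuthToken"))  # get user auth token
```

Indexing both the raw identifier and its expanded form is one way a query written in plain words ("user auth token") can still match a symbol written in camelCase.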

Tools

Retrieval

Tool               Purpose
code_search        Hybrid (vector + FTS) search for conceptual queries (e.g., "auth flow", "where do we parse JSON").
symbol_lookup      Exact-name lookup of functions / classes / methods / types. Prefer over code_search for identifiers.
file_outline       Symbols (with signatures) in a file, in source order. Use instead of Read when you only need shape.
module_outline     Symbols across a directory subtree in one call. Use instead of looping file_outline.
where_am_i         Given path + line, returns the innermost symbol and the full enclosing chain.
get_symbol_body    Full chunk for a symbol_id from symbol_lookup / code_search / file_outline.
get_symbol_bodies  Batch version of get_symbol_body (up to 20 ids per call).
callers            Symbols that CALL the given symbol. depth (1-5) expands transitively.
callees            Symbols that the given symbol CALLS. depth (1-5) expands transitively.
references         Non-call uses (subclasses, free identifier references). Companion to callers / callees.
trace              Build a call-graph tree from an entry symbol; flat=true returns nodes/edges for cheap LLM scans.
file_imports       Files this file imports (direction=imports) or that import it (direction=imported_by).
recent_changes     Files touched in the last N git commits.
propose_rename     v1: same-file rename. Returns an edit list the agent applies via its own Edit tool; refuses on clash.
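For readers unfamiliar with how a hybrid search such as code_search can merge a vector ranking with a full-text ranking, here is a minimal sketch of reciprocal rank fusion (RRF). The function name, the constant k=60 (a commonly used default for RRF), and the sample ids are assumptions for illustration and are not taken from the project's retriever.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher fused score ranks first; ties are
    broken by id so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]  # hypothetical ids, best match first
    fts_hits = ["b", "d", "a"]
    print(reciprocal_rank_fusion([vector_hits, fts_hits]))
    # ['b', 'a', 'd', 'c']
```

RRF only needs rank positions, so the vector distances and the full-text scores never have to be put on a common scale.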

Admin

Tool / op              Purpose
admin op=init          Build or refresh the index. Incremental by default; force=true rebuilds from scratch.
admin op=setup_check   Diagnose hook wiring + embedder + host. Round-trip-tests the hook end-to-end.
admin op=install_hook  Wire the auto-reindex PostToolUse hook into .claude/settings.json. Idempotent.
admin op=stats         Read-only: file counts by language, symbol totals, embed model fingerprint, last-index time.
admin op=verify        Integrity sweep: orphan rows, parse-failure files, dangling edges.

embed_query_debug is a dev-only ranking diagnostic, hidden from list_tools unless CODE_INDEX_DEBUG=1 is set.

All tools return bounded JSON; large bodies use get_symbol_body rather than inlining whole files.
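The depth parameter on callers / callees in the Retrieval table describes a bounded transitive expansion over call edges. A rough sketch of that idea as a breadth-first walk follows; the function name, the (caller, callee) tuple format, and the returned distance map are all assumptions for illustration, not the server's actual data model or output shape.

```python
from collections import deque


def transitive_callers(call_edges, target, depth=1):
    """Return {caller: distance} for callers of `target` within `depth` hops.

    call_edges is an iterable of (caller, callee) pairs. depth is limited
    to 1-5, mirroring the range documented for the callers tool.
    """
    if not 1 <= depth <= 5:
        raise ValueError("depth must be between 1 and 5")

    # Reverse adjacency: callee -> set of direct callers.
    callers_of = {}
    for caller, callee in call_edges:
        callers_of.setdefault(callee, set()).add(caller)

    found = {}
    queue = deque([(target, 0)])
    while queue:
        symbol, distance = queue.popleft()
        if distance == depth:
            continue  # do not expand beyond the requested depth
        for caller in sorted(callers_of.get(symbol, ())):
            # Skip already-seen symbols so cycles terminate.
            if caller != target and caller not in found:
                found[caller] = distance + 1
                queue.append((caller, distance + 1))
    return found
```

With edges main → run, run → reindex_all and cli → reindex_all, depth=1 yields run and cli, while depth=2 also reaches main. Swapping the tuple order gives the callees direction.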

Requirements

  • Python 3.10+ with loadable SQLite extension support (required by sqlite-vec).
    • Python 3.13 has this enabled by default.
    • On 3.10–3.12, install via the python.org installer or via pyenv with PYTHON_CONFIGURE_OPTS=--enable-loadable-sqlite-extensions pyenv install 3.12.x.
    • Homebrew Python often ships without the extension hook; use one of the two methods above instead.
  • uv / uvx: the recommended runner. Use pip instead if you prefer a permanent install.
  • ~600 MB free disk for the embedding model on first init.
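To check whether a given interpreter meets the loadable-extension requirement before installing, a short probe like the one below may help. It relies on the standard library's `sqlite3.Connection.enable_load_extension`, which is missing on builds compiled without extension support; the function name `sqlite_extensions_loadable` is just an illustrative choice.

```python
import sqlite3


def sqlite_extensions_loadable() -> bool:
    """Report whether this Python build can load SQLite extensions."""
    conn = sqlite3.connect(":memory:")
    try:
        # On builds without extension support the method does not exist,
        # which surfaces as AttributeError.
        conn.enable_load_extension(True)
        conn.enable_load_extension(False)
        return True
    except (AttributeError, sqlite3.Error):
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    if sqlite_extensions_loadable():
        print("OK: loadable SQLite extensions are available")
    else:
        print("Missing: use a python.org or pyenv build as described above")
```

Run it with the same interpreter that will launch the server, since different Python installs on one machine can differ on this point.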

Quick start (Claude Code)

One command, no API keys:

claude mcp add-json -s user code-index "$(cat <<'JSON'
{
  "type": "stdio",
  "command": "uvx",
  "args": ["--refresh", "--from", "mcp-code-index", "code-index-mcp"]
}
JSON
)"

Then open Claude Code in any repo and ask:

"Build the code index for this repo."

Claude calls the init MCP tool, which writes .claude/index.db. From then on, ask things like "where is parseAuthToken defined?" or "what calls Indexer.reindex_all?" and Claude routes them through symbol_lookup / callers / code_search instead of grepping.

What --refresh does: it fetches the latest PyPI release on every Claude Code launch. Convenient during preview; drop it once you want to pin a version (saves ~1s of startup).

Project-only install: drop -s user to register the server in the current project's .claude/settings.json instead of the global ~/.claude.json.

First-run model download: the first init pulls jina-embeddings-v2-base-code (~600 MB) into ~/.cache/huggingface and keeps it cached there. Subsequent runs are fully offline. If your network blocks Hugging Face, pre-warm the cache from a machine that has access.

Already installed without --refresh? Run claude mcp remove code-index first, then re-run the command above.

Alternative: permanent install (no uvx)

pip install mcp-code-index
claude mcp add -s user code-index -- code-index-mcp

Optional: keep the index live as you edit

Without a hook, the index drifts whenever files change until you call init again. With one, every Edit / Write / MultiEdit Claude performs triggers an incremental reindex of the touched file. Changes made outside the agent (mv, git checkout, IDE saves) are not seen by the hook and still need an init run.

Easiest path: ask Claude. On first use in a new project, ask "set up the code-index" and Claude calls setup_check → install_hook → init. The hook command is derived from how the MCP server was launched (uvx-aware), so it uses the same Python toolchain. Hook output goes to .claude/code-index-hook.log so failures are debuggable.

Manual install: add this block to the project's .claude/settings.json under hooks.PostToolUse (the version you want depends on how you launch the server; install_hook derives the right one for you):

{
  "matcher": "Edit|Write|MultiEdit",
  "hooks": [
    {
      "type": "command",
      "command": "uvx --with 'sentence-transformers<5' --with 'numpy<2' --from mcp-code-index code-index-hook"
    }
  ]
}
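If you would rather script the manual edit than paste the block by hand, the sketch below merges the entry above into a settings file without duplicating it on repeated runs. It is not the install_hook implementation; the function name `add_post_tool_use_hook` and the exact-match duplicate check are assumptions, and the command string is copied from the block above.

```python
import json
from pathlib import Path

# The hook entry shown above, for a uvx-launched server.
HOOK_ENTRY = {
    "matcher": "Edit|Write|MultiEdit",
    "hooks": [
        {
            "type": "command",
            "command": (
                "uvx --with 'sentence-transformers<5' --with 'numpy<2' "
                "--from mcp-code-index code-index-hook"
            ),
        }
    ],
}


def add_post_tool_use_hook(settings_path, entry=HOOK_ENTRY) -> bool:
    """Append `entry` under hooks.PostToolUse; return False if already present."""
    path = Path(settings_path)
    text = path.read_text() if path.exists() else ""
    settings = json.loads(text) if text.strip() else {}

    entries = settings.setdefault("hooks", {}).setdefault("PostToolUse", [])
    if entry in entries:
        return False  # identical entry already wired: nothing to do

    entries.append(entry)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return True


if __name__ == "__main__":
    changed = add_post_tool_use_hook(".claude/settings.json")
    print("hook added" if changed else "hook already present")
```

Other keys already present in the settings file are preserved, since the script only touches the hooks.PostToolUse list.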

In other MCP-compatible agents

The server speaks standard MCP over stdio, so any client that supports MCP servers works (Cursor, Continue, Cody, Zed, etc.). Configure the client to launch uvx --refresh --from mcp-code-index code-index-mcp (or code-index-mcp after pip install mcp-code-index). Once connected, call the init tool from inside the client to bootstrap the index. Drop --refresh when you want to pin to a stable version instead of always pulling latest.

From source (development)

git clone https://github.com/achreftlili/code-index
cd code-index
pip install -e .
code-index init        # CLI alternative to the `init` MCP tool
code-index-mcp         # starts the MCP server on stdio (for manual wiring)

Configuration

All settings are optional; the defaults work out of the box. Override them via environment variables. Inside Claude Code, set them in the env block of your code-index server entry in ~/.claude.json (then reconnect the MCP server).

Common knobs (most users only ever touch these):

Var                      Default           When to set it
CODE_INDEX_EMBED_DEVICE  auto              Force the torch device: cpu, mps, or cuda. Set cpu on Apple Silicon if init fails with MPS out-of-memory.
CODE_INDEX_EMBED_BATCH   32                Encode batch size. Lower (e.g. 8 or 4) to cut peak GPU memory while staying on mps/cuda.
CODE_INDEX_DB            .claude/index.db  Override the SQLite index path (e.g. to share an index across sibling worktrees).

Advanced (rarely needed):

Var                     Default                              Notes
CODE_INDEX_EMBEDDER     jina                                 Only jina (local sentence-transformers) is supported today; the variable exists for future expansion.
CODE_INDEX_EMBED_MODEL  jinaai/jina-embeddings-v2-base-code  Hugging Face model id. Only override if you know the model is dimension-compatible (768-dim).
CODE_INDEX_EMBED_DIM    768                                  Must match the embedding model's output dimension.
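As a way to see how these variables and defaults fit together, here is a minimal sketch of reading them into one settings object. The `IndexConfig` dataclass, the `load_config` function, and the validation rules are hypothetical; only the variable names and default values come from the tables above.

```python
import os
from dataclasses import dataclass

_ALLOWED_DEVICES = {"auto", "cpu", "mps", "cuda"}


@dataclass(frozen=True)
class IndexConfig:
    db_path: str
    embed_device: str
    embed_batch: int
    embedder: str
    embed_model: str
    embed_dim: int


def load_config(env=None) -> IndexConfig:
    """Build a config from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    config = IndexConfig(
        db_path=env.get("CODE_INDEX_DB", ".claude/index.db"),
        embed_device=env.get("CODE_INDEX_EMBED_DEVICE", "auto"),
        embed_batch=int(env.get("CODE_INDEX_EMBED_BATCH", "32")),
        embedder=env.get("CODE_INDEX_EMBEDDER", "jina"),
        embed_model=env.get(
            "CODE_INDEX_EMBED_MODEL", "jinaai/jina-embeddings-v2-base-code"
        ),
        embed_dim=int(env.get("CODE_INDEX_EMBED_DIM", "768")),
    )
    # Illustrative sanity checks; the real server may validate differently.
    if config.embed_device not in _ALLOWED_DEVICES:
        raise ValueError(f"unsupported device: {config.embed_device!r}")
    if config.embed_batch < 1:
        raise ValueError("CODE_INDEX_EMBED_BATCH must be a positive integer")
    return config
```

Calling `load_config({"CODE_INDEX_EMBED_DEVICE": "cpu", "CODE_INDEX_EMBED_BATCH": "8"})` would correspond to the env block suggested for memory-constrained machines.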

Troubleshooting

init fails with MPS backend out of memory on Apple Silicon. A large file produced a chunk batch bigger than the memory available to your GPU. Quickest fix: re-run on CPU (slower but reliable):

"env": {
  "CODE_INDEX_EMBED_DEVICE": "cpu"
}

To stay on the GPU, shrink the batch instead: "CODE_INDEX_EMBED_BATCH": "8". Reconnect the MCP server (/mcp → reconnect, or restart Claude Code) so the new env takes effect. init is incremental, so already-embedded files are skipped on the retry.

init fails with a Hugging Face network error on first run. Your network is blocking model downloads. Pre-warm the cache on a machine that has access:

huggingface-cli download jinaai/jina-embeddings-v2-base-code
# then copy ~/.cache/huggingface/ to the offline machine

sqlite3.OperationalError: not authorized or sqlite-vec fails to load. Your Python build doesn't have loadable SQLite extensions. See Requirements: install via python.org or a pyenv build with PYTHON_CONFIGURE_OPTS=--enable-loadable-sqlite-extensions.

code_search / symbol_lookup returns stale paths after a refactor or branch checkout. The auto-reindex hook only fires on Claude's Edit / Write / MultiEdit. After bulk file moves outside the agent (mv, git checkout, IDE rename), re-run init (it's incremental). Or wire up the hook so the index keeps up with agent edits automatically.

Layout

src/code_index/
  db.py           SQLite schema, connection, sqlite-vec loading
  parser.py       Tree-sitter wrapper, symbol + edge extraction
  imports.py      Per-language import target → file path resolution
  chunker.py      Per-symbol chunks, identifier expansion
  embedder.py     Local Jina (sentence-transformers) backend
  indexer.py      Pipeline: walk → parse → chunk → embed → write
  reindexer.py    Per-root engine cache; one entry point for "reindex one file"
  retriever.py    Hybrid search (vector + FTS5) with RRF
  watcher.py      File watcher (watchdog)
  admin.py        setup_check / install_hook / init logic (pure, no MCP state)
  mcp_server.py   MCP wiring, shared helpers, schema fragments
  tool_registry.py  Shared `@_tool` decorator + `_TOOLS` registry
  tools/          Per-domain MCP handlers (graph, paths, refactor, …)
  hook.py         `code-index-hook` console script — the PostToolUse entry point
  cli.py          init / reindex / watch / stats
