code-index
A local, SQLite-backed code index for Claude Code, exposed over MCP. It replaces blind Read / Grep / Glob exploration with targeted retrieval ("where is parseAuthToken defined", "what calls Indexer.reindex_all", "find the rate-limiting code"), answered in milliseconds against an offline index.
No API keys. No external services. The embedder runs locally on your machine.
How it works (30-second tour)
- Parse your repo with tree-sitter (Python, TypeScript/JavaScript, Go, Rust).
- Chunk code per symbol and expand identifiers (`getUserAuthToken` → `get user auth token`) so search matches both styles.
- Embed each chunk locally with `jina-embeddings-v2-base-code` (768-dim) via sentence-transformers.
- Store symbols, chunks, vectors, and call/import edges in `.claude/index.db` (SQLite + sqlite-vec + FTS5).
- Serve 14 retrieval tools + 1 admin tool over MCP (see Tools).
- Stay fresh via an optional `PostToolUse` hook that incrementally re-indexes touched files.
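The identifier-expansion step in the tour can be illustrated with a small sketch. This is not the project's chunker: the function name `expand_identifier` and the splitting rules are assumptions chosen to reproduce the `getUserAuthToken` → `get user auth token` example.

```python
import re

# Hypothetical helper; the real chunker may split identifiers differently.
# Boundary 1: lowercase/digit followed by uppercase (getUser -> get|User).
# Boundary 2: end of an acronym before a capitalized word (HTTPResponse -> HTTP|Response).
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def expand_identifier(name: str) -> str:
    """Split camelCase, PascalCase, and snake_case names into lowercase words."""
    words = []
    # Split on underscores and any other non-word characters (dots, dashes, ...).
    for part in re.split(r"[_\W]+", name):
        if part:
            words.extend(_CAMEL_BOUNDARY.sub(" ", part).split())
    return " ".join(word.lower() for word in words)


print(expand_identifier("getUserAuthToken"))     # get user auth token
print(expand_identifier("Indexer.reindex_all"))  # indexer reindex all
```

Indexing both the original name and its expanded form is what lets a query such as "auth token" match a symbol written as `getUserAuthToken`.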
Tools
Retrieval
| Tool | Purpose |
|---|---|
| `code_search` | Hybrid (vector + FTS) search for conceptual queries (e.g., "auth flow", "where do we parse JSON"). |
| `symbol_lookup` | Exact-name lookup of functions / classes / methods / types. Prefer over code_search for identifiers. |
| `file_outline` | Symbols (with signatures) in a file, in source order. Use instead of Read when you only need shape. |
| `module_outline` | Symbols across a directory subtree in one call. Use instead of looping file_outline. |
| `where_am_i` | Given path + line, returns the innermost symbol and the full enclosing chain. |
| `get_symbol_body` | Full chunk for a symbol_id from symbol_lookup / code_search / file_outline. |
| `get_symbol_bodies` | Batch version of get_symbol_body (up to 20 ids per call). |
| `callers` | Symbols that CALL the given symbol. depth (1-5) expands transitively. |
| `callees` | Symbols that the given symbol CALLS. depth (1-5) expands transitively. |
| `references` | Non-call uses (subclasses, free identifier references). Companion to callers / callees. |
| `trace` | Build a call-graph tree from an entry symbol; flat=true returns nodes/edges for cheap LLM scans. |
| `file_imports` | Files this file imports (direction=imports) or that import it (direction=imported_by). |
| `recent_changes` | Files touched in the last N git commits. |
| `propose_rename` | v1: same-file rename. Returns an edit list the agent applies via its own Edit tool; refuses on clash. |
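To make the `depth` parameter of `callers` / `callees` concrete, here is a sketch of a bounded transitive expansion over call edges. The in-memory `called_by` mapping and the function name are assumptions for illustration; the real tools read edges from the SQLite index and may order or shape results differently.

```python
from collections import deque


def transitive_callers(called_by: dict[str, set[str]], symbol: str, depth: int = 1) -> dict[str, int]:
    """Return {caller: hops} for every caller within `depth` hops of `symbol`.

    `called_by` maps a symbol to its direct callers (hypothetical structure).
    """
    if not 1 <= depth <= 5:
        raise ValueError("depth must be between 1 and 5")
    seen = {symbol: 0}
    queue = deque([symbol])
    while queue:
        current = queue.popleft()
        if seen[current] == depth:
            continue  # do not expand past the requested depth
        for caller in called_by.get(current, ()):
            if caller not in seen:  # guards against cycles and recursion
                seen[caller] = seen[current] + 1
                queue.append(caller)
    del seen[symbol]  # the queried symbol is not its own caller in the result
    return seen


edges = {"write_row": {"reindex_file"}, "reindex_file": {"reindex_all", "on_hook"}}
print(transitive_callers(edges, "write_row", depth=1))  # {'reindex_file': 1}
```

Breadth-first traversal keeps the recorded hop count minimal for each caller, and the `seen` map stops recursive call chains from looping forever.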
Admin
| Tool / op | Purpose |
|---|---|
| `admin op=init` | Build or refresh the index. Incremental by default; force=true rebuilds from scratch. |
| `admin op=setup_check` | Diagnose hook wiring + embedder + host. Round-trip-tests the hook end-to-end. |
| `admin op=install_hook` | Wire the auto-reindex PostToolUse hook into .claude/settings.json. Idempotent. |
| `admin op=stats` | Read-only: file counts by language, symbol totals, embed model fingerprint, last-index time. |
| `admin op=verify` | Integrity sweep: orphan rows, parse-failure files, dangling edges. |
embed_query_debug is a dev-only ranking diagnostic, hidden from list_tools unless CODE_INDEX_DEBUG=1 is set.
All tools return bounded JSON; large bodies use get_symbol_body rather than inlining whole files.
Requirements
- Python 3.10+ with loadable SQLite extension support (required by `sqlite-vec`).
  - Python 3.13 has this enabled by default.
  - On 3.10–3.12, install via the python.org installer or via pyenv with `PYTHON_CONFIGURE_OPTS=--enable-loadable-sqlite-extensions pyenv install 3.12.x`.
  - Homebrew Python often ships without the extension hook; use one of the two methods above instead.
- `uv` / `uvx`: recommended runner. Or `pip` if you prefer a permanent install.
- ~600 MB free disk for the embedding model on first init.
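A quick way to check the first requirement is to probe the interpreter you plan to run the server with. The sketch below uses only the standard library; the function name is made up for this example, and it only tests whether the extension hook is available, not whether `sqlite-vec` itself loads.

```python
import sqlite3


def supports_loadable_extensions() -> bool:
    """Return True if this Python's sqlite3 module can enable extension loading."""
    conn = sqlite3.connect(":memory:")
    try:
        # Builds without loadable-extension support do not expose this method.
        if not hasattr(conn, "enable_load_extension"):
            return False
        conn.enable_load_extension(True)
        conn.enable_load_extension(False)
        return True
    except (AttributeError, sqlite3.Error):
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print("loadable SQLite extensions:", supports_loadable_extensions())
```

If this prints `False`, switch to one of the Python builds listed above before installing.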
Quick start (Claude Code)
One command, no API keys:
claude mcp add-json -s user code-index "$(cat <<'JSON'
{
  "type": "stdio",
  "command": "uvx",
  "args": ["--refresh", "--from", "mcp-code-index", "code-index-mcp"]
}
JSON
)"
Then open Claude Code in any repo and ask:
"Build the code index for this repo."
Claude calls the init MCP tool, which writes .claude/index.db. From then on, ask things like "where is parseAuthToken defined?" or "what calls Indexer.reindex_all?" and Claude routes them through symbol_lookup / callers / code_search instead of grepping.
- What `--refresh` does: fetches the latest PyPI release on every Claude Code launch. Convenient during preview; drop it once you want to pin a version (saves ~1s of startup).
- Project-only install: drop `-s user` to register the server in the current project's `.claude/settings.json` instead of the global `~/.claude.json`.
- First-run model download: the first `init` pulls `jina-embeddings-v2-base-code` (~600 MB) into `~/.cache/huggingface` and caches it forever. Subsequent runs are fully offline. If your network blocks Hugging Face, pre-warm the cache from a machine that has access.
- Already installed without `--refresh`? Run `claude mcp remove code-index` first, then re-run the command above.
Alternative: permanent install (no uvx)
pip install mcp-code-index
claude mcp add -s user code-index -- code-index-mcp
Optional: keep the index live as you edit
Without a hook, the index drifts when files change outside the agent (mv, git checkout, IDE saves) until you call init again. With one, every Edit / Write / MultiEdit Claude performs triggers an incremental reindex of the touched file.
Easiest path: ask Claude. On first use in a new project, ask "set up the code-index" and Claude calls setup_check → install_hook → init. The hook command is derived from how the MCP server was launched (uvx-aware), so it uses the same Python toolchain. Hook output goes to .claude/code-index-hook.log so failures are debuggable.
Manual install: add this block to the project's .claude/settings.json under hooks.PostToolUse (the version you want depends on how you launch the server; install_hook derives the right one for you):
{
  "matcher": "Edit|Write|MultiEdit",
  "hooks": [
    {
      "type": "command",
      "command": "uvx --with 'sentence-transformers<5' --with 'numpy<2' --from mcp-code-index code-index-hook"
    }
  ]
}
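If you script the manual install, the merge should be idempotent, as `install_hook` is described to be. The following is a sketch of one way to do that, not the project's implementation: the function names are hypothetical, and it treats an entry as already installed only when an identical block is present.

```python
import json
from pathlib import Path

# The hook block from above, for the uvx launch style.
HOOK_ENTRY = {
    "matcher": "Edit|Write|MultiEdit",
    "hooks": [
        {
            "type": "command",
            "command": "uvx --with 'sentence-transformers<5' --with 'numpy<2' "
                       "--from mcp-code-index code-index-hook",
        }
    ],
}


def add_post_tool_use_hook(settings: dict, entry: dict) -> bool:
    """Append `entry` under hooks.PostToolUse unless an identical entry exists.

    Returns True when `settings` was modified.
    """
    entries = settings.setdefault("hooks", {}).setdefault("PostToolUse", [])
    if entry in entries:
        return False
    entries.append(entry)
    return True


def install(settings_path: Path, entry: dict = HOOK_ENTRY) -> bool:
    """Merge the hook into a settings file, creating the file if needed."""
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    changed = add_post_tool_use_hook(settings, entry)
    if changed:
        settings_path.parent.mkdir(parents=True, exist_ok=True)
        settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return changed
```

Calling `install(Path(".claude/settings.json"))` twice leaves a single hook entry, and other keys already in the file are preserved.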
In other MCP-compatible agents
The server speaks standard MCP over stdio, so any client that supports MCP servers works (Cursor, Continue, Cody, Zed, etc.). Configure the client to launch uvx --refresh --from mcp-code-index code-index-mcp (or code-index-mcp after pip install mcp-code-index). Once connected, call the init tool from inside the client to bootstrap the index. Drop --refresh when you want to pin to a stable version instead of always pulling latest.
From source (development)
git clone https://github.com/achreftlili/code-index
cd code-index
pip install -e .
code-index init # CLI alternative to the `init` MCP tool
code-index-mcp # starts the MCP server on stdio (for manual wiring)
Configuration
All settings are optional; the defaults work out of the box. Override them via environment variables. Inside Claude Code, set them in the env block of your code-index server entry in ~/.claude.json (then reconnect the MCP server).
Common knobs (most users only ever touch these):
| Var | Default | When to set it |
|---|---|---|
| `CODE_INDEX_EMBED_DEVICE` | `auto` | Force the torch device: cpu, mps, or cuda. Set cpu on Apple Silicon if init fails with MPS out-of-memory. |
| `CODE_INDEX_EMBED_BATCH` | `32` | Encode batch size. Lower (e.g. 8 or 4) to cut peak GPU memory while staying on mps/cuda. |
| `CODE_INDEX_DB` | `.claude/index.db` | Override the SQLite index path (e.g. to share an index across sibling worktrees). |
Advanced (rarely needed):
| Var | Default | Notes |
|---|---|---|
| `CODE_INDEX_EMBEDDER` | `jina` | Only jina (local sentence-transformers) is supported today; the variable exists for future expansion. |
| `CODE_INDEX_EMBED_MODEL` | `jinaai/jina-embeddings-v2-base-code` | HuggingFace model id. Only override if you know the model is dim-compatible (768d). |
| `CODE_INDEX_EMBED_DIM` | `768` | Must match the embedding model's output dimension. |
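As a reading aid, the variables and defaults in the two tables can be resolved like this. The `Settings` class and `load_settings` function are hypothetical names for illustration; the variable names and default values are taken from the tables, but how the server actually parses them is not shown here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    embed_device: str
    embed_batch: int
    db_path: str
    embedder: str
    embed_model: str
    embed_dim: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Resolve settings from environment variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        embed_device=env.get("CODE_INDEX_EMBED_DEVICE", "auto"),
        embed_batch=int(env.get("CODE_INDEX_EMBED_BATCH", "32")),
        db_path=env.get("CODE_INDEX_DB", ".claude/index.db"),
        embedder=env.get("CODE_INDEX_EMBEDDER", "jina"),
        embed_model=env.get("CODE_INDEX_EMBED_MODEL", "jinaai/jina-embeddings-v2-base-code"),
        embed_dim=int(env.get("CODE_INDEX_EMBED_DIM", "768")),
    )


# Example: force CPU and a smaller batch, leave everything else at its default.
print(load_settings({"CODE_INDEX_EMBED_DEVICE": "cpu", "CODE_INDEX_EMBED_BATCH": "8"}))
```

Passing a plain mapping instead of `os.environ` makes it easy to preview what a given `env` block would resolve to.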
Troubleshooting
init fails with MPS backend out of memory on Apple Silicon. A large file produced a chunk batch bigger than your GPU's free VRAM. Quickest fix: re-run on CPU (slower but bulletproof):
"env": {
"CODE_INDEX_EMBED_DEVICE": "cpu"
}
To stay on the GPU, shrink the batch instead: "CODE_INDEX_EMBED_BATCH": "8". Reconnect the MCP server (/mcp → reconnect, or restart Claude Code) so the new env takes effect. init is incremental, so already-embedded files are skipped on the retry.
init fails with a Hugging Face network error on first run. Your network is blocking model downloads. Pre-warm the cache on a machine that has access:
huggingface-cli download jinaai/jina-embeddings-v2-base-code
# then copy ~/.cache/huggingface/ to the offline machine
sqlite3.OperationalError: not authorized or sqlite-vec fails to load. Your Python build doesn't have loadable SQLite extensions. See Requirements: install via python.org or a pyenv build with PYTHON_CONFIGURE_OPTS=--enable-loadable-sqlite-extensions.
code_search / symbol_lookup returns stale paths after a refactor or branch checkout. The auto-reindex hook only fires on Claude's Edit / Write / MultiEdit. After bulk file moves outside the agent (mv, git checkout, IDE rename), re-run init (it's incremental). Or wire up the hook so the index keeps up with agent edits automatically.
Layout
src/code_index/
db.py SQLite schema, connection, sqlite-vec loading
parser.py Tree-sitter wrapper, symbol + edge extraction
imports.py Per-language import target → file path resolution
chunker.py Per-symbol chunks, identifier expansion
embedder.py Local Jina (sentence-transformers) backend
indexer.py Pipeline: walk → parse → chunk → embed → write
reindexer.py Per-root engine cache; one entry point for "reindex one file"
retriever.py Hybrid search (vector + FTS5) with RRF
watcher.py File watcher (watchdog)
admin.py setup_check / install_hook / init logic (pure, no MCP state)
mcp_server.py MCP wiring, shared helpers, schema fragments
tool_registry.py Shared `@_tool` decorator + `_TOOLS` registry
tools/ Per-domain MCP handlers (graph, paths, refactor, …)
hook.py `code-index-hook` console script — the PostToolUse entry point
cli.py init / reindex / watch / stats
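The layout lists `retriever.py` as hybrid search (vector + FTS5) with RRF, that is, reciprocal rank fusion. A minimal sketch of that fusion step follows; it is not the project's code. The function name, the constant `k = 60` (a commonly used default for RRF), and the sample ids are assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists into one, best first.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by id for determinism.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]  # hypothetical chunk ids from the vector search
fts_hits = ["b", "d", "a"]     # hypothetical chunk ids from the FTS5 search
print(reciprocal_rank_fusion([vector_hits, fts_hits]))  # ['b', 'a', 'd', 'c']
```

Because RRF uses only ranks, it can combine vector distances and FTS5 scores without normalizing them onto a common scale.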