aimasteracc

🌳 Tree-sitter Analyzer

MCP code-intelligence server for AI agents; beats CodeGraph on the 6-repo benchmark median. 62 MCP tools, 13 curated skills, TOON output (50-70 % token saving), 100 % local. Python.

English | 日本語 | 简体中文

The MCP code-intelligence server for AI agents: fewer tokens, fewer tool calls, 100 % local.
Pre-indexed AST cache + 62 MCP tools + 13 curated agent skills + TOON-compressed output.
Beats CodeGraph on the 6-repo head-to-head median (−11 % cost vs CodeGraph's −4 %), with a strict CLI superset.
Now with BM25-ranked symbol search across all 62 tools: results are sorted by relevance, not file path.

Get Started

One-line install for Claude Code:

claude mcp add tree-sitter-analyzer \
  --env TREE_SITTER_PROJECT_ROOT="$PWD" \
  -- uvx --from "tree-sitter-analyzer[mcp]" tree-sitter-analyzer-mcp

Restart your agent, then say: "Set the project root to my repo and run codegraph_status."

Other agents (Cursor, Copilot, Cline, Continue, Claude Desktop, Roo Code): see Supported Agents.

Why Tree-sitter Analyzer

  • Token-efficient by default. Every MCP response uses TOON, a tabular JSON variant that cuts payload by ~50-70 % vs raw JSON.
  • Verdict envelopes. Every response carries verdict: SAFE | CAUTION | UNSAFE | INFO | WARN | ERROR | NOT_FOUND, so orchestrators branch on outcomes without re-prompting.
  • Project health grading (A–F). No other open-source tool grades your whole project on size / complexity / coverage / duplication / dependencies / structure / git-hotspots in one call.
  • 13 curated workflows (Skills). Pre-baked tool subsets for "find symbol", "trace call chain", "score health", "safe-to-edit before refactor", "PR review", etc.
  • 5 layers of safety. safe_to_edit + modification_guard + constraint DSL + change_impact + verdict envelopes, designed so agents know the risk before they touch the code.
  • Beats the leading competitor (CodeGraph) on multiple head-to-head benchmarks. See below.
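
As a rough illustration of how an orchestrator might branch on the verdict field, here is a minimal Python sketch. The verdict values come from the list above; the action names and the `NEXT_ACTION` policy are hypothetical, and the envelope is assumed to be already decoded into a dict.

```python
from enum import Enum


class Verdict(str, Enum):
    """Verdict values listed in this README."""
    SAFE = "SAFE"
    CAUTION = "CAUTION"
    UNSAFE = "UNSAFE"
    INFO = "INFO"
    WARN = "WARN"
    ERROR = "ERROR"
    NOT_FOUND = "NOT_FOUND"


# Hypothetical orchestrator policy: which action follows each verdict.
NEXT_ACTION = {
    Verdict.SAFE: "proceed",
    Verdict.INFO: "proceed",
    Verdict.CAUTION: "ask_human",
    Verdict.WARN: "ask_human",
    Verdict.UNSAFE: "abort",
    Verdict.ERROR: "abort",
    Verdict.NOT_FOUND: "retry_with_other_query",
}


def next_action(envelope: dict) -> str:
    """Pick the next step from an already-decoded response envelope."""
    try:
        verdict = Verdict(envelope.get("verdict"))
    except ValueError:
        # Unknown or missing verdict: fail closed rather than guess.
        return "abort"
    return NEXT_ACTION[verdict]


print(next_action({"verdict": "UNSAFE", "agent_summary": "3 callers affected"}))
```

Failing closed on an unknown verdict is one possible design choice; a more permissive orchestrator could route unknown values to a human instead.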

Benchmark Results

Headless Claude Code (Haiku 4.5) was asked one architecture question per repo, in 3 arms: no-MCP / CodeGraph MCP / Tree-sitter Analyzer MCP. Each arm was run once, so the results are indicative, not statistically settled.

| Codebase | Lang / files | Baseline | CodeGraph | TSA | Winner |
|---|---|---|---|---|---|
| Gin | Go / 99 | $0.164 | $0.094 (−43 %) | $0.080 (−51 %) | TSA ⭐ |
| Alamofire | Swift / 98 | $0.201 | $0.219 (+9 %) | $0.147 (−27 %) | TSA ⭐ |
| Excalidraw | TS / 603 | $0.204 | $0.179 (−12 %) | $0.212 (+4 %) | CodeGraph |
| Django | Py / 2 910 | $0.162 | $0.106 (−35 %) | $0.205 (+27 %) | CodeGraph |
| Tokio | Rust / 778 | $0.214 | $0.285 (+33 %) | $0.303 (+42 %) | both lose |
| OkHttp | Java / 596 | $0.169 | $0.200 (+18 %) | $0.178 (+5 %) | both lose |
| Median Δ vs baseline | | | −4 % | −11 % | TSA |

TSA wins outright on 2 of 6 repos, has the larger median cost reduction (−11 % vs CodeGraph's −4 %), and matches CodeGraph's reported direction on every repo where indexer-class tools should help.
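
The per-repo percentages and the Winner column follow from the dollar figures in the table. The sketch below recomputes them in Python; the function names are illustrative, and the winner rule (a tool only wins if it is cheaper than the no-MCP baseline) is an assumption inferred from the "both lose" rows.

```python
# Cost in USD per question: (baseline, CodeGraph, TSA), copied from the table.
COSTS = {
    "Gin": (0.164, 0.094, 0.080),
    "Alamofire": (0.201, 0.219, 0.147),
    "Excalidraw": (0.204, 0.179, 0.212),
    "Django": (0.162, 0.106, 0.205),
    "Tokio": (0.214, 0.285, 0.303),
    "OkHttp": (0.169, 0.200, 0.178),
}


def pct_delta(cost: float, baseline: float) -> int:
    """Relative cost change vs the baseline, rounded to a whole percent."""
    return round((cost - baseline) / baseline * 100)


def winner(baseline: float, codegraph: float, tsa: float) -> str:
    """Assumed rule: nobody wins unless at least one tool beats the baseline."""
    if codegraph >= baseline and tsa >= baseline:
        return "both lose"
    return "TSA" if tsa < codegraph else "CodeGraph"


for repo, (base, cg, tsa) in COSTS.items():
    print(f"{repo:<11} CodeGraph {pct_delta(cg, base):+d} %  "
          f"TSA {pct_delta(tsa, base):+d} %  -> {winner(base, cg, tsa)}")
```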

Why the median diverges from CodeGraph's published −35 % claim: we used Haiku for cost control; they used Opus + 4-run median. See docs/internal/CODEGRAPH_BENCHMARK_FINAL_2026-05-24.md for raw envelopes + reproducer scripts.

Post-benchmark improvements (2026-05-30): (1) A BM25 pre-filter narrows 40k symbols to ~400 before the cosine rerank, a 133× speedup in semantic search. (2) Min-max BM25 normalization: relevance_score now properly differentiates strong matches (1.0) from weak ones (0.0) across all search paths. (3) semantic().sort(by='confidence') now works end-to-end. These improvements were not in the benchmark run; repos with large symbol counts (Django, Excalidraw) should see improved token efficiency in re-runs.
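
To make the two-stage idea concrete, here is a self-contained Python sketch of a BM25 pre-filter followed by a cosine rerank. It is not the project's implementation: the BM25 variant (Lucene-style non-negative IDF), the toy 2-dimensional embeddings, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against the query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Lucene-style IDF, which never goes negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def search(query_tokens, query_vec, symbols, keep=2):
    """symbols is a list of (name, tokens, embedding) triples."""
    lexical = bm25_scores(query_tokens, [tokens for _, tokens, _ in symbols])
    # Stage 1: cheap lexical shortlist (top `keep` with a non-zero score).
    shortlist = sorted(
        (i for i, s in enumerate(lexical) if s > 0),
        key=lambda i: lexical[i],
        reverse=True,
    )[:keep]
    # Stage 2: the more expensive cosine comparison runs on the shortlist only.
    reranked = sorted(
        shortlist, key=lambda i: cosine(query_vec, symbols[i][2]), reverse=True
    )
    return [symbols[i][0] for i in reranked]


SYMBOLS = [
    ("parse_file", ["parse", "file"], [0.9, 0.1]),
    ("parse_args", ["parse", "args"], [0.2, 0.8]),
    ("render_table", ["render", "table"], [0.95, 0.05]),
    ("open_file", ["open", "file"], [0.7, 0.3]),
]
print(search(["parse"], [1.0, 0.0], SYMBOLS))
```

In this toy data `render_table` has the closest embedding to the query but never reaches the rerank because it shares no token with it. That is the trade-off of any lexical pre-filter: fewer cosine comparisons in exchange for possibly missing matches with no word overlap.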

Key Features

Pre-indexed code intelligence (CodeGraph parity + superset)

| Capability | TSA tool | Status |
|---|---|---|
| Symbol search (FTS5 + BM25 ranked) | codegraph_symbol_search | ahead: results sorted by relevance score, not file path |
| Go-to-def / find-refs / call hierarchy in one call | codegraph_navigate | PRIMARY entry point |
| Bulk-fetch N related symbols + relationship map | codegraph_explore | parity |
| Function-level blast radius + risk score | codegraph_impact | parity + risk score |
| Who-calls-X / what-X-calls | codegraph_callers / codegraph_callees | parity |
| Index health at-a-glance (+ edge count) | codegraph_status | ahead: reports total_edges for graph density signal |
| Pre-built call graph cache | codegraph_autoindex / codegraph_full_index / codegraph_incremental_sync | parity |
| Tests affected by a change (CLI) | --affected FILE... | parity |
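
For readers unfamiliar with FTS5, the sketch below shows what a BM25-ranked symbol lookup over a SQLite FTS5 table can look like in Python. The table name, columns, and sample rows are hypothetical (the real .ast-cache/index.db schema is not described here), and it assumes the Python build's SQLite has FTS5 enabled.

```python
import sqlite3

# Hypothetical sample data: (symbol name, short description).
ROWS = [
    ("parse_file", "parse a source file into a tree"),
    ("parse_args", "read command line options"),
    ("render_table", "format output rows as text"),
    ("open_file", "open a path for reading"),
    ("hash_content", "sha256 digest of file bytes"),
]

conn = sqlite3.connect(":memory:")
# Hypothetical schema, for illustration only.
conn.execute("CREATE VIRTUAL TABLE symbols USING fts5(name, doc)")
conn.executemany("INSERT INTO symbols (name, doc) VALUES (?, ?)", ROWS)


def symbol_search(db, query, limit=10):
    """Return symbol names matching the query, most relevant first."""
    # FTS5's bm25() returns smaller (more negative) values for better
    # matches, so ascending order puts the best hit first.
    rows = db.execute(
        "SELECT name FROM symbols WHERE symbols MATCH ? "
        "ORDER BY bm25(symbols) LIMIT ?",
        (query, limit),
    )
    return [name for (name,) in rows]


print(symbol_search(conn, "parse"))
```

The default unicode61 tokenizer treats the underscore as a separator, which is why a query for `parse` matches both `parse_file` and `parse_args`.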

Tree-sitter Analyzer exclusive

| Capability | TSA tool | Note |
|---|---|---|
| BM25-ranked symbol search | all search tools | relevance_score on every result (min-max normalized: best=1.0, weakest=0.0); sort(by='confidence') in DSL |
| Semantic search (133× faster) | codegraph_query semantic() | BM25 pre-filter narrows 40k symbols to ~400 before cosine rerank |
| Project A–F health grading | check_project_health | 7 dimensions (size/complexity/deps/coverage/duplication/structure/git-hotspot), no competitor offers this |
| TOON output | every tool, output_format: "toon" (default) | 50-70 % token saving |
| Verdict envelopes | every tool | SAFE/CAUTION/UNSAFE/INFO/WARN/ERROR/NOT_FOUND |
| Safe-to-edit gate | safe_to_edit + modification_guard | refuses high-risk edits before they happen |
| Architectural constraint DSL | check_constraints | "module A cannot import B" → enforced |
| Code health (file-level) | check_file_health | block/long-method/smell detection |
| Class hierarchy | codegraph_class_hierarchy | type-inheritance tree |
| Dependency matrix | codegraph_dependency_matrix | module-coupling matrix |
| Dead code | codegraph_dead_code | transitive unreachable analysis |
| Complexity heatmap | codegraph_complexity_heatmap | per-fn cyclomatic + project view |
| AST-structural clone detection | codegraph_similarity | beyond text similarity |
| Mermaid call-graph export | codegraph_visualize | paste-ready in docs |
| UML Mermaid export | codegraph_uml | class / package / component / sequence diagrams |
| PR review | codegraph_pr_review | AST-diff + semantic classify + blast radius |
| agent_summary | every response | next-step hint baked into the envelope |
| Synapse cross-file resolver | internal | import-aware, beats regex guessing |
| Temporal activation | symbol_lineage | per-symbol git-modification frequency |
| One-shot file orientation | smart_context | health + exports + deps + edit-risk in one call (replaces 3-4 calls) |
| Architectural decision journal | decision_journal | persists reasoning across sessions; no competitor exposes this |
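
The min-max normalization behind relevance_score (best=1.0, weakest=0.0) can be sketched in a few lines of Python. This is a generic version, not the project's code; it assumes higher raw scores mean better matches, and the handling of an all-equal result set is an arbitrary choice made for the example.

```python
def min_max_normalize(scores):
    """Rescale raw scores so the best match is 1.0 and the weakest is 0.0."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: when every hit scores the same, treat all as best.
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


raw_bm25 = [2.0, 4.0, 6.0]  # made-up raw scores
print(min_max_normalize(raw_bm25))
```

Because the scale is relative to each result set, a score of 1.0 only means "best of this query", not an absolute measure of match quality.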

Skills (13 curated workflows)

CodeGraph has zero skills. We ship 13 under .claude/skills/tsa-*/:

tsa-landing, tsa-find, tsa-graph, tsa-structure, tsa-deps, tsa-index, tsa-health-watch, tsa-edit-safety, tsa-edit-then-verify, tsa-constraints, tsa-pr-review, tsa-refactor-queue, tsa-temporal.

Each skill ships an allowed-tools subset + procedure recipe + decision-surface schema, so the agent doesn't have to triage 62 tools on every question.

255 CLI flags

Strict superset of CodeGraph's 15-command CLI. Highlights:

tree-sitter-analyzer --table full <file>          # method/signature/complexity table
tree-sitter-analyzer --partial-read --start-line N --end-line M <file>
tree-sitter-analyzer --project-health             # A-F grade across the project
tree-sitter-analyzer --callers <symbol>           # who-calls
tree-sitter-analyzer --codegraph-impact <fn>      # blast radius + risk
tree-sitter-analyzer --affected <file...>         # tests transitively affected
tree-sitter-analyzer --dead-code                  # transitive unreachable
tree-sitter-analyzer --check-constraints          # architectural rules
tree-sitter-analyzer --safe-to-edit <file>        # refuse if risky
tree-sitter-analyzer --uml class                  # Mermaid UML class diagram

See docs/CODEMAPS/cli.md for the full surface.

Quick Start

1. Install dependencies

# uv (required)
curl -LsSf https://astral.sh/uv/install.sh | sh        # macOS / Linux
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"  # Windows

# fd + ripgrep (required for search)
brew install fd ripgrep                                # macOS
winget install sharkdp.fd BurntSushi.ripgrep.MSVC      # Windows

2. Install Tree-sitter Analyzer

uv add "tree-sitter-analyzer[all,mcp]"

3. Hook it into your agent

See Supported Agents. Most clients want this MCP server entry:

{
  "mcpServers": {
    "tree-sitter-analyzer": {
      "command": "uvx",
      "args": ["--from", "tree-sitter-analyzer[mcp]", "tree-sitter-analyzer-mcp"],
      "env": { "TREE_SITTER_PROJECT_ROOT": "/absolute/path/to/your/project" }
    }
  }
}

After restart: "Set the project root to my repo and call codegraph_status."

How It Works

Source code → tree-sitter parse → SQLite + FTS5 index (.ast-cache/index.db)
                                         ↓
        codegraph_navigate / codegraph_explore / codegraph_callers / ...
                                         ↓
                            TOON-compressed envelope
                            (verdict + agent_summary + data)
                                         ↓
                              MCP client / CLI consumer

The index is built lazily on first query, refreshed on file change via a content-hash diff (codegraph_incremental_sync). All 62 tools read from the same .ast-cache/, so a query and its follow-up share work.
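
A content-hash diff of the kind described above can be sketched as follows. This is a simplified stand-in rather than the logic of codegraph_incremental_sync: the SHA-256 choice, the in-memory `cached` mapping, and the function names are all assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash file contents, so touching a file without editing it is ignored."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def diff_against_cache(root: Path, cached: dict, suffixes=(".py",)):
    """Return (changed_or_new, deleted, current) for files under root.

    `cached` maps relative POSIX paths to the digests seen at the last sync.
    """
    current = {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file() and p.suffix in suffixes
    }
    changed = sorted(rel for rel, h in current.items() if cached.get(rel) != h)
    deleted = sorted(rel for rel in cached if rel not in current)
    return changed, deleted, current


if __name__ == "__main__":
    changed, deleted, snapshot = diff_against_cache(Path.cwd(), cached={})
    print(f"{len(changed)} files to (re)index, {len(deleted)} to drop")
```

Only the files in `changed` would need re-parsing, and `current` becomes the cache for the next sync.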

Supported Agents

📘 Claude Code (recommended)
claude mcp add tree-sitter-analyzer \
  --env TREE_SITTER_PROJECT_ROOT="$PWD" \
  -- uvx --from "tree-sitter-analyzer[mcp]" tree-sitter-analyzer-mcp

Verify: claude mcp list. The 13 tsa-* skills auto-discover from .claude/skills/.

📗 Claude Desktop

Edit claude_desktop_config.json (macOS: ~/Library/Application Support/Claude/, Windows: %APPDATA%\Claude\, Linux: ~/.config/Claude/):

{
  "mcpServers": {
    "tree-sitter-analyzer": {
      "command": "uvx",
      "args": ["--from", "tree-sitter-analyzer[mcp]", "tree-sitter-analyzer-mcp"],
      "env": { "TREE_SITTER_PROJECT_ROOT": "/absolute/path/to/your/project" }
    }
  }
}

📙 GitHub Copilot (VS Code)

Create .vscode/mcp.json (note: servers, not mcpServers):

{
  "servers": {
    "tree-sitter-analyzer": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--from", "tree-sitter-analyzer[mcp]", "tree-sitter-analyzer-mcp"],
      "env": { "TREE_SITTER_PROJECT_ROOT": "${workspaceFolder}" }
    }
  }
}

🖱 Cursor / Cline / Continue / Roo Code

All read the same mcpServers schema as Claude Desktop. Cursor: Settings → MCP. Cline: MCP panel → Edit settings. Continue: ~/.continue/config.json under experimental.modelContextProtocolServers. Roo Code: MCP panel → Edit MCP Settings.

⚠️ TREE_SITTER_PROJECT_ROOT must be an absolute path. The server uses SecurityBoundaryManager to enforce a security boundary against paths that escape the project root.
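
To illustrate what a project-root boundary check generally involves, here is a minimal Python sketch. It is not SecurityBoundaryManager itself; the function name `resolve_inside` and the exception types are hypothetical.

```python
from pathlib import Path


def resolve_inside(root: str, requested: str) -> Path:
    """Resolve `requested` against `root`, refusing anything outside it."""
    root_path = Path(root)
    if not root_path.is_absolute():
        raise ValueError("project root must be an absolute path")
    root_path = root_path.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # comparison below sees the real target rather than the spelling.
    candidate = (root_path / requested).resolve()
    if candidate != root_path and root_path not in candidate.parents:
        raise PermissionError(f"{requested!r} escapes the project root")
    return candidate
```

With a root of /srv/repo, a request for src/app.py resolves normally, while ../secrets.txt or an absolute path such as /etc/passwd raises PermissionError.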

Supported Languages

21 language plugins: 13 are fully wired into the indexer (full symbol + call graph), 5 data/markup languages are reachable via the single-file CLI path, and 3 are scaffolds (plugin exists, indexer wiring pending). The 2026-05-24 patch unblocked Swift, Kotlin, Ruby, PHP, and C#, which had been silently skipped for months.

| Tier | Languages |
|---|---|
| Full index + symbol + call graph | Python · Java · JavaScript · TypeScript · Go · Rust · C · C++ · C# · Swift · Kotlin · Ruby · PHP |
| Single-file analysis (CLI) | HTML · CSS · Markdown · SQL · YAML |
| Scaffold (plugin exists, indexer wiring pending) | bash · scala · json |

CodeGraph supports a similar set; the only popular code languages neither tool ships yet are Dart, Vue, Svelte, Lua (next-sprint backlog).

Configuration

Mostly nothing. The defaults are designed so you can hook it into your agent and forget:

  • Output format: TOON. Override per-call with output_format: "json".
  • Project root: TREE_SITTER_PROJECT_ROOT (env var, MCP) or --project-root (CLI).
  • Cache location: <project>/.ast-cache/. Safe to delete; it rebuilds automatically.
  • Optional: TREE_SITTER_OUTPUT_PATH for large-output write target.
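
To show why a tabular layout is smaller than raw JSON, here is a Python sketch that writes the field names once in a header and then emits one comma-separated line per row. It is only an approximation of the idea: the actual TOON encoding used by the server may differ, the sample rows are made up, and the comparison counts characters rather than tokens.

```python
import json


def to_tabular(name, rows):
    """Encode a uniform list of flat dicts as a header plus one line per row."""
    fields = list(rows[0])
    if any(list(r) != fields for r in rows):
        raise ValueError("rows must share the same keys in the same order")
    header = name + "[" + str(len(rows)) + "]{" + ",".join(fields) + "}:"
    # Note: values containing commas or newlines would need escaping,
    # which this sketch does not attempt.
    lines = ["  " + ",".join(str(r[f]) for f in fields) for r in rows]
    return "\n".join([header] + lines)


# Hypothetical payload, shaped like a list of call sites.
ROWS = [
    {"symbol": "parse_file", "file": "src/parser.py", "line": 42},
    {"symbol": "load_config", "file": "src/config.py", "line": 7},
]

compact = to_tabular("callers", ROWS)
print(compact)
print(len(compact), "chars vs", len(json.dumps(ROWS)), "chars as JSON")
```

The saving grows with the number of rows, since JSON repeats every key in every object while the tabular form states them once.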

Quality & Testing

| Metric | Value |
|---|---|
| Tests passed | 18,702 ✅ |
| Coverage | (see coverage badge) |
| Type safety | 100 % mypy |
| Platforms | macOS · Linux · Windows |
| Pre-commit gates | bandit · mypy · pyupgrade · detect-secrets · codemap-sync · smell-ratchet |

uv run pytest -q                                # full suite
uv run python check_quality.py --new-code-only  # quality gate

Troubleshooting

| Symptom | Fix |
|---|---|
| unsupported language on .swift / .kt / .rb / .php / .cs | Update to ≥ 1.12.x; the 5-language gap was patched in commit 50e99a8f. |
| MCP server doesn't appear in client | TREE_SITTER_PROJECT_ROOT must be absolute; restart the client after config edit. |
| database is locked | Stop any other process holding .ast-cache/index.db; if persistent, rm -rf .ast-cache && tree-sitter-analyzer --autoindex. |
| Slow first call | First call builds the index. Subsequent calls are sub-second. Run --full-index upfront to amortise. |
| Agent picks the wrong tool | Use a tsa-* skill (/tsa-graph, /tsa-find, ...); each skill restricts the visible tool set to one workflow. |

Development

git clone https://github.com/aimasteracc/tree-sitter-analyzer.git
cd tree-sitter-analyzer
uv sync --extra all --extra mcp
uv run pytest -q

See docs/CONTRIBUTING.md for the development guide.

Contributing & License

  • ⭐ A GitHub star helps surface this tool to other AI-agent users.
  • 💖 Sponsor: supports continued MCP / Skills development.
  • Lead sponsor: @o93.
  • MIT licensed โ€” see LICENSE.
  • Release history: CHANGELOG.md.
