townsendmerino

ken

Community townsendmerino
Updated

Fast hybrid code search for agents. Pure Go, drop-in MCP-compatible with semble.

ken

Fast hybrid code search for agents. Pure Go, single static binary, drop-in MCP-compatible with MinishLab/semble — same tool schemas, same output format, install steps swapped to a Go binary.

CILicense: MITGo ReferenceGo 1.26+

ken is a Go port of semble: BM25 lexical + Model2Vec semantic embeddings + RRF fusion + a code-aware reranker, with the retrieval algorithm ported verbatim from semble's search.py + ranking/*.py.

Why ken

  • ~97% recall@10 in the default (hybrid) mode0.967 NL / 0.995 symbol on semble's 1,251-query benchmark, vs grep's ~99.9% — while costing an agent ~46× fewer tokens than grep + Read (4,120 vs 189,773 median tokens on NL queries — measured in the same default hybrid mode). For "find the chunk that answers this," that's a 1–2 order-of-magnitude token win at near-parity recall. (Reproduce: docs/BENCH.md.)
  • Single static binary. Pure Go, no cgo, no Python interpreter on cold start, no GIL on indexing. Cross-compiles to Linux / macOS / Windows (amd64/arm64) for free.
  • Drop-in for semble. Same search / find_related MCP tool schemas and the same markdown-string wire format — swap the command: path and existing agents work unchanged.
  • Local, CPU-only. Embedding inference, BM25, and fusion all run on the CPU. No API keys, no GPU, no vector DB, air-gapped friendly.

One knob controls recall. The 82–91% figure in the token-budget tables is the BM25-only fallback ken runs in when no embedding model is installed. ken-mcp fetches the model automatically on first run (~60 MB, pure-Go, no Python — serves bm25 until it lands, then upgrades to the ~97% hybrid path; KEN_MCP_AUTO_FETCH=0 to disable). For the CLI, run ken download-model once. Exhaustive enumeration (refactors, pre-rename audits) still belongs to grep; ken is for "find the chunk that answers this."

Where to start

  • ARCHITECTURE.md — current-state map: module layout, runtime/concurrency model, data flow, invariants. Start here for the code.
  • docs/USERS.md — agent users. Install ken-mcp, point your agent at it, use the nine tools. 5-minute on-ramp.
  • docs/DEVELOPERS.md — SDK authors and tuners. The mcp.Run embedded-corpus library, prebuilt indices, fs.FS indexing, custom chunkers, tuning rerank, performance expectations.
  • docs/DESIGN.md + docs/internal/DECISIONS.md — algorithm spec + every architectural decision (ADRs).
  • docs/BENCH.md — benchmark reproduction (NDCG, token-budget recall, the hybrid-vs-BM25 decomposition).

Quickstart

Install via a package manager:

# macOS / Linux (Homebrew) — installs both `ken` and `ken-mcp`:
brew install --cask townsendmerino/tap/ken
# Windows (Scoop):
scoop bucket add townsendmerino https://github.com/townsendmerino/scoop-bucket
scoop install ken

Or with Go:

# Install both binaries (Go 1.26+).
go install github.com/townsendmerino/ken/cmd/ken@latest
go install github.com/townsendmerino/ken/cmd/ken-mcp@latest

# Download the default Model2Vec model (~60 MB, one-time). Pure Go, no Python.
# (ken-mcp auto-fetches this on first run; the CLI needs it explicitly.)
# This is the single biggest retrieval-quality lever — it puts you on the ~97% path.
ken download-model

# Search any local repo from the CLI.
ken search /path/to/myrepo "save model to disk" --model ~/.ken/model

Or skip the model and use lexical-only mode (BM25-only costs ~14 pp recall@10 vs the hybrid default — see docs/BENCH.md):

ken search /path/to/myrepo "validateToken" --mode bm25

Pre-built binaries for macOS, Linux, and Windows (amd64/arm64) are attached to each release.tar.gz for macOS/Linux, .zip for Windows.

As of v0.3, ken index <path> defaults to watch mode — it stays alive and re-indexes on change (2 s debounce); --no-watch restores build-once-and-exit. ken-mcp always watches, so an agent editing the repo mid-session sees its own changes without a restart. ken also respects nested .gitignore files (per-directory, matching git).

Install as an MCP server

ken-mcp speaks JSON-RPC over stdio and serves the same two core tools (search, find_related) semble does, with the same arg shapes and markdown output.

# Claude Code
claude mcp add ken -s user -- /absolute/path/to/ken-mcp
// ~/.cursor/mcp.json  (or .cursor/mcp.json) — also .vscode/mcp.json with "servers"
{ "mcpServers": { "ken": { "command": "/absolute/path/to/ken-mcp" } } }
# ~/.codex/config.toml
[mcp_servers.ken]
command = "/absolute/path/to/ken-mcp"
// ~/.opencode/config.json
{ "mcp": { "ken": { "type": "local", "command": ["/absolute/path/to/ken-mcp"] } } }

Core environment variables

Variable Default Purpose
KEN_MCP_DEFAULT_REPO (unset) Pre-indexed source; lets tools omit the repo arg.
KEN_MCP_MODE hybrid bm25 / semantic / hybrid. Serves bm25 while the model is missing — fetched on first run by default (see KEN_MCP_AUTO_FETCH).
KEN_MCP_MODEL_DIR ~/.ken/model Path to a Model2Vec snapshot containing model.safetensors. Falls back to ~/.ken/model (where ken download-model writes) when unset.
KEN_MCP_AUTO_FETCH 1 On first run with a model-needing mode and no model present, fetch potion-code-16M (~60 MB) in the background, serving bm25 until it lands then upgrading to hybrid. 0 disables (serve bm25, warn).
KEN_MCP_CHUNKER regex regex / treesitter / line / markdown. See Choosing a chunker.
KEN_MCP_CACHE_SIZE 16 LRU bound on the repo→Index cache.
KEN_MCP_LOG_LEVEL warn debug / info / warn / error. All logs go to stderr; stdout is the JSON-RPC channel (details).
KEN_ALLOW_PRIVATE_CLONE_TARGETS 0 Off by default: for http(s) repo URLs, ken rejects loopback / link-local / RFC1918 addresses (SSRF guard). Set 1 to allow internal git hosts.

The full env reference — including the KEN_DB_* database variables — is in docs/USERS.md and docs/db-indexing.md. For agents that should route between ken and grep deliberately (rather than ken's default "prefer ken" instruction), see the routing snippet in docs/USERS.md.

Tools

Both core tools return a formatted markdown string identical to semble's _format_results output. (ken-mcp also exposes seven structural tools — definition, references, callers, outline, symbols, recently_changed, status — plus reindex_db when a database is configured; see docs/USERS.md.)

search

Arg Type Required Default Description
query string Natural language or code query.
repo string https:// / http:// URL or local directory. Required if no KEN_MCP_DEFAULT_REPO.
mode hybrid|semantic|bm25 hybrid Search mode.
top_k int 5 Number of results.

find_related

Arg Type Required Default Description
file_path string Path as it appears in a search result.
line int (1-indexed) A line inside the chunk to seed the similarity search.
repo string Same as for search.
top_k int 5 Number of similar chunks.

What ken indexes

ken's hybrid retrieval is calibrated for source code (Python / Go / TypeScript / Java / Rust have language-aware chunking; others fall back to the line chunker) and documentation (markdown chunked on heading boundaries, code blocks/tables kept atomic, frontmatter handled). Mixed code-and-docs corpora route per file by extension.

It also indexes database schemas alongside code — static .sql files (with migration-history folding) and live introspection of Postgres / SQLite / MySQL / MariaDB — so an agent answering "how do users get authenticated" gets the Go function, the SQL it runs, the users table definition, and the FK relationships in one ranked list. Full reference (Tier-1/Tier-2, row sampling, LISTEN/NOTIFY, the reindex_db tool, PII stance, all KEN_DB_* vars): docs/db-indexing.md.

For plain prose with no code or structured docs, BM25 mode (--mode=bm25) carries the load; the semantic model is code-trained and unvalidated on literary text.

How it works

gitignore-respecting walk
    → regex chunker (Python / Go / TS / Java / Rust) with line-chunker fallback
    → BM25 (Lucene variant, k1=1.5, b=0.75)  +  Model2Vec semantic (cosine over a dense matrix)
    → α-weighted RRF fusion (α auto-detected: 0.3 for symbol queries, 0.5 for NL)
    → file-coherence boost + query-type boosts (definition / embedded-symbol / stem-match)
    → path penalties (test files, compat / legacy, `.d.ts`) + file-saturation decay
    → top-k

The retrieval algorithm is a verbatim port of semble's search.py + ranking/*.py; see docs/DESIGN.md §7 for every constant and pipeline-order subtlety, and §4 for the Model2Vec inference contract (three-tensor safetensors, the mapping[] indirection, the float64 precision that's load-bearing for cosine parity).

Comparison to semble

Property semble ken
Language / distribution Python · uvx / pip Go · single static binary
Cold start ~500 ms (interpreter + numpy + model) ~10–20 ms ken search over a tiny index
Retrieval algorithm reference implementation verbatim port (constants + pipeline order from search.py + ranking/*.py)
NDCG@10 on semble's benchmark 0.854 0.842 hybrid (gap 0.012, full 63 repos × 1,251 queries)
Recall@10 on agent queries (not measured) ~0.97 hybrid (0.967 NL / 0.995 symbol); BM25-only fallback ~0.84
Tokens to recall@10 (not measured) ~46× fewer than grep+Read on NL queries (4,120 vs 189,773 median, hybrid)
MCP server yes yes — drop-in (same schemas + wire format)
Binary size n/a release (slim) ken ~22 MB · ken-mcp ~38 MB
Requires huggingface-cli yes noken download-model fetches direct from HF

Full methodology, the per-ablation breakdown (semantic-raw matches semble within 0.003, validating the embedding + tokenizer + ANN port), the CoIR-CSN-Python external anchor, and every footnote are in docs/BENCH.md.

Compared to other agent code-search tools

The crowded part of this category splits on one axis: what you have to run. ken's bet is that the embedding model belongs inside the binary — pure-Go Model2Vec inference, no cgo — so there's nothing else to stand up: no embedding daemon, no vector database, no API key, air-gapped. The two closest points of comparison:

  • grepai — the closest architectural analog: a single Go binary with a file watcher and an MCP server, 100% local. It offloads embeddings to a separate Ollama server (you install + run Ollama and pull a model).
  • claude-context (Zilliz) — the most visible: hybrid BM25 + dense search, but backed by a vector database (self-hosted Milvus via Docker, or managed Zilliz Cloud) and an embedding provider (OpenAI / VoyageAI / Gemini API, or local Ollama).
ken grepai claude-context
Runtime single static Go binary (no cgo) single Go binary Node/TS (npm)
Embeddings in-process, pure Go (Model2Vec) external Ollama daemon external provider (OpenAI / Voyage / Gemini, or Ollama)
External services needed none — auto-fetches a ~60 MB model, then runs offline Ollama (daemon + model) vector DB (Milvus/Docker or Zilliz Cloud) + an embedding API/daemon
Retrieval BM25 + dense + RRF + code-aware rerank dense + call graphs hybrid (BM25 + dense)
Recall / NDCG 0.967 recall@10 · 0.842 NDCG@10, with a reproduction harness not published not published
Token savings ~46× vs grep+Read, measured + reproducible not published vendor-claimed −39% vs a baseline
Speed index ~1.6 s / 13 k chunks; hybrid search p50 ~1.5 ms (measured) vendor: "10 k files in seconds, ms queries" depends on the vector DB + network
Languages (structural) 13 (tree-sitter) 10 chunk-level, language-agnostic
License MIT MIT MIT

Two honest caveats. First, ken's numbers ship with reproduction commands (docs/BENCH.md); the cells marked "not published" mean we found no standard-benchmark figure to cite and have not independently benchmarked the others' speed — architecture, dependencies, and license are the verifiable axes (as of June 2026). Second, the tools optimize for different things — grepai adds call-graph tracing; claude-context leans on a managed vector DB for scale-out. ken's specific claim is near-grep recall at ~1–2 orders of magnitude fewer tokens, from one binary with no external services, every number reproducible.

Choosing a chunker

The default regex chunker handles most cases well. The opt-in treesitter chunker (--chunker=treesitter / KEN_MCP_CHUNKER=treesitter, pure-Go gotreesitter) measurably wins for Kotlin, Zig, TypeScript, Java, PHP and loses on Python, C, Rust, Lua, Scala — net Δ −0.004 NDCG overall (within noise), so it stays opt-in. The full per-language recommendation table is in docs/BENCH.md; the default-stays-regex rationale is ADR-011.

For SDK authors: ship docs as a single binary

The mcp.Run library lets you bake a //go:embed corpus + the Model2Vec model into one static MCP server binary — no backend, no vector DB, no per-query network egress, version-pinned by build artifact. ~20 lines of main.go, go build, push to a GitHub release; users brew install and add one line to their agent config. The walker and indexer take any fs.FS (embed.FS, fstest.MapFS, tarball-backed), which also gives agent sandboxing by construction.

Full guide — the canonical pattern, prebuilt indices for fast cold start, the binary-size contract, and the opt-in mcp/db package — is in docs/DEVELOPERS.md.

Live demos (downloadable mcp.Run binaries over real codebases, with audit transcripts): demos/v0.1.0 release — Kubernetes v1.31.0 (59,795 chunks) and PostgreSQL 17.0 (64,506 chunks). Writeup: I shipped two downloadable code search binaries. The audit caught two bugs..

Roadmap

The risk register with explicit triggers is in docs/DESIGN.md §10; the living 1.0-readiness tracker is docs/internal/road-to-1.0.md. Retrieval is treated as closed for 1.0 (the relevance curve is flat); remaining work is polish + onboarding (getting fresh installs onto the hybrid path) + distribution.

How this was built

ken is a port. The retrieval algorithm is verbatim from MinishLab/semble (Python); the Go implementation was written by Claude under fixed constraints: pure Go / no cgo, algorithm constants ported verbatim and never tuned, original source wins whenever Claude's reconstruction diverges from semble's live code. That last rule caught five material errors during the rerank-pipeline port — each a confident-sounding hallucination that was wrong when checked against the Python source. The discipline of always checking, the verbatim-port rule, and the 11k-input tokenizer parity harness (which surfaced three bugs an 18-case spot-check missed) are human-supplied. Every architectural decision is recorded in docs/internal/DECISIONS.md.

Acknowledgments

ken stands on MinishLab's shoulders — the retrieval algorithm, the model, the whole embedding-table approach are theirs.

  • semble — the original Python implementation. © Thomas van Dongen, MIT.
  • model2vec — the static-embedding library whose three-tensor format ken implements. © Thomas van Dongen, MIT.
  • potion-code-16M — model weights, distilled from nomic-ai/CodeRankEmbed (MIT), itself from Snowflake/snowflake-arctic-embed-m-long (Apache-2.0). © Minish Lab. Redistributed per NOTICE.

License

ken is MIT-licensed. It bundles attribution for the redistributed model weights in NOTICE and a generated dependency-license list in THIRD_PARTY_LICENSES.md; every link in the provenance chain is permissive (MIT, Apache-2.0, MPL-2.0). See docs/DESIGN.md §6.

For contributors: CLAUDE.md has the build/test/formatting conventions and the project's invariants (precision contract, stdout/stderr contract).

MCP Server · Populars

MCP Server · New

    hosanxiv

    KiasuMiles

    Kiasu Miles — A Singapore credit card rewards optimization MCP. Real-time card matching, miles tracking, spend analysis.

    Community hosanxiv
    ndesv21

    SocialClaw

    Social media scheduling CLI and OpenClaw skill for AI agents posting to X, LinkedIn, Instagram, Facebook Pages, TikTok, Discord, Telegram, YouTube, Reddit, WordPress, and Pinterest.

    Community ndesv21
    stackql

    stackql

    Query, provision and operate Cloud and SaaS resources and APIs using an extensible SQL based framework

    Community stackql
    townsendmerino

    ken

    Fast hybrid code search for agents. Pure Go, drop-in MCP-compatible with semble.

    Community townsendmerino
    asklokesh

    Loki Mode

    Multi-agent autonomous SDLC framework. Spec to deployed app. PRD, GitHub issue, OpenAPI/JSON/YAML, or one-line brief. 5 AI providers, 11 quality gates.

    Community asklokesh