_ _
__| | ___ __ _ __| |_______ _ __ ___
/ _` |/ _ \/ _` |/ _` |_ / _ \| '_ \ / _ \
| (_| | __/ (_| | (_| |/ / (_) | | | | __/
\__,_|\___|\__,_|\__,_/___\___/|_| |_|\___|
> semantic doc search. local file. no cloud. no key.
> you ask in english. it answers in snippets.
Status. Vector search wired end-to-end. MCP over stdio. One binary, Linux + macOS, zero telemetry. See releases for the latest tag and the roadmap for in-flight work. The scraper is still the messy half — #64 is honest about it.
The pitch, in one paragraph
Your AI client says "how do I register a tool?". The doc says AddTool. A grep-based index shrugs; a vector index doesn't. Deadzone is the vector index — nomic-embed-text-v1.5 over Turso's native cosine distance, wrapped in a Go binary that speaks MCP over stdio and keeps every byte on your laptop. It is, roughly, Context7 with the internet turned off.
Rules of the deadzone
- One binary. `deadzone`. Subcommands for everything. No `pip install`, no `npm i`, no `docker compose up`.
- The index never leaves. Local Turso file. No account. No API key. No egress on the hot path.
- Natural language first. Embeddings over cosine. FTS5 is not invited.
- The binary is the version. The DB is pinned to the binary. Upgrade the binary, the DB follows; don't, and it won't.
- Fail loudly or not at all. `DEADZONE_DB_OFFLINE=1` refuses to guess. Verification failures in the scraper drop the doc, not the run.
Install (pick one; they all converge on the same binary)
# macOS Apple Silicon — the one-liner
brew install laradji/deadzone/deadzone
# Linux — resolve the latest tag once, then pick a flavor
VERSION=$(curl -fsSL https://api.github.com/repos/laradji/deadzone/releases/latest | grep '"tag_name"' | cut -d'"' -f4)
ARCH=amd64 # or arm64
# self-mounting AppImage
curl -L -O "https://github.com/laradji/deadzone/releases/download/${VERSION}/deadzone_${VERSION}_linux_${ARCH}.AppImage"
chmod +x "deadzone_${VERSION}_linux_${ARCH}.AppImage"
mv "deadzone_${VERSION}_linux_${ARCH}.AppImage" deadzone
# or plain tarball (no FUSE needed)
curl -L "https://github.com/laradji/deadzone/releases/download/${VERSION}/deadzone_${VERSION}_linux_${ARCH}.tar.gz" \
| tar xz --strip-components=1
Both flavors land a deadzone executable in your current directory, so the ./deadzone server snippet below works as-is.
# Container — multi-arch (linux/amd64 + linux/arm64), ships with the DB baked, runs offline by default
docker pull ghcr.io/laradji/deadzone:latest
docker run --rm -i ghcr.io/laradji/deadzone:latest server
The image bakes the binary, libonnxruntime, deadzone.db, and the nomic-embed-text-v1.5 ONNX weights (~230 MB total), and runs as a non-root user out of distroless (no shell, no package manager). DEADZONE_DB_OFFLINE=1 is set in the image so first launch is instant — no download, no volume mount, no --network access required. To refresh the index, pull a newer tag.
Windows is blocked upstream — no libtokenizers.a. Use WSL.
Verify checksums (optional but cheap):
curl -L -O "https://github.com/laradji/deadzone/releases/download/${VERSION}/deadzone_${VERSION}_checksums.txt"
sha256sum --ignore-missing -c "deadzone_${VERSION}_checksums.txt" # Linux
shasum -a 256 --ignore-missing -c "deadzone_${VERSION}_checksums.txt" # macOS
AppImage needs FUSE v2. Most desktops ship it; minimal servers don't. If you hit a `dlopen(): libfuse.so.2` error, either `apt-get install libfuse2` (or `dnf install fuse-libs`) or pass `--appimage-extract-and-run` to bypass FUSE entirely.
Run
./deadzone server
That's the quick-start. On first launch it fetches deadzone.db matched to this binary's version, SHA256-verifies, caches it under the platform data dir, and serves. Second launch onwards: zero network. Upgrade the binary and the DB re-fetches on next launch; don't, and the cache is served forever.
MCP client wire-up — native binary (Brew tap, tarball, or AppImage):
{
"mcpServers": {
"deadzone": {
"type": "stdio",
"command": "/path/to/deadzone",
"args": ["server"]
}
}
}
MCP client wire-up — container (multi-arch on ghcr.io). The image ships with deadzone.db baked, so no volume mount is needed and every container start is offline-instant:
{
"mcpServers": {
"deadzone": {
"type": "stdio",
"command": "docker",
"args": ["run", "--rm", "-i", "ghcr.io/laradji/deadzone:latest", "server"]
}
}
}
Then, from the client:
search_libraries("terraform aws") → ranked (lib_id, version) pairs
search_docs("creating an s3 bucket", lib_id=...) → snippets, token-budgeted
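On the wire, those calls are ordinary MCP `tools/call` requests over stdio. A hedged illustration of what the client actually sends (argument values are examples):

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "search_docs",
    "arguments": {
      "query": "creating an s3 bucket",
      "lib_id": "/hashicorp/terraform",
      "tokens": 5000
    }
  }
}
```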
The two tools
┌─────────────────────────────────────────────────────────────────────┐
│ search_libraries(name, limit?) → []LibraryHit │
│ ───────────────────────────────────────────── │
│ free text ──► vector match against the `libs` table │
│ ──► [{lib_id, version, doc_count, match_score}] │
├─────────────────────────────────────────────────────────────────────┤
│ search_docs(query, lib_id?, version?, tokens?) → []Snippet │
│ ────────────────────────────────────────────────── │
│ natural ──► 768-dim embed ──► cosine over docs │
│ language ──► token-budgeted snippets back │
└─────────────────────────────────────────────────────────────────────┘
| Arg | Shape | Notes |
|---|---|---|
| `query` | string | Matched semantically. Don't write keywords; write what you want. |
| `lib_id` | `/org/project` | Optional filter. Grab one from `search_libraries`. |
| `version` | `"1.14"` or similar | Optional pin; requires `lib_id`. `version` alone is rejected. |
| `tokens` | int | Response budget. Default 5000, min 1000, ≈ 4 chars/token. |
| `limit` | int | On `search_libraries` — max results. Default 10, max 50. |
| `name` | string | Free text on `search_libraries`. Empty returns the most-indexed libs. |
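The `tokens` budget works on the ≈ 4 chars/token heuristic. A toy sketch of how a budgeted response might be assembled — keep whole ranked snippets until the next one would overflow (hypothetical helper names, not the server's actual code):

```go
package main

import "fmt"

// estimateTokens applies the README's rough heuristic: ~4 characters per token.
func estimateTokens(s string) int { return (len(s) + 3) / 4 }

// takeWithinBudget is a hypothetical sketch of token budgeting: keep whole
// snippets in ranked order, stopping before the budget would be exceeded.
func takeWithinBudget(snippets []string, budget int) []string {
	var out []string
	used := 0
	for _, s := range snippets {
		t := estimateTokens(s)
		if used+t > budget {
			break
		}
		out = append(out, s)
		used += t
	}
	return out
}

func main() {
	snips := []string{"aaaa", "bbbbbbbb", "cccccccccccc"} // ≈ 1, 2, 3 tokens
	fmt.Println(takeWithinBudget(snips, 3)) // only the first two fit
}
```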
Under the hood
deadzone server
│
▼
┌──────────────┐ stdio JSON-RPC ┌───────────────┐
│ MCP client │ ─────────────────────► │ handler │
└──────────────┘ └──────┬────────┘
│
┌──────────────────┴──────────────────┐
▼ ▼
┌────────────────┐ ┌──────────────────┐
│ embedder │ │ Turso (local) │
│ hugot + ORT │ │ F32_BLOB(768) │
│ nomic v1.5 │ │ vector_distance │
└────────┬───────┘ └──────────────────┘
│ 768-dim ▲
└────────────── query vector ────────┘
| Layer | Choice |
|---|---|
| Language | Go 1.26.2, pinned via mise |
| Storage | Turso local file — native F32_BLOB(N) + vector_distance_cos |
| Driver | tursogo — CGO-free via purego |
| Embedder | hugot → nomic-ai/nomic-embed-text-v1.5, 768-dim, 8192-token ctx (int8 quantized) |
| Runtime | ONNX Runtime — binary CGO-linked at build time; libonnxruntime auto-fetched + SHA256-verified on first launch |
| Protocol | modelcontextprotocol/go-sdk over stdio |
The binary itself is CGO-linked (hugot ORT backend + static libtokenizers.a). At runtime the only native surface is libonnxruntime, loaded via dlopen after a SHA256-verified auto-download. Everything else — Go stdlib, tursogo, the model weights — is either statically linked or fetched on first launch against a pinned hash. No system installs. No sudo. If a download drifts from its pinned hash, the run aborts; there is no fallback to an unverified fetch.
Escape hatches for air-gapped boxes:
| Env var | Effect |
|---|---|
| `DEADZONE_ORT_LIB_PATH` | Hand-positioned libonnxruntime path. Skips the download. |
| `DEADZONE_ORT_CACHE` | Override the ORT library cache dir. |
| `DEADZONE_HUGOT_CACHE` | Override the model-weights cache dir. |
| `DEADZONE_DB_CACHE` | Override the deadzone.db cache dir. |
| `DEADZONE_DB_OFFLINE=1` | Refuse any network call. Fails loudly if nothing is cached. Set by default in the container image (which ships deadzone.db baked). |
| `DEADZONE_DB_AUTOUPDATE=0` | Disable the boot-time DB freshness probe (the probe runs by default; `fetch-db` always probes regardless of this flag). |
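A plausible air-gapped setup combines the two key knobs — pre-position the native library by hand and refuse all egress (the library path below is illustrative; use wherever you staged it):

```shell
# Point at a hand-copied ONNX Runtime and forbid any network call.
export DEADZONE_ORT_LIB_PATH=/opt/onnxruntime/lib/libonnxruntime.so
export DEADZONE_DB_OFFLINE=1
# ./deadzone server   # now serves entirely from the local caches, or fails loudly
```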
Default cache paths per platform:
| Platform | deadzone.db lives at |
|---|---|
| macOS | ~/Library/Application Support/deadzone/deadzone.db |
| Linux | $XDG_DATA_HOME/deadzone/deadzone.db (else ~/.local/share/...) |
| Windows | %LOCALAPPDATA%\deadzone\deadzone.db |
A sibling deadzone.db.release JSON manifest records {tag, sha256, fetched_at}. Startup compares the cached tag against the binary's compiled-in version: match → fire a 3-second freshness probe against deadzone.db.sha256 on the matching GitHub Release, atomic-swap if the remote sha differs (soft-fail to the cache on any network error); differs → fetch the new tag's release and atomic-swap; dev build → fall back to /releases/latest with a server.db_version_dev_fallback WARN. Pre-#197 binaries wrote a single-line tag-only sidecar; the JSON reader still accepts that format and rewrites it to v1 on first probe.
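Sketched from the fields named above, a v1 sidecar looks roughly like this (values illustrative):

```json
{
  "tag": "v0.9.0",
  "sha256": "…",
  "fetched_at": "2025-01-01T00:00:00Z"
}
```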
Add a library
Contributor path. End users don't touch this — they just get what ships in deadzone.db.
Not editing YAML yourself? Open an issue via the New issue page and pick Add a library or Refresh a library. The template collects exactly what a registry entry needs.
Editing YAML yourself? Append to libraries_sources.yaml:
libraries:
# Single-version lib — no `versions` key, urls used as-is.
- lib_id: /modelcontextprotocol/go-sdk
kind: github-md
urls:
- https://raw.githubusercontent.com/modelcontextprotocol/go-sdk/main/README.md
- https://raw.githubusercontent.com/modelcontextprotocol/go-sdk/main/docs/quick_start.md
# Multi-version lib — `versions` expands into one effective lib_id
# per version (/org/project/1.4, /org/project/1.5, …). {ref} is
# substituted from each version's ref: field.
- lib_id: /modelcontextprotocol/go-sdk
kind: github-md
versions:
"1.4": { ref: v1.4.1 }
"1.5": { ref: v1.5.0 }
urls:
- https://raw.githubusercontent.com/modelcontextprotocol/go-sdk/{ref}/README.md
- https://raw.githubusercontent.com/modelcontextprotocol/go-sdk/{ref}/docs/getting-started.md
| Field | Req | Purpose |
|---|---|---|
| `lib_id` | yes | Canonical `/org/project` identifier (matches `db.docs.lib_id`). |
| `kind` | yes | `github-md` (raw markdown), `github-rst` (raw reStructuredText), or `scrape-via-agent` (HTML/text via LLM). |
| `urls` | yes | Doc URL list with an optional `{ref}` placeholder. |
| `versions` | no | `{"1.4": {ref: v1.4.1, urls: [...]}, "1.5": {ref: v1.5.0}, …}` — user-facing identifiers prefer major.minor. |
| `ref` | no | Git tag or commit SHA substituted into `{ref}`. Per-version `ref:` overrides top-level. |
| `versions[v].urls` | no | Per-version URL list — replaces baseline wholesale. Use for structurally divergent versions. |
Pre-1.0: no Go editing, no recompile. Just edit YAML and re-scrape.
Scrape-via-agent (experimental)
⚠️ The messy half. Works today for non-markdown sources (Terraform providers, mkdocs, GitBook, …), but the LLM→verifier loop is sensitive to input truncation (48 KiB cap), HTML→markdown skill, and verbatim-code matching. Real-world hit rate on dense doc sites ≈ 50%/URL — see #64. Prefer `github-md` whenever the project ships committed markdown.
Bring your own LLM runtime — Ollama, llama.cpp, vLLM, LocalAI, LM Studio, Groq, OpenAI, anything that speaks POST /v1/chat/completions:
export DEADZONE_AGENT_ENDPOINT=http://localhost:11434/v1
export DEADZONE_AGENT_ENDPOINT_MODEL=qwen2.5:7b
export DEADZONE_AGENT_ENDPOINT_API_KEY=sk-... # optional
Then add a kind: scrape-via-agent entry to libraries_sources.yaml with a list of page URLs. The downstream pipeline (parse → chunk → embed → store) is identical to github-md; only the markdown source changes.
Guardrails. Every fenced code block in the LLM output is verified verbatim against the source — invented examples drop the doc (scraper.agent_verification_failed), not the run. Missing/unreachable endpoint aborts at startup; no silent fallback.
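The verbatim guardrail can be approximated as: pull every fenced block out of the LLM's markdown and require each one to appear byte-for-byte in the fetched source. A simplified sketch — the real verifier may normalize whitespace or fences differently:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tick is "```" built from escapes so this example can itself sit inside a
// markdown code fence without closing it.
const tick = "\x60\x60\x60"

var fence = regexp.MustCompile("(?s)" + tick + "[a-zA-Z0-9]*\n(.*?)" + tick)

// verifyCodeBlocks is a hypothetical sketch of the guardrail: reject the doc
// if any fenced block in the LLM output is not found verbatim in the source.
func verifyCodeBlocks(llmMarkdown, source string) bool {
	for _, m := range fence.FindAllStringSubmatch(llmMarkdown, -1) {
		if !strings.Contains(source, strings.TrimRight(m[1], "\n")) {
			return false // invented example → drop the doc, not the run
		}
	}
	return true
}

func main() {
	src := "docs say: x := New()\nmore text"
	good := "Intro.\n" + tick + "go\nx := New()\n" + tick + "\n"
	bad := "Intro.\n" + tick + "go\nx := Fabricated()\n" + tick + "\n"
	fmt.Println(verifyCodeBlocks(good, src), verifyCodeBlocks(bad, src))
}
```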
Local pipeline (contributors)
Two-step bootstrap: toolchain first, then the CGO native dep — kept separate so air-gapped / CI runners with vendored libtokenizers.a can skip step 2 by overriding DEADZONE_TOKENIZERS_LIB.
just bootstrap # Go 1.26.2 + just toolchain (mise install)
just fetch-tokenizers # libtokenizers.a — one-shot CGO setup
just build # CGO + ORT, all packages
just scrape # all libs — one artifact folder per lib
just scrape /hashicorp/terraform # one base lib, every version
just scrape /hashicorp/terraform/1.14 # one exact version
just consolidate # merge artifacts/*/artifact.db → deadzone.db
just serve # MCP server against deadzone.db
just with no args lists every recipe. Each scrape rewrites artifacts/<slug>/artifact.db + state.yaml in place; consolidate merges all artifact DBs atomically under deadzone.db. Per-lib folders are gitignored; the committed artifacts/manifest.yaml records release history only.
Full registry via CI. gh workflow run scrape-pack.yml -f tag=vX.Y.Z fans out the matrix, consolidates, and uploads deadzone.db to the tagged release. Omit -f tag=… to stop at a consolidated-db cache.
Release flow
Two-phase as of #101 — CI ships binaries, operator ships the DB.
# 1. Regenerate deadzone.db from the committed scraper config.
just scrape && just consolidate
# 2. Tag + push. CI's release.yml builds tarballs + AppImages, creates the release,
# and auto-bumps the Homebrew tap on release.published.
git tag v0.X.0 && git push --tags
# 3. Ship deadzone.db + deadzone.db.sha256 to the same release.
just dbrelease v0.X.0
# 4. Commit artifacts/manifest.yaml so the release-history trace lands in git.
git add artifacts/manifest.yaml && git commit -m "release v0.X.0" && git push
A stable-tag push fans out fully through CI: release.yml → chain-release.yml dispatches scrape-pack.yml → chain-image.yml dispatches docker-publish.yml (one workflow per concern, chained via workflow_run).
Manual Homebrew fallback. The tap auto-bump fires on release.published (#148). If RELEASE_PUBLISH_TOKEN expires and the chain breaks, run it by hand:
gh workflow run update-package-channels.yml -f tag=v0.X.0
Logs
Structured JSON on stderr via log/slog. Stdout is reserved for MCP JSON-RPC on deadzone server.
| Subcommand | Key events |
|---|---|
| `scrape` | `scraper.start`, `scraper.lib_start`, `scraper.fetch` (per URL), `scraper.indexed`, `scraper.lib_done`, `scraper.done`. Errors: `scraper.fetch_failed`, `scraper.insert_failed`. Agent path adds `scraper.agent_configured`, `scraper.agent_ping_ok`, `scraper.agent_verification_failed`, `agent.input_truncated`. |
| `consolidate` | `consolidate.start`, `consolidate.done` with `artifacts`, `docs_merged`, `libs_merged`, `duration_ms`. |
| `dbrelease` | `dbrelease.start`, `packs.dbrelease.uploaded` (per asset), `dbrelease.done` with `sha256`, `size`, `lib_count`, `doc_count`. |
| `server` | `server.start` (embedder + `doc_count`), one `search_docs` per call (`lib_id`, `tokens`, `results`, `latency_ms`). Boot may emit `server.db_upgraded`, `server.db_version_dev_fallback` WARN, `server.db_tag_sidecar_write_failed` WARN. |
--verbose on any subcommand adds debug-level detail. On server it logs the raw query (off by default — queries may carry user data). On scrape it adds per-doc scraper.doc_indexed.
MCP client log paths: Claude Code on macOS writes to ~/Library/Logs/Claude/mcp-server-deadzone.log; other clients vary.
Roadmap & contributing
Issues: laradji/deadzone/issues. Scope via milestones. Category via feature / research labels; priority via P1 / P2 / P3.
New library or refresh: use the New issue page and pick the matching form.
Why bother with vectors
Because "how to register a tool" should find the doc that says AddTool, and no FTS5 query will get you there without the human already knowing the answer. Embeddings-first retrieval is the point; everything else is plumbing.
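The retrieval math underneath is just cosine distance between the query embedding and each stored doc vector. A toy sketch with 3-dim vectors standing in for the real 768-dim embeddings:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosineDistance = 1 - cos(θ); lower means more semantically similar.
func cosineDistance(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

func main() {
	// Toy 3-dim "embeddings"; the real index stores F32_BLOB(768) vectors.
	query := []float32{1, 0, 0}
	docs := map[string][]float32{
		"AddTool registers a tool": {0.9, 0.1, 0},
		"unrelated doc":            {0, 1, 0},
	}
	type hit struct {
		name string
		dist float64
	}
	var hits []hit
	for name, v := range docs {
		hits = append(hits, hit{name, cosineDistance(query, v)})
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].dist < hits[j].dist })
	fmt.Println(hits[0].name) // the nearest doc wins, keyword overlap or not
}
```

In production the same comparison runs inside Turso via `vector_distance_cos`, so ranking happens next to the data instead of in Go.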
Long-form: docs/research/context7-analysis.md.
License
Apache License, Version 2.0. Third-party attributions in NOTICE.
One important asterisk. Apache 2.0 covers the Deadzone source code, and only that. It does not cover the third-party documentation the scraper indexes — those docs belong to their original authors under their own licenses. Running deadzone scrape is subject to each source's ToS. A pre-built pack is bound by the original content's license, not Apache 2.0. Personal local indexing: fine. Public redistribution: do the homework first.