haft

True harness engineering for AI-assisted software delivery.

Your agents write code fast. Most repositories are not ready for seriousharness engineering: the target system is underspecified, the enabling systemis implicit, term maps are missing, and runtime evidence is detached from thespec. Haft makes the project harnessable before it scales execution.

What is Haft?

Haft is the engineering governor that sits between your intentions and your agents' execution. It enforces the discipline that separates "we shipped fast" from "we shipped right": frame the problem before solving it, compare options under parity, record decisions as falsifiable contracts, and know the moment assumptions go stale.

Specify → Think → Run → Govern.

Not a coding agent. Not a generic documentation tool. The handle between thetool and the hand — the part that turns raw capability into formalspecification, governed decisions, bounded commissions, and evidence-backedengineering work.

Two production surfaces, one core

MCP plugin — embedded agent surface for Claude Code and Codex to reason, draft, query, and create commissions
CLI Harness — operator/runtime surface for prepare, run, status, result, apply, requeue, and cancel

Both surfaces compile into the same Haft Core artifact graph. MCP does not own long-running runtime lifecycle, and CLI does not become a second source of meaning.

Note: The TUI (haft agent) and Desktop app exist as alpha tracks under active development. They are not part of the v7 production envelope and are not recommended for production use. The MCP plugin mode (haft serve) and haft harness CLI are the stable, proven interfaces.

Install

curl -fsSL https://raw.githubusercontent.com/m0n0x41d/haft/main/install.sh | bash

The install URL still points at the historical quint-code repository path. The installed binary is haft.

Then in your project, run init with your supported host-agent flag:

# Claude Code (default if no flag)
haft init

# Claude Code with repo-local commands
haft init --local

# Codex CLI / Codex App
haft init --codex

# Claude Code + Codex
haft init --all

v7 supported embedded host-agent surfaces:

Claude Code
Codex CLI / Codex App

Cursor, Gemini CLI, JetBrains Air, and generic MCP clients remainexperimental/legacy integration targets. Their flags may exist while theruntime and docs converge, but v7 product support is intentionally narrowed toClaude Code and Codex.

# Experimental Cursor config
haft init --cursor

# Experimental Gemini CLI config
haft init --gemini

# Experimental OpenCode config
haft init --opencode

What init does per tool

The binary is the same — only the MCP config and command/prompt installationlocations differ. Supported v7 hosts:

Tool	MCP Config	Commands / Prompts	Skill
Claude Code	`.mcp.json` (project root)	`~/.claude/commands/` or `.claude/commands/` with `--local`	`~/.claude/skills/h-reason/` or local install with `--local`
Codex CLI / Codex App	`.codex/config.toml`	`~/.codex/prompts/` or `.codex/prompts/` with `--local`	`~/.agents/skills/h-reason/`

Experimental/legacy hosts:

Tool	MCP Config	Commands / Prompts	Skill
Cursor	`.cursor/mcp.json`	`~/.cursor/commands/` or `.cursor/commands/` with `--local`	`~/.cursor/skills/h-reason/` or local install with `--local`
Gemini CLI	`~/.gemini/settings.json`	`~/.gemini/commands/` or local install with `--local`	—
OpenCode	`opencode.json` (project root)	`~/.config/opencode/commands/` or `.opencode/commands/` with `--local`	`~/.config/opencode/skills/h-reason/` or `.opencode/skills/h-reason/` with `--local`
Air	`.codex/config.toml`	project `skills/`	project `skills/h-reason/`

Important for Cursor: After init, open Cursor Settings → MCP → find haft → enable the toggle. Cursor adds MCP servers as disabled by default.

Project-scoped MCP configs are safe to commit for shared repositories: ClaudeCode .mcp.json and Codex .codex/config.toml use portable project-rootvalues instead of your machine's absolute checkout path.

Existing project? Run /h-onboard after init. The target direction is deeperthan codebase summarization: onboarding should build a parseable target-systemspec, enabling-system spec, term map, and spec coverage graph before broadharness execution.

Check the spec carriers locally with:

haft spec check
haft spec check --json

haft spec check is intentionally deterministic L0/L1/L1.5 only: it parsesfenced yaml spec-section blocks, checks required structural fields, validatesknown carrier shapes, and verifies that the term-map carrier contains parseableterm entries. It does not make L2 semantic judgments, perform LLM review, proveproduct correctness, or make L3 runtime/evidence claims.

How It Works

Seven MCP tools

Tool	What it does
`haft_note`	Micro-decisions with validation + auto-expiry
`haft_problem`	Frame problems, define comparison dimensions with roles
`haft_solution`	Explore variants with diversity check, compare with parity
`haft_decision`	Decision contract with invariants, claims, evidence, baseline lifecycle
`haft_commission`	WorkCommission create/list/claim lifecycle for execution harnesses
`haft_refresh`	Lifecycle management for all artifacts
`haft_query`	Search, status dashboard, file-to-decision lookup, FPF spec search

One command: `/h-reason`

Describe your problem. The agent frames it, generates alternatives, compares them fairly, and records the decision — all in one command. It auto-selects the right depth.

Or drive each step manually

/h-frame  → /h-char  → /h-explore → /h-compare → /h-decide
  what's      what       genuinely     fair         engineering
  broken?     matters?   different     comparison   contract
                         options

From decision to code: `haft run`

Once you have a decision, implement it:

haft run dec-20260414-001

Haft reads the decision's invariants, claims, affected files, and governing invariants from the knowledge graph — then spawns an agent (Codex or Claude) with full reasoning context. After execution, takes a baseline snapshot automatically.

/h-reason "redesign the caching layer"
  ↓ frame → explore → compare → decide
  ↓
haft run dec-20260414-001 --agent codex
  ↓ reads decision → builds prompt → spawns agent
  ↓ agent implements with invariants as guardrails
  ↓ baseline snapshot on completion
  ↓
haft check
  ↓ verify governance health

WorkCommission harness

Open-Sleigh uses WorkCommission as the bounded execution authority between aDecisionRecord and a runtime run. In plugin mode, use /h-commission to createthe missing WorkCommissions without starting execution. In Codex this installsas an explicit-only $h-commission skill.

The packaged CLI path is:

haft harness run --prepare-only  # create/reuse commissions, do not start runtime
haft harness run                 # create/reuse commissions and start Open-Sleigh
haft harness status              # inspect active/recent runs
haft harness result wc-...       # inspect one completed run and workspace diff
haft harness apply wc-...        # apply a completed workspace patch to this checkout

Broad harness execution is blocked for needs_onboard projects by default. Ifyou intentionally need tactical out-of-spec work, pass--force-skip-specs "..."; Haft records that reason on the selectedWorkCommissions.

Starting Open-Sleigh is an operator/runtime action. Use the CLI Harness forthat boundary rather than a plugin slash command. MCP may create or inspectWorkCommissions, but it must not own the long-running runtime lifecycle.

Commissions carry a delivery_policy. The default isworkspace_patch_manual: Open-Sleigh changes stay in the isolated workspaceuntil an operator runs haft harness apply <commission-id>. Use--delivery-policy workspace_patch_manual explicitly when preparing or runningplans that must not mutate the main checkout automatically.

Release installs include the Open-Sleigh runtime under:

~/.haft/runtimes/open-sleigh/current

haft harness run uses a repo-local open-sleigh/ checkout when present andfalls back to that installed runtime. Release archives bundle the BEAM runtime,so users do not need to install Elixir/Mix for normal harness use. Sourcefallback installs may install Elixir when they need to build the runtimelocally.

The lower-level MCP tool is haft_commission; the local CLI helper is:

haft commission create-from-decision dec-...
haft commission create-batch dec-a dec-b
haft commission create-from-plan .haft/plans/implementation.yaml
haft commission create --json commission.json
haft commission list --selector stale
haft commission show wc-...
haft commission requeue wc-... --reason stale_operator_recovery
haft commission cancel wc-... --reason no_longer_relevant
haft commission complete-external wc-... --runner external-runner --reason external_runtime_succeeded --payload-file runtime-evidence.json
haft commission list-runnable
haft commission claim wc-...

For the repeatable local E2E smoke:

task open-sleigh:smoke-real-haft

The same loop is what alpha Desktop workflow buttons compile to. A button mustbecome a typed artifact transition, not a free prompt:

SpecSection(s) -> DecisionRecord -> WorkCommission -> RuntimeRun -> Evidence -> SpecCoverage

Evidence workflow

Attach evidence to decisions with haft_decision(action="evidence", ...). Evidence has formality levels (F0-F3), congruence levels (CL0-CL3), and expiry dates. Trust scores (R_eff) degrade as evidence ages. Stale evidence triggers refresh.

Use haft_decision(action="measure", ...) for post-implementation verification.

What Makes It Different

Decisions are live — computed trust scores (R_eff) degrade as evidence ages
Comparison is honest — parity enforced, constraint-aware Pareto elimination, anti-Goodhart observation indicators
Invariants linked to code — knowledge graph maps decisions to modules via dependency graph
Memory across sessions — related past decisions surface during framing, similar variants during exploration
The loop closes — failed measurements reopen decisions, evidence decay triggers review, drift detection flags violations
Decisions are contracts — invariants, claims with thresholds, rollback plan, valid-until date

Batch Harness (Beta — v7.x)

Status: Single-commission haft harness run is the trustworthy operator path. Drain mode (--drain --concurrency N) and workspace_patch_auto_on_pass auto-apply landed in v7.x and have been validated end-to-end on docs-class commissions; treat them as Beta on production-code commissions.

The harness lets you commission work from DecisionRecord artifacts and run it under a real codex agent in an isolated workspace, with scope guards (allowed_paths/forbidden_paths), per-commission lockset enforcement, and discrete revertable apply commits per commission.

# 1. Decision exists (via /h-decide or haft_decision MCP). Create commissions:
haft harness run --prepare-only         # auto-selects active decisions
haft commission create-from-decision dec-... --delivery-policy workspace_patch_auto_on_pass

# 2. Drain the queue overnight (manual apply by default; auto on pass when policy allows):
haft harness run --drain --concurrency 4

# 3. Outcome:
#    workspace_patch_manual    -> diff held in workspace clone, run `haft harness apply <id>`
#    workspace_patch_auto_on_pass + verdict=pass -> auto-applied as a discrete commit
#    blocked_policy / failed   -> stays for operator decision (requeue / cancel)

AutonomyEnvelope continues to gate creation, preflight, and execute; the apply step is purely policy + verdict per the V3 invariant. Stale claims older than the configurable cap (default 24h) are skipped at intake with a typed lease_too_old reason.

Detailed guide, real-world flows, and known rough edges: see docs/7.x/harness-batch.

Desktop App (pre-alpha)

Warning: The desktop app is in pre-alpha. Use at your own risk.

Built with Tauri v2 (Rust shell + React frontend). Launch with:

haft desktop        # finds Haft.app or falls back to dev build

Build from source (requires Rust toolchain + bun/npm for the frontend):

./scripts/build.sh --install   # builds Go binary + TUI bundle, installs locally
cd desktop-tauri && cargo tauri build   # builds the desktop app bundle

Features: dashboard with governance findings, problem board, decision detail with evidence decomposition, portfolio comparison with Pareto front, task spawning, agent chat view, terminal panel, multi-project management, search (Cmd+K).

Built on First Principles Framework

FPF by Anatoly Levenchuk — a rigorous, transdisciplinary architecture for thinking.

/h-reason gives your AI agent an FPF-native operating system for engineering decisions: problem framing before solutions, characterization before comparison, parity enforcement, evidence with congruence penalties, weakest-link assurance, and the lemniscate cycle that closes itself when evidence ages or measurements fail.

haft fpf search gives access to the indexed FPF specification with tiered retrieval: exact pattern id → route-aware concept matching → keyword fallback.

Roadmap

v6.1 — Harden the Contract (shipped)

Decision quality enforcement before automating execution:

haft check for local governance verification (exit 0 = clean, exit 1 = findings)
/h-verify surfaces full governance state (problems, invariants, drift)
.haft/workflow.md — repo-level agent policy, injected into every prompt
Problem typing (optimization / diagnosis / search / synthesis)
G1 enforced (one decision per problem), G2/G4 warnings (parity plan, subjective dimensions)
Claim-scoped R_eff, evidence supersession, CL0 rejection
Deep /h-onboard with module-by-module analysis for legacy projects

v6.2 — Dashboard + Execution + Design System (shipped 2026-04-20)

The desktop became a real operator surface, the reasoning vocabulary grew semiotic teeth, and the two transport layers stopped drifting from each other:

Unified Dashboard — decisions, governance findings, recent activity in one view
Implement — click a decision, agent spawns in worktree with full reasoning context, baseline taken on success, PR body generated from decision rationale
Adopt — governance finding → agent thread for interactive resolution; agent never auto-resolves
haft run — same Implement pipeline from CLI, with planning + per-task verification + final invariant review
Tauri v2 desktop migration (from Wails v2)
Haft Design System — typed React primitives (Eyebrow, Button, Badge, Card, Input, StatCard, MonoId, Pill) + ComparisonTable with border-first Pareto grid + DecayWindow progress bar on decision detail
Seven new FPF semiotic patterns (FRAME-08 / FRAME-09 / CHR-10 / CHR-11 / CHR-12 / X-STATEMENT-TYPE / X-FANOUT-AUDIT) sourced from Levenchuk's seminar, auto-injected into reasoning tool responses
governance_mode on DecisionRecord — file-level vs module-level governance, opt-in, honors FPF X-SCOPE
Random-hex artifact IDs (dec-20260420-a3f7c1d2) with optional DecisionRecord task slugs (dec-20260420-task-4-a3f7c1d2) to prevent merge conflicts while keeping filenames navigable (#63, #66)
MCP parity_plan exposure for deep-mode comparison (#62)
Transport-parity drift detection + layered architecture boundary tests
internal/embedding extraction; internal/fpf is now pure Core
Valid-until self-application on FPF pattern files with a failing test when content ages past six months

v7 — Project Harnessability MVP

One proved cycle:

Add project → Init → Target spec → Enabling spec → Spec check → Decision → WorkCommission → RuntimeRun → Evidence → SpecCoverage.

The goal is not a better task runner. The goal is to turn an arbitrary repointo a harnessable engineering system.

v8 — Governor Signals

Background detection loops (stale, drift, dependencies) with dashboard alerts. Autonomous actuation after trust is earned.

Cross-repo harness — research track (post-v7.0)

Multi-project workflows where each registered haft project becomes anaddressable endpoint that other projects' agents can ask typed questions of("what does your enabling spec say about commission policy?", "what's thestatus of decision X?") and request typed changes from(MessageEnvelope → ProblemCard → DecisionRecord → WorkCommission in thetarget project, with the target's normal decide-then-commission disciplineintact). Built on what v7 already ships: spec carriers as the addressable"what this repo is about" surface, AutonomyEnvelope's allowed_reposwhitelist as the structural cross-repo gate, Open-Sleigh worktree clones atbase_sha as the read-only-by-default response runner, and the existingEvidence model as the "no claim without carrier" response contract. Thisis post-v7.0 work: target audience is multi-repo developers who currentlycopy context between repos by hand; sequencing starts with cross-projectREAD through haft_query(project_ref=..., ...) (small, additive, no newartifact kinds) and only moves to typed change requests after the readsurface earns evidence on real multi-repo work. Detailed mapping of howthis composes with v7 primitives lives in.context/cross-repo-mail-haft-mapping.md. Not committed to a releaseversion yet.

Desktop App — alpha track (no committed version)

The Tauri v2 desktop app exists as an alpha exploration of a human surfaceover the same Haft Core. It is not part of the v7 production envelope: notextensively tested, not the canonical operator surface, and not load-bearingfor any v7 acceptance criteria. The canonical surfaces for v7 are MCP plugin(haft serve) and CLI Harness (haft harness ...); the desktop app willgraduate out of alpha when (a) its onboarding, harness, and verify panels eachhave end-to-end dogfood evidence on the level of CLI/MCP, and (b)desktop-specific contracts (Tauri IPC argument shape, transcript filtering,readiness projection) carry their own regression suites at the same depth asthe Go/MCP surfaces. Until then it ships as a "preview from source" buildbehind the alpha warning. No release version is committed to it on the v7timeline.

Requirements

Go 1.25+ (for building from source)
Claude Code or Codex for supported v7 plugin mode
Rust toolchain + Tauri v2 (only when building the desktop app from source)

License

MIT

What is Haft?

Two production surfaces, one core

Install

What init does per tool

How It Works

Seven MCP tools

One command: `/h-reason`

Or drive each step manually

From decision to code: `haft run`

WorkCommission harness

Evidence workflow

What Makes It Different

Batch Harness (Beta — v7.x)

Desktop App (pre-alpha)

Built on First Principles Framework

Roadmap

v6.1 — Harden the Contract (shipped)

v6.2 — Dashboard + Execution + Design System (shipped 2026-04-20)

v7 — Project Harnessability MVP

v8 — Governor Signals

Cross-repo harness — research track (post-v7.0)

Desktop App — alpha track (no committed version)

Requirements

License

MCP Server · Populars

🦞 OpenClaw — Personal AI Assistant

MarkItDown-MCP

MarkItDown

Awesome MCP Servers

mcp-server-sentry: A Sentry MCP server

MCP Server · New

Tauri Plugin: Model Context Protocol (MCP)

haft

Office Document Processing MCP Server

PyGhidra-MCP - Ghidra Model Context Protocol Server

DVMCP — Damn Vulnerable MCP

What is Haft?

Two production surfaces, one core

Install

What init does per tool

How It Works

Seven MCP tools

One command: /h-reason

Or drive each step manually

From decision to code: haft run

WorkCommission harness

Evidence workflow

What Makes It Different

Batch Harness (Beta — v7.x)

Desktop App (pre-alpha)

Built on First Principles Framework

Roadmap

v6.1 — Harden the Contract (shipped)

v6.2 — Dashboard + Execution + Design System (shipped 2026-04-20)

v7 — Project Harnessability MVP

v8 — Governor Signals

Cross-repo harness — research track (post-v7.0)

Desktop App — alpha track (no committed version)

Requirements

License

MCP Server · Populars

MCP Server · New

One command: `/h-reason`

From decision to code: `haft run`