verify-action-mcp

When your AI agent says "I deleted user 12345" but the row count didn't change — this catches it. A small third-party verification service for AI agent tool-call evidence: submit (claim, evidence), get back a verdict and an HMAC-attested receipt.


🇯🇵 For the Japanese version, see the second half of this page (below).

English

Why

AI agents commonly assert success when reality didn't match:

"I deleted user 12345", but the row count didn't actually change.
"I added a null check", but the diff also rewrote 5 unrelated functions.
"I sent the welcome email to <address A>", but the request body actually targeted <address B>.

These silent successes don't show up in benchmarks that only score whether the model said it succeeded. They surface when something downstream breaks, sometimes hours or days later.

verify-action-mcp is a small third-party service that catches that drift before downstream tools commit to it. It is a post-action evidence verifier: the receipt proves what was checked, not what is true. Existing pre-action policy admission control products from major vendors operate in a different lane; this one runs after the agent has done the work, using the artifacts it produced.

Quick start

MCP (Claude Code, Cursor, Cline, Codex, etc.)

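Add the following to claude_desktop_config.json or your harness's MCP config (JSON does not allow comments, and the exact server-entry shape for HTTP transports differs between clients):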
{
  "mcpServers": {
    "verify-action": {
      "transport": {"type": "http", "url": "https://verify.armadalab.dev/mcp"}
    }
  }
}

The agent now has a verify_action tool available. It can self-call before reporting completion, or you can invoke it from your harness logic.

REST

curl -X POST https://verify.armadalab.dev/verify \
  -H 'Content-Type: application/json' \
  -d '{
    "claim": "Deleted user 12345",
    "evidence": {
      "before_count": 100,
      "after_count": 99,
      "operation": "DELETE FROM users WHERE id=12345",
      "affected_rows": 1
    }
  }'

Response (receipt truncated; full shape below):

{
  "verdict": "ok",
  "aar_verdict": "verified",
  "reasoning": "Row count decreased by exactly 1; SQL operation matches DELETE semantics; user id matches claim.",
  "confidence": 0.92,
  "verifier_used": "db_op_v1",
  "kind_dispatched": "db_op",
  "receipt": {
    "schema": "verify_action_receipt.v0",
    "verdict": "verified",
    "claim_hash": "sha256:<64-hex>",
    "evidence_manifest_hash": "sha256:<64-hex>",
    "kid": "v0-default",
    "issued_by": "aar:reference-impl@v0",
    "signature": "hmac-sha256:<base64>",
    "_full": "(see Receipts section)"
  }
}
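For harness code that prefers Python over curl, a minimal stdlib client for the same call might look like the sketch below. It relies only on the documented REST body (`claim`, `evidence`, optional `kind`); the helper names `build_verify_request`, `verify` and `is_verified` are hypothetical, and a self-hosted instance is assumed to serve the same path on 127.0.0.1:8092.

```python
import json
import urllib.request

# Hosted endpoint from this README. A self-hosted instance (./start.sh) is
# assumed to serve the same path at http://127.0.0.1:8092/verify.
VERIFY_URL = "https://verify.armadalab.dev/verify"


def build_verify_request(claim, evidence, kind=None, url=VERIFY_URL):
    """Build (but do not send) a POST /verify request. `kind` is omitted when None."""
    body = {"claim": claim, "evidence": evidence}
    if kind is not None:
        body["kind"] = kind
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def verify(claim, evidence, kind=None, url=VERIFY_URL, timeout=10.0):
    """Send the request and return the decoded JSON (verdict fields + receipt)."""
    req = build_verify_request(claim, evidence, kind, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_verified(result):
    """Treat only the canonical `verified` verdict as success."""
    return result.get("aar_verdict") == "verified"
```

Gating on `aar_verdict == "verified"` rather than on a `confidence` threshold keeps `insufficient_evidence` and `unsafe_to_verify` from ever being treated as success.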

Self-host

git clone https://github.com/Armada735/verify-action-mcp
cd verify-action-mcp
./start.sh   # binds 127.0.0.1:8092
./stop.sh

Pure Python stdlib. No pip install. Tested on Linux.

What it verifies

A dispatcher routes each request by `kind`, or infers the kind from the evidence shape when `kind` is omitted:

Kind	Evidence shape	Critical signal that forces `mismatch`
`code_diff`	`{diff: "<unified diff>"}`	All claimed paths absent from diff
`db_op`	`{before_count, after_count, operation, affected_rows}`	ID in the claim does not appear in the SQL `operation`
`file_op`	`{path, exists_before, exists_after, line_count?, size_bytes?}`	Claimed line count or size diverges from the evidence by > 50% or > 50 absolute
`api_call`	`{request, response_status, response_body}`	Email target mismatch (claim ↔ request body)
`generic`	any object	(conservative; usually returns `insufficient_evidence`)
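The README does not spell out the inference rules, so the following is only a hypothetical sketch of shape-based dispatch: an explicit `kind` wins, otherwise the evidence keys from the table above select a verifier, and anything unrecognized falls back to `generic`. The function names and the "all required keys present" matching rule are assumptions.

```python
# Hypothetical: required evidence keys per kind, taken from the table above.
REQUIRED_KEYS = [
    ("code_diff", {"diff"}),
    ("db_op", {"before_count", "after_count", "operation", "affected_rows"}),
    ("file_op", {"path", "exists_before", "exists_after"}),
    ("api_call", {"request", "response_status", "response_body"}),
]


def infer_kind(evidence):
    """Pick the first kind whose required keys are all present; else `generic`."""
    if not isinstance(evidence, dict):
        return "generic"
    keys = set(evidence)
    for kind, required in REQUIRED_KEYS:
        if required <= keys:
            return kind
    return "generic"


def dispatch_kind(evidence, kind=None):
    """An explicit `kind` from the request wins over inference."""
    return kind if kind is not None else infer_kind(evidence)
```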

Each verifier looks at:

Verb in claim ↔ direction of state change (delete = -1, insert = +1, update = 0)
Specific identifiers / paths / emails / URLs
Counts / line counts / sizes
HTTP status semantics
"Critical signals" that force mismatch regardless of how the other positive and negative signals balance out
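As an illustration of the verb-direction and identifier checks in that list, here is a hypothetical sketch of count-direction matching plus the `db_op` "claim ID in SQL" critical signal. The verb vocabulary, the regexes and the function names are all assumptions, not the shipped rules.

```python
import re

# Hypothetical vocabulary: verb -> expected sign of (after_count - before_count).
VERB_DELTAS = {
    "delete": -1, "deleted": -1, "remove": -1, "removed": -1,
    "insert": 1, "inserted": 1, "add": 1, "added": 1, "create": 1, "created": 1,
    "update": 0, "updated": 0,
}


def expected_delta(claim):
    """Sign implied by the first recognised verb in the claim, or None."""
    for word in re.findall(r"[a-z]+", claim.lower()):
        if word in VERB_DELTAS:
            return VERB_DELTAS[word]
    return None


def direction_matches(claim, before_count, after_count):
    """True/False if the count change agrees with the verb; None if no verb found."""
    expected = expected_delta(claim)
    if expected is None:
        return None
    observed = after_count - before_count
    if expected == 0:
        return observed == 0  # an update should leave the row count unchanged
    return observed * expected > 0  # same sign, and non-zero


def claim_ids_in_sql(claim, operation):
    """db_op critical signal: every number in the claim must appear in the SQL."""
    claim_ids = set(re.findall(r"\b\d+\b", claim))
    sql_ids = set(re.findall(r"\b\d+\b", operation))
    return bool(claim_ids) and claim_ids <= sql_ids
```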

Verdicts (dual format)

Field	Values	Notes
`aar_verdict`	`verified` / `contradicted` / `insufficient_evidence` / `unsafe_to_verify`	4-value canonical (`verify_action_receipt.v0`)
`verdict`	`ok` / `mismatch` / `uncertain`	3-value legacy alias for backwards compatibility

unsafe_to_verify is returned when the verifier itself raised an exception (cannot examine evidence) — distinct from insufficient_evidence (evidence examined, ambiguous).
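Only the `verified` to `ok` and `contradicted` to `mismatch` pairs are visible in the examples on this page. The sketch below additionally assumes that both "could not decide" outcomes map to `uncertain`; the mapping table and the `should_block` harness policy are hypothetical.

```python
AAR_TO_LEGACY = {
    "verified": "ok",
    "contradicted": "mismatch",
    "insufficient_evidence": "uncertain",
    "unsafe_to_verify": "uncertain",  # assumption: both "could not decide" cases collapse here
}


def legacy_verdict(aar_verdict):
    """Map a canonical 4-value verdict to the 3-value legacy alias."""
    try:
        return AAR_TO_LEGACY[aar_verdict]
    except KeyError:
        raise ValueError(f"unknown aar_verdict: {aar_verdict!r}") from None


def should_block(result):
    """Fail-closed harness policy: anything other than `verified` stops the step."""
    return result.get("aar_verdict") != "verified"
```

Failing closed on every non-`verified` outcome is the conservative choice; a harness that can tolerate `insufficient_evidence` can loosen `should_block` accordingly.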

Receipts (`verify_action_receipt.v0`)

Every /verify call also issues an HMAC-SHA256-attested receipt as a nested receipt field. Full shape:

Field	Type	Description
`schema`	string	`"verify_action_receipt.v0"`
`kid`	string	Key id; `v0` ships with `"v0-default"`. Operators rotate keys with fresh kids.
`issued_by`	string	Issuer identifier (this reference impl: `"aar:reference-impl@v0"`)
`issued_at`	string	RFC 3339 UTC timestamp
`verifier_id`	string	`"verify-action-mcp@<version>"`
`verifier_method`	string	`"rule_based.<kind>"` (e.g. `rule_based.db_op`)
`claim_hash`	string	`"sha256:<64-hex>"` — content-addressed; raw claim is not stored
`evidence_manifest_hash`	string	`"sha256:<64-hex>"` — same
`verdict`	string	One of the 4 `aar_verdict` values
`confidence`	number	0–1
`reason_codes`	array of strings	Free-form diagnostic codes (v0 unrestricted)
`policy_or_oracle_refs`	array of strings	Optional refs to policy / oracle inputs (usually `[]`)
`caller_context`	object	Optional `caller_context` echoed back (max 8 keys, 64-char strings)
`signature`	string	`"hmac-sha256:<base64-no-padding>"`

What the receipt asserts: that this specific service issued this specific verdict for this content-addressed (claim, evidence) pair at this time, signed under a known key id (kid).

What the receipt does NOT assert: factual truth of the claim, legal admissibility in any forum, or warranty of any kind.

Trust model in v0: HMAC is symmetric, so checking a receipt requires the same secret key that signed it, and a valid signature shows only that a holder of that key (this service) produced it. It is not a third-party attestation in the cryptographic sense. Treat v0 receipts as content-addressed log entries from this service. The schema upgrade path for v1 (asymmetric ed25519, multi-issuer) is documented in aar/SCHEMA_UPGRADES.md.
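Because the key is symmetric, only a key holder (in practice, a self-hosting operator) can check a signature. The sketch below shows what such a check could look like. The canonicalization (sorted keys, compact separators, UTF-8 JSON), signing every field except `signature`, and the `{kid: key}` map are assumptions for illustration; this README does not specify the reference implementation's exact signed bytes.

```python
import base64
import hashlib
import hmac
import json


def canonical_json(obj):
    # Assumption: sorted keys, no whitespace, UTF-8. The reference impl may differ.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def sha256_ref(data):
    """Content address in the receipt's `sha256:<64-hex>` format."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def sign_receipt(receipt, key):
    """Return a copy of `receipt` with an HMAC-SHA256 signature over every other field."""
    unsigned = {k: v for k, v in receipt.items() if k != "signature"}
    mac = hmac.new(key, canonical_json(unsigned), hashlib.sha256).digest()
    sig = base64.b64encode(mac).decode("ascii").rstrip("=")  # base64 without padding
    return dict(unsigned, signature="hmac-sha256:" + sig)


def verify_receipt(receipt, keys):
    """Check a receipt against a {kid: secret_key} map; unknown kids fail closed."""
    key = keys.get(receipt.get("kid"))
    signature = receipt.get("signature")
    if key is None or not isinstance(signature, str) or not signature.startswith("hmac-sha256:"):
        return False
    expected = sign_receipt(receipt, key)["signature"]
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))
```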

API

Method	Path	Purpose
`GET`	`/` `/about`	Project description (HTML)
`GET`	`/healthcheck`	Liveness probe
`GET`	`/spec`	Tool schema + verifier kinds (JSON)
`GET`	`/stats`	Aggregate counters since process start
`GET`	`/privacy`	Privacy notice (HTML)
`GET`	`/tos`	Terms of service (HTML)
`POST`	`/verify`	REST: `{claim, evidence, kind?, context?, caller_context?}` → verdict + receipt
`POST`	`/mcp`	MCP JSON-RPC 2.0 endpoint
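The `/verify` body accepts an optional `caller_context`, which the receipt echoes back within the documented limits (max 8 keys, 64-char strings). A hypothetical client-side pre-check is sketched below; whether the server rejects or trims oversized input is not stated here, and applying the 64-char limit to keys as well as string values is an assumption.

```python
MAX_KEYS = 8      # documented limit
MAX_STR_LEN = 64  # documented limit for strings


def check_caller_context(ctx):
    """Return a list of problems; an empty list means within the documented limits."""
    if not isinstance(ctx, dict):
        return ["caller_context must be a JSON object"]
    problems = []
    if len(ctx) > MAX_KEYS:
        problems.append(f"{len(ctx)} keys (max {MAX_KEYS})")
    for i, (key, value) in enumerate(ctx.items()):
        # Assumption: the 64-char limit applies to keys as well as string values.
        for label, s in (("key", key), ("value", value)):
            if isinstance(s, str) and len(s) > MAX_STR_LEN:
                problems.append(f"{label} of entry #{i} is {len(s)} chars (max {MAX_STR_LEN})")
    return problems
```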

MCP methods

initialize → {protocolVersion: "2024-11-05", capabilities: {tools: {}}, serverInfo: {name, version}}
tools/list → {tools: [{name: "verify_action", description, inputSchema}]}
tools/call (name=verify_action) → {content: [...], isError, _structured_result: {verdict, aar_verdict, reasoning, confidence, receipt, ...}}
notifications/initialized, ping → empty result
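Over the `/mcp` endpoint the same check is a JSON-RPC 2.0 `tools/call`. The sketch below builds the messages with the stdlib. It assumes the `verify_action` arguments mirror the REST fields (`claim`, `evidence`, optional `kind`), so treat the `inputSchema` returned by `tools/list` as authoritative; the client name/version and all helper names are placeholders.

```python
import itertools
import json
import urllib.request

MCP_URL = "https://verify.armadalab.dev/mcp"
_next_id = itertools.count(1)


def rpc(method, params=None):
    """A JSON-RPC 2.0 request; each call gets a fresh integer id."""
    msg = {"jsonrpc": "2.0", "id": next(_next_id), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def notify(method):
    """A JSON-RPC notification carries no id (e.g. notifications/initialized)."""
    return {"jsonrpc": "2.0", "method": method}


def initialize():
    return rpc("initialize", {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-harness", "version": "0.0.1"},  # placeholder
    })


def tools_call_verify(claim, evidence, kind=None):
    # Assumption: argument names mirror the REST body.
    arguments = {"claim": claim, "evidence": evidence}
    if kind is not None:
        arguments["kind"] = kind
    return rpc("tools/call", {"name": "verify_action", "arguments": arguments})


def post_rpc(msg, url=MCP_URL, timeout=10.0):
    req = urllib.request.Request(
        url,
        data=json.dumps(msg).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if body else None  # a notification may get an empty body


def structured_verdict(response):
    """Pull aar_verdict out of the server's _structured_result field."""
    return response["result"]["_structured_result"]["aar_verdict"]
```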

Examples

`code_diff` — coherent

curl -X POST https://verify.armadalab.dev/verify -H 'Content-Type: application/json' -d '{
  "claim": "Added null check for user.email in src/user.py",
  "evidence": {
    "diff": "--- a/src/user.py\n+++ b/src/user.py\n@@ -10,3 +10,5 @@\n def get_email(user):\n+    if user.email is None:\n+        return None\n     return user.email"
  }
}'
# → aar_verdict: verified (legacy: ok), confidence ~0.9

`file_op` — line count mismatch

curl -X POST https://verify.armadalab.dev/verify -H 'Content-Type: application/json' -d '{
  "claim": "Created /tmp/output.txt with 200 lines",
  "evidence": {"path":"/tmp/output.txt","exists_before":false,"exists_after":true,"line_count":50}
}'
# → aar_verdict: contradicted (legacy: mismatch) — claim said 200 lines, evidence says 50
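The table under "What it verifies" gives the `file_op` critical threshold as a divergence above 50% or above 50 absolute. A hypothetical sketch of that rule, assuming the percentage is measured against the claimed number and that the claimed count is parsed from phrasing like "with 200 lines":

```python
import re


def claimed_line_count(claim):
    """Parse 'N lines' from the claim text; None if absent (assumed phrasing)."""
    m = re.search(r"(\d+)\s+lines?\b", claim)
    return int(m.group(1)) if m else None


def divergence_is_critical(claimed, observed, rel_limit=0.5, abs_limit=50):
    """True when |claimed - observed| exceeds 50 absolute or 50% of the claim."""
    diff = abs(claimed - observed)
    if diff > abs_limit:
        return True
    if claimed == 0:
        # Assumption: any change from a claimed zero is an unbounded relative divergence.
        return diff > 0
    return diff / abs(claimed) > rel_limit
```

For the example above, 200 claimed against 50 observed differs by 150, which already exceeds the absolute limit.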

`api_call` — target email mismatch (critical signal)

curl -X POST https://verify.armadalab.dev/verify -H 'Content-Type: application/json' -d '{
  "claim": "Sent welcome email to [email protected]",
  "evidence": {
    "request": {"to":"[email protected]","subject":"Welcome!"},
    "response_status": 200, "response_body": "{\"sent\":true}"
  }
}'
# → aar_verdict: contradicted — target email differs from claim

Privacy

IP addresses are SHA-256-hashed with a salt that is regenerated for each server install. Plaintext IPs are never persisted.
Submitted claims and evidence are written to private trace logs marked untrusted_payload. Aggregate findings may be published; individual traces stay private.
30-day log retention is enforced by the included purge_old_logs.sh script (operator installs as a daily cron — see monitor/CRON.md for the entry).
A PII guard rejects payloads containing Japanese My Number-shaped (12-digit) sequences, passport-shaped strings, or credit-card-shaped digit runs (with a Luhn check). Detection is structural: the guard does NOT confirm that any number is a real personal identifier.
traces/ is chmod 600.
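Two of those privacy mechanisms can be sketched with the standard library. The 16-character truncation of the IP hash comes from the privacy notes in the Japanese section of this page; the random per-install salt, the hex encoding and the card/My Number regexes are assumptions, and the passport check is omitted. `luhn_valid` is the standard Luhn algorithm.

```python
import hashlib
import re
import secrets

# Assumption: one random salt generated per server install and kept private.
INSTALL_SALT = secrets.token_bytes(16)


def hash_ip(ip, salt=INSTALL_SALT):
    """Salted SHA-256 of the IP, truncated to 16 hex characters."""
    return hashlib.sha256(salt + ip.encode("utf-8")).hexdigest()[:16]


def luhn_valid(digits):
    """Standard Luhn check over a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# 13-19 digits, optionally separated by single spaces or hyphens (assumed shape).
CARD_SHAPE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# Exactly 12 consecutive digits, not part of a longer run (My Number shape).
MY_NUMBER_SHAPE = re.compile(r"(?<!\d)\d{12}(?!\d)")


def looks_like_card(text):
    """Card-shaped digit run that also passes the Luhn check."""
    for m in CARD_SHAPE.finditer(text):
        if luhn_valid(re.sub(r"\D", "", m.group())):
            return True
    return False


def looks_like_my_number(text):
    return MY_NUMBER_SHAPE.search(text) is not None
```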

See /privacy and /tos for the user-facing notice.

Phase 1 limitations

Rule-based only: no LLM-as-judge. The 4 specialized verifiers cover only their own kinds; the generic verifier is conservative and often returns insufficient_evidence.
No sub-claim decomposition — 1 claim → 1 verifier.
No cross-trace correlation — each call is independent.
HMAC-attested receipts only — symmetric, single-issuer. Asymmetric / multi-issuer path documented in aar/SCHEMA_UPGRADES.md.
No SLA, no rate-limit guarantee, no uptime promise on the hosted endpoint. Self-host (above) for stability.

Who this is for / not for

For:

Agent harness developers wanting a quick post-action sanity check
Multi-agent pipeline operators wanting an integrity boundary between steps
Anyone evaluating "did this agent do what it said it did?" patterns

Not for:

Security-critical attestation (HMAC v0 is not third-party-strong; wait for v1 ed25519)
High-throughput production with strict SLA (run self-hosted, expect to maintain it)
Domain-specific reasoning the rule-based verifiers don't cover (extend by writing a custom verifier kind under verifiers/)

Roadmap

Schema v1: ed25519 + multi-issuer (aar/SCHEMA_UPGRADES.md)
LLM-augmented generic verifier (opt-in)
Sub-claim decomposition for multi-step actions
Cumulative observation API ("this harness mismatches on file_op X% of the time")
Custom verifier registration

This is a 90-day probe. If meaningful adoption appears, v1 schema work begins.

License

MIT — see LICENSE.

Contact

Maintained by Armada (@Ardev_lab). Issues / questions: GitHub Issues, or [email protected].

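Japanese (日本語)

What is this

An AI agent says "I deleted user 12345", yet the row count in the DB hasn't changed. This is a small third-party verification service that catches that kind of silent inconsistency.

It receives (claim, evidence) from the agent and returns a consistency judgment (verdict) plus an HMAC-signed receipt (verify_action_receipt.v0).

Failure patterns it targets (in general terms)

It says "I deleted user 12345", but the DB row count hasn't changed
It says "I added a null check", but the diff also contains rewrites of 5 unrelated functions
It says "I sent the welcome email to <address A>", but the actual request body was addressed to <address B>

Benchmarks look at whether the model said it succeeded; whether the actual state was updated consistently with the claim is a separate question. The latter is one of the important concerns when operating agents.

verify-action-mcp acts as the layer that catches that gap before downstream tools confirm it. It provides a separate layer, post-action evidence verification, independent of existing pre-action permission control (policy admission control / permission granted before a tool call).

It does not claim to be an industry standard; it is positioned as a reference implementation. The receipt schema (verify_action_receipt.v0) is designed to be small enough to fork.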

Usage

MCP (Claude Code / Cursor / Cline / Codex, etc.)

{
  "mcpServers": {
    "verify-action": {
      "transport": {"type": "http", "url": "https://verify.armadalab.dev/mcp"}
    }
  }
}

verify_action now appears in the agent's tool list. The intended pattern is for the agent to self-call it just before reporting completion.

REST

curl -X POST https://verify.armadalab.dev/verify -H 'Content-Type: application/json' -d '{
  "claim": "user 12345 を削除しました",
  "evidence": {
    "before_count": 100, "after_count": 99,
    "operation": "DELETE FROM users WHERE id=12345",
    "affected_rows": 1
  }
}'

Response (excerpt; see the Receipt section below for the full receipt shape):

{
  "verdict": "ok",
  "aar_verdict": "verified",
  "reasoning": "Row count decreased by exactly 1; SQL operation matches DELETE semantics; user id matches claim.",
  "confidence": 0.92,
  "receipt": { "schema": "verify_action_receipt.v0", "...": "..." }
}

Four-value verdict (`aar_verdict`)

Value	Meaning
`verified`	claim and evidence are consistent
`contradicted`	there is a decisive mismatch between claim and evidence
`insufficient_evidence`	the evidence was examined, but there is not enough to decide
`unsafe_to_verify`	the verifier raised an exception and could not examine the evidence

The legacy 3 values (ok / mismatch / uncertain) are still returned in the verdict field, so compatibility with existing clients is preserved.

Receipt (HMAC-signed receipt)

The /verify response includes a signed verify_action_receipt.v0 receipt nested under receipt. Main fields:

Field	Contents
`schema`	`"verify_action_receipt.v0"`
`kid`	Key id (the v0 default is `"v0-default"`; the operator issues a new kid when rotating keys)
`issued_by`	Issuer identifier (the reference impl uses `"aar:reference-impl@v0"`)
`issued_at`	RFC 3339 UTC timestamp
`verifier_id`	`"verify-action-mcp@<version>"`
`verifier_method`	`"rule_based.<kind>"` (e.g. `rule_based.db_op`)
`claim_hash`	`"sha256:<64-hex>"`; the claim text itself is not stored
`evidence_manifest_hash`	`"sha256:<64-hex>"`; the evidence body is not stored
`verdict`	One of the 4 values
`confidence`	0..1
`reason_codes`	Array of free-form diagnostic codes
`signature`	`"hmac-sha256:<base64>"`

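What the receipt means: only that "this instance issued this verdict, at this time, for this (claim, evidence) pair (referenced by hash)". It does not assert the truth of the claim itself, admissibility as evidence in any legal proceeding, or any quality guarantee.

Trust model in v0: because HMAC uses a symmetric key, a receipt proves only that it was signed by a holder of this service's secret key. Strength as a third-party attestation is planned from v1 (ed25519 + multi-issuer) onward. See aar/SCHEMA_UPGRADES.md for the schema extension path.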

Privacy

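IPs are hashed with SHA-256 plus a salt and truncated to 16 characters (raw IPs are not stored). Hashed IPs are not used to identify any individual.
claim / evidence are recorded in private trace logs as untrusted_payload; only aggregate metrics are published
Logs are deleted automatically after 30 days (the operator runs purge_old_logs.sh as a daily cron job)
Receipt issuance is stopped for payloads containing digit sequences shaped like My Number or similar specific personal information, passport-shaped strings, or credit-card-shaped digits (with Luhn check). Detection is by format only and does not confirm that any number is real.
traces/ is chmod 600

See /privacy and /tos for details.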

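Current limitations

stdlib only / rule-based: no LLM-as-judge; the generic axis is intentionally conservative
No sub-claim decomposition: 1 claim → 1 verifier
No cross-trace correlation: each call is judged independently
HMAC (symmetric key) only: multi-issuer / asymmetric support comes in v1 (aar/SCHEMA_UPGRADES.md)
No SLA / uptime / rate-limit guarantee on the hosted endpoint: self-host if you need stability

Intended audience

Agent harness developers who want a sanity check before completion is reported
Multi-agent pipeline operators who want an integrity boundary between steps
Anyone who wants to keep observing whether an agent really did what it said it did

Roadmap

Run as a 90-day probe; whether to continue, scale down, or withdraw is decided against kill criteria committed in advance. If adoption appears, work starts with schema v1 (ed25519 + multi-issuer).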

License and contact

License: MIT (see LICENSE)
Maintainer: Armada (@Ardev_lab)
Issues / questions: GitHub Issues or [email protected]
Note: currently provided free of charge (any future move to paid plans will be announced)