Armada735

verify-action-mcp

MCP server issuing HMAC-attested receipts for AI agent tool-call evidence (verify_action_receipt.v0 reference impl)

When your AI agent says "I deleted user 12345" but the row count didn't change — this catches it. A small third-party verification service for AI agent tool-call evidence: submit (claim, evidence), get back a verdict and an HMAC-attested receipt.


🇯🇵 The Japanese edition of this README is in the second half of this page.

English

Why

AI agents commonly assert success when reality doesn't match:

  • "I deleted user 12345" — but the row count didn't actually change.
  • "I added a null check" — but the diff also rewrote 5 unrelated functions.
  • "I sent the welcome email to [email protected]" — but the request body actually targeted [email protected].

These silent successes don't show up in benchmarks (which score "did the model say it succeeded?"). They surface when something downstream breaks — sometimes hours or days later.

verify-action-mcp is a small third-party service that catches that drift before downstream tools commit to it. It is a post-action evidence verifier: the receipt proves what was checked, not what is true. Pre-action policy admission control products from major vendors work in a different lane; this one runs after the agent has done the work, using the artifacts it produced.

Quick start

MCP (Claude Code, Cursor, Cline, Codex, etc.)
Add the following to claude_desktop_config.json or your harness's MCP config:

{
  "mcpServers": {
    "verify-action": {
      "transport": {"type": "http", "url": "https://verify.armadalab.dev/mcp"}
    }
  }
}

The agent now has a verify_action tool available. It can self-call before reporting completion, or you can invoke it from your harness logic.

REST
curl -X POST https://verify.armadalab.dev/verify \
  -H 'Content-Type: application/json' \
  -d '{
    "claim": "Deleted user 12345",
    "evidence": {
      "before_count": 100,
      "after_count": 99,
      "operation": "DELETE FROM users WHERE id=12345",
      "affected_rows": 1
    }
  }'

Response (receipt truncated; full shape below):

{
  "verdict": "ok",
  "aar_verdict": "verified",
  "reasoning": "Row count decreased by exactly 1; SQL operation matches DELETE semantics; user id matches claim.",
  "confidence": 0.92,
  "verifier_used": "db_op_v1",
  "kind_dispatched": "db_op",
  "receipt": {
    "schema": "verify_action_receipt.v0",
    "verdict": "verified",
    "claim_hash": "sha256:<64-hex>",
    "evidence_manifest_hash": "sha256:<64-hex>",
    "kid": "v0-default",
    "issued_by": "aar:reference-impl@v0",
    "signature": "hmac-sha256:<base64>",
    "_full": "(see Receipts section)"
  }
}
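
For harnesses written in Python, the curl call above could be reproduced with the standard library alone. This is a minimal sketch, not part of the project: the function names build_verify_request and verify are hypothetical, and the request body only uses the claim and evidence keys shown above. Note that verify() performs a real network call against the hosted endpoint.

```python
import json
import urllib.request

# Hosted endpoint as given in this README; swap in your self-hosted URL if needed.
VERIFY_URL = "https://verify.armadalab.dev/verify"


def build_verify_request(claim, evidence, url=VERIFY_URL):
    """Build (but do not send) a POST /verify request. Hypothetical helper."""
    body = json.dumps({"claim": claim, "evidence": evidence}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def verify(claim, evidence, url=VERIFY_URL, timeout=10):
    """Send the request and return the decoded JSON response as a dict."""
    req = build_verify_request(claim, evidence, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before anything leaves the machine.
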
Self-host
git clone https://github.com/Armada735/verify-action-mcp
cd verify-action-mcp
./start.sh   # binds 127.0.0.1:8092
./stop.sh

Pure Python stdlib. No pip install. Tested on Linux.

What it verifies

A dispatcher routes by kind (or auto-infers from evidence shape):

| Kind | Evidence shape | Critical signal that forces mismatch |
|---|---|---|
| code_diff | {diff: "<unified diff>"} | All claimed paths absent from diff |
| db_op | {before_count, after_count, operation, affected_rows} | Claim ID not in SQL ID |
| file_op | {path, exists_before, exists_after, line_count?, size_bytes?} | Numeric divergence > 50% or > 50 absolute |
| api_call | {request, response_status, response_body} | Email target mismatch (claim ↔ request body) |
| generic | any object | (conservative; usually returns insufficient_evidence) |

Each verifier looks at:

  • Verb in claim ↔ direction of state change (delete = -1, insert = +1, update = 0)
  • Specific identifiers / paths / emails / URLs
  • Counts / line counts / sizes
  • HTTP status semantics
  • "Critical signals" that force mismatch regardless of pos/neg balance
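
The first signal in the list above (claim verb versus direction of state change), combined with the db_op critical signal, can be sketched as follows. This is an illustration under stated assumptions, not the project's db_op_v1 verifier: the verb table, the function name check_db_claim, and the digit-matching regex are all invented for this example.

```python
import re

# Hypothetical verb table. The expected net row-count change per verb follows
# the README: delete = -1, insert = +1, update = 0.
EXPECTED_DELTA = {
    "deleted": -1, "removed": -1,
    "inserted": 1, "added": 1, "created": 1,
    "updated": 0, "modified": 0,
}


def check_db_claim(claim, evidence):
    """Return an aar_verdict-style string for a single-row db_op claim."""
    words = re.findall(r"[a-z]+", claim.lower())
    verb = next((w for w in words if w in EXPECTED_DELTA), None)
    if verb is None or "before_count" not in evidence or "after_count" not in evidence:
        return "insufficient_evidence"

    # Critical signal: an ID named in the claim must appear in the SQL text.
    claim_ids = set(re.findall(r"\d+", claim))
    sql_ids = set(re.findall(r"\d+", evidence.get("operation", "")))
    if claim_ids and not (claim_ids & sql_ids):
        return "contradicted"

    # Direction check: the observed delta must match what the verb implies.
    delta = evidence["after_count"] - evidence["before_count"]
    return "verified" if delta == EXPECTED_DELTA[verb] else "contradicted"
```

The critical-signal check runs before the direction check so that a matching row count cannot mask a wrong target ID.
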

Verdicts (dual format)

| Field | Values | Notes |
|---|---|---|
| aar_verdict | verified / contradicted / insufficient_evidence / unsafe_to_verify | 4-value canonical (verify_action_receipt.v0) |
| verdict | ok / mismatch / uncertain | 3-value legacy alias for backwards compatibility |

unsafe_to_verify is returned when the verifier itself raised an exception (cannot examine evidence) — distinct from insufficient_evidence (evidence examined, ambiguous).
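
A harness might branch on the four aar_verdict values along these lines. This is only a sketch of one possible policy: the function names, the 0.8 confidence threshold, and the action labels are assumptions for illustration, not anything the service prescribes.

```python
def may_report_success(result, min_confidence=0.8):
    """True only for a 'verified' verdict at or above the (hypothetical) threshold."""
    verdict = result.get("aar_verdict")
    confidence = result.get("confidence", 0.0)
    return verdict == "verified" and confidence >= min_confidence


def next_step(result):
    """Map an aar_verdict to a harness action label (labels are illustrative)."""
    verdict = result.get("aar_verdict")
    if verdict == "verified":
        return "proceed"
    if verdict == "contradicted":
        return "halt"
    if verdict == "insufficient_evidence":
        # Evidence was examined but was ambiguous: gather more and retry.
        return "collect_more_evidence"
    # unsafe_to_verify (verifier raised) or any unknown value: hand off to a human.
    return "escalate"
```

Treating unsafe_to_verify separately from insufficient_evidence keeps a verifier failure from being mistaken for an ambiguous but examined result.
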

Receipts (verify_action_receipt.v0)

Every /verify call also issues an HMAC-SHA256-attested receipt as a nested receipt field. Full shape:

| Field | Type | Description |
|---|---|---|
| schema | string | "verify_action_receipt.v0" |
| kid | string | Key id; v0 ships with "v0-default". Operators rotate keys with fresh kids. |
| issued_by | string | Issuer identifier (this reference impl: "aar:reference-impl@v0") |
| issued_at | string | RFC 3339 UTC timestamp |
| verifier_id | string | "verify-action-mcp@<version>" |
| verifier_method | string | "rule_based.<kind>" (e.g. rule_based.db_op) |
| claim_hash | string | "sha256:<64-hex>"; content-addressed, raw claim is not stored |
| evidence_manifest_hash | string | "sha256:<64-hex>"; same as claim_hash |
| verdict | string | One of the 4 aar_verdict values |
| confidence | number | 0–1 |
| reason_codes | array of strings | Free-form diagnostic codes (v0 unrestricted) |
| policy_or_oracle_refs | array of strings | Optional refs to policy / oracle inputs (usually []) |
| caller_context | object | Optional caller_context echoed back (max 8 keys, 64-char strings) |
| signature | string | "hmac-sha256:<base64-no-padding>" |

What the receipt asserts: that this specific service issued this specific verdict for this content-addressed (claim, evidence) pair at this time, signed under a known key id (kid).

What the receipt does NOT assert: factual truth of the claim, legal admissibility in any forum, or warranty of any kind.

Trust model in v0: HMAC is symmetric, so a receipt only shows that it was signed with a secret key under our control, and anyone holding that key could produce the same signature. It is not a third-party attestation in the cryptographic sense. Treat v0 receipts as a content-addressed log entry from this service. The schema upgrade path for v1 (asymmetric ed25519, multi-issuer) is documented in aar/SCHEMA_UPGRADES.md.
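
To make the symmetric trust model concrete, here is a minimal sketch of content hashing plus HMAC-SHA256 signing and checking. It is not the reference implementation: the canonical JSON encoding (sorted keys, compact separators) and the choice to sign every field except signature are assumptions for this example, and the real byte layout is whatever the service's code defines. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def content_hash(obj):
    """Return "sha256:<64-hex>" over an assumed canonical JSON encoding."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sign_receipt(fields, key):
    """Return a copy of `fields` with an HMAC-SHA256 signature attached."""
    unsigned = {k: v for k, v in fields.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hmac.new(key, payload, hashlib.sha256).digest()
    # Base64 without padding, matching the "hmac-sha256:<base64-no-padding>" shape.
    b64 = base64.b64encode(digest).decode("ascii").rstrip("=")
    return {**unsigned, "signature": "hmac-sha256:" + b64}


def verify_receipt(receipt, key):
    """Recompute the signature with the same key and compare in constant time."""
    expected = sign_receipt(receipt, key)["signature"]
    return hmac.compare_digest(expected, receipt.get("signature", ""))
```

Because verify_receipt needs the very key that sign_receipt used, whoever can check a receipt can also mint one. That is the limitation the v1 ed25519 path is meant to remove.
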

API

| Method | Path | Purpose |
|---|---|---|
| GET | /, /about | Project description (HTML) |
| GET | /healthcheck | Liveness probe |
| GET | /spec | Tool schema + verifier kinds (JSON) |
| GET | /stats | Aggregate counters since process start |
| GET | /privacy | Privacy notice (HTML) |
| GET | /tos | Terms of service (HTML) |
| POST | /verify | REST: {claim, evidence, kind?, context?, caller_context?} → verdict + receipt |
| POST | /mcp | MCP JSON-RPC 2.0 endpoint |

MCP methods

  • initialize → {protocolVersion: "2024-11-05", capabilities: {tools: {}}, serverInfo: {name, version}}
  • tools/list → {tools: [{name: "verify_action", description, inputSchema}]}
  • tools/call (name=verify_action) → {content: [...], isError, _structured_result: {verdict, aar_verdict, reasoning, confidence, receipt, ...}}
  • notifications/initialized, ping → empty result
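
For callers that speak JSON-RPC directly rather than through an MCP client, a tools/call request and the matching response handling might look like this. This is a sketch under assumptions: the argument keys mirror the REST body (claim, evidence), and the fields listed above are assumed to sit under the JSON-RPC result member. Check /spec for the actual inputSchema. Both function names are hypothetical.

```python
import json


def build_tools_call(claim, evidence, request_id=1):
    """Build a JSON-RPC 2.0 tools/call payload for the verify_action tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "verify_action",
            # Assumed to match the REST body keys; confirm against /spec.
            "arguments": {"claim": claim, "evidence": evidence},
        },
    }


def extract_verdict(response):
    """Return aar_verdict from a tools/call response, or None if unavailable."""
    result = response.get("result", {})
    if result.get("isError"):
        return None
    return result.get("_structured_result", {}).get("aar_verdict")


if __name__ == "__main__":
    payload = build_tools_call("Deleted user 12345", {"affected_rows": 1})
    print(json.dumps(payload, indent=2))
```
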

Examples

code_diff — coherent
curl -X POST https://verify.armadalab.dev/verify -H 'Content-Type: application/json' -d '{
  "claim": "Added null check for user.email in src/user.py",
  "evidence": {
    "diff": "--- a/src/user.py\n+++ b/src/user.py\n@@ -10,3 +10,5 @@\n def get_email(user):\n+    if user.email is None:\n+        return None\n     return user.email"
  }
}'
# → aar_verdict: verified (legacy: ok), confidence ~0.9
file_op — line count mismatch
curl -X POST https://verify.armadalab.dev/verify -H 'Content-Type: application/json' -d '{
  "claim": "Created /tmp/output.txt with 200 lines",
  "evidence": {"path":"/tmp/output.txt","exists_before":false,"exists_after":true,"line_count":50}
}'
# → aar_verdict: contradicted (legacy: mismatch) — claim said 200 lines, evidence says 50
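
The file_op critical signal ("numeric divergence > 50% or > 50 absolute") could be expressed roughly as below. This is a sketch, not the shipped verifier: measuring the relative divergence against the claimed value is an assumption, as are the function names and the regex used to pull a line count out of the claim.

```python
import re


def claimed_line_count(claim):
    """Extract an 'N lines' figure from the claim text, or None if absent."""
    match = re.search(r"(\d+)\s+lines?\b", claim)
    return int(match.group(1)) if match else None


def diverges(claimed, observed, rel_limit=0.5, abs_limit=50):
    """True when the gap exceeds 50 in absolute terms or 50% of the claim."""
    diff = abs(claimed - observed)
    if diff > abs_limit:
        return True
    base = max(abs(claimed), 1)  # avoid dividing by zero for a claimed 0
    return diff / base > rel_limit


if __name__ == "__main__":
    claim = "Created /tmp/output.txt with 200 lines"
    print(diverges(claimed_line_count(claim), 50))
```

For the example above, the claim says 200 lines and the evidence says 50, a gap of 150, so the absolute limit alone is enough to trip the signal.
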
api_call — target email mismatch (critical signal)
curl -X POST https://verify.armadalab.dev/verify -H 'Content-Type: application/json' -d '{
  "claim": "Sent welcome email to [email protected]",
  "evidence": {
    "request": {"to":"[email protected]","subject":"Welcome!"},
    "response_status": 200, "response_body": "{\"sent\":true}"
  }
}'
# → aar_verdict: contradicted — target email differs from claim

Privacy

  • IP addresses are SHA-256-hashed with a salt (a fresh salt is generated for each server install). Plaintext IPs are never persisted.
  • Submitted claims and evidence are written to private trace logs marked untrusted_payload. Aggregate findings may be published; individual traces stay private.
  • 30-day log retention is enforced by the included purge_old_logs.sh script (operator installs as a daily cron — see monitor/CRON.md for the entry).
  • A PII guard rejects payloads containing JP My Number-shape (12-digit) sequences, passport-shape strings, or credit-card-shape digits (with Luhn check). Detection is structural — the guard does NOT confirm any number is a real personal identifier.
  • traces/ is chmod 600.
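
The structural nature of the PII guard can be illustrated with a Luhn check and a 12-digit shape test. This is a sketch only: the regexes, the 13 to 19 digit card length range, the finding labels, and the function names are assumptions, and passport-shape detection is left out. As the list above notes, passing these checks says nothing about whether a number is a real identifier.

```python
import re

# Digit runs not adjacent to other digits. The length ranges are assumptions.
CARD_RE = re.compile(r"(?<!\d)\d{13,19}(?!\d)")
MY_NUMBER_RE = re.compile(r"(?<!\d)\d{12}(?!\d)")


def luhn_valid(digits):
    """Standard Luhn checksum over a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def pii_findings(text):
    """Return a list of shape-based findings; an empty list means none found."""
    findings = []
    if MY_NUMBER_RE.search(text):
        findings.append("my_number_shape")
    if any(luhn_valid(m) for m in CARD_RE.findall(text)):
        findings.append("card_shape")
    return findings
```
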

See /privacy and /tos for the user-facing notice.

Phase 1 limitations

  • Rule-based only — no LLM-as-judge. The 4 specialized verifiers handle their kinds well; the generic axis is conservative (often returns insufficient_evidence).
  • No sub-claim decomposition — 1 claim → 1 verifier.
  • No cross-trace correlation — each call is independent.
  • HMAC-attested receipts only — symmetric, single-issuer. Asymmetric / multi-issuer path documented in aar/SCHEMA_UPGRADES.md.
  • No SLA, no rate-limit guarantee, no uptime promise on the hosted endpoint. Self-host (above) for stability.

Who this is for / not for

For:

  • Agent harness developers wanting a quick post-action sanity check
  • Multi-agent pipeline operators wanting an integrity boundary between steps
  • Anyone evaluating "did this agent do what it said it did?" patterns

Not for:

  • Security-critical attestation (HMAC v0 is not third-party-strong; wait for v1 ed25519)
  • High-throughput production with strict SLA (run self-hosted, expect to maintain it)
  • Domain-specific reasoning the rule-based verifiers don't cover (extend by writing a custom verifier kind under verifiers/)

Roadmap

  • Schema v1: ed25519 + multi-issuer (aar/SCHEMA_UPGRADES.md)
  • LLM-augmented generic verifier (opt-in)
  • Sub-claim decomposition for multi-step actions
  • Cumulative observation API ("this harness mismatches on file_op X% of the time")
  • Custom verifier registration

This is a 90-day probe. If meaningful adoption appears, v1 schema work begins.

License

MIT — see LICENSE.

Contact

Maintained by Armada (@Ardev_lab). Issues / questions: GitHub Issues, or [email protected].

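Japanese edition (English translation)

What this is

Your AI agent says "I deleted user 12345", yet the row count in the database has not changed. This is a small third-party verification service that catches that kind of silent inconsistency.

It accepts (claim, evidence) from the agent and returns a consistency verdict (verdict) together with an HMAC-signed receipt (verify_action_receipt.v0).

Failure patterns in scope (in general terms)
  • The agent says "I deleted user 12345", but the row count in the database has not changed.
  • The agent says "I added a null check", but the diff also rewrites 5 unrelated functions.
  • The agent says "I sent the welcome email to [email protected]", but the actual request body targeted [email protected].

Benchmarks look at whether the model said it succeeded; whether the actual state was updated consistently with the claim is a separate question. The latter is one of the important concerns when operating agents.

verify-action-mcp provides the layer that catches this gap before downstream tools confirm it. It is a separate layer, post-action evidence verification, independent of existing pre-action permission control (policy admission control, i.e. authorization before a tool call).

It does not claim to be an industry standard and is positioned as a reference implementation. The receipt schema (verify_action_receipt.v0) is designed to be small enough to fork.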

Usage

MCP (Claude Code / Cursor / Cline / Codex, etc.)
{
  "mcpServers": {
    "verify-action": {
      "transport": {"type": "http", "url": "https://verify.armadalab.dev/mcp"}
    }
  }
}

verify_action now appears in the agent's tool list. The intended pattern is for the agent to self-call it just before reporting completion.

REST
curl -X POST https://verify.armadalab.dev/verify -H 'Content-Type: application/json' -d '{
  "claim": "user 12345 を削除しました",
  "evidence": {
    "before_count": 100, "after_count": 99,
    "operation": "DELETE FROM users WHERE id=12345",
    "affected_rows": 1
  }
}'

Response (excerpt; see the Receipt section below for the full receipt shape):

{
  "verdict": "ok",
  "aar_verdict": "verified",
  "reasoning": "Row count decreased by exactly 1; SQL operation matches DELETE semantics; user id matches claim.",
  "confidence": 0.92,
  "receipt": { "schema": "verify_action_receipt.v0", "...": "..." }
}

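4-value verdict (aar_verdict)

| Value | Meaning |
|---|---|
| verified | claim and evidence are consistent |
| contradicted | claim and evidence show a decisive mismatch |
| insufficient_evidence | evidence was examined, but there is not enough to decide |
| unsafe_to_verify | the verifier raised an exception and could not examine the evidence |

The legacy 3-value verdict (ok / mismatch / uncertain) is still returned in the verdict field, so existing clients remain compatible.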

Receipt (HMAC-signed receipt)

The /verify response carries a signed verify_action_receipt.v0 receipt nested under receipt. Main fields:

| Field | Description |
|---|---|
| schema | "verify_action_receipt.v0" |
| kid | Key id (v0 default is "v0-default"; operators issue a new kid on rotation) |
| issued_by | Issuer identifier (the reference impl uses "aar:reference-impl@v0") |
| issued_at | RFC 3339 UTC timestamp |
| verifier_id | "verify-action-mcp@<version>" |
| verifier_method | "rule_based.<kind>" (e.g. rule_based.db_op) |
| claim_hash | "sha256:<64-hex>"; the claim text is not stored |
| evidence_manifest_hash | "sha256:<64-hex>"; the evidence text is not stored |
| verdict | One of the 4 values |
| confidence | 0..1 |
| reason_codes | Array of free-form diagnostic codes |
| signature | "hmac-sha256:<base64>" |

What the receipt means: only that this instance issued this verdict, at this time, for this (claim, evidence) pair (referenced by hash). It makes no assertion about the truth of the claim itself, admissibility as evidence in any legal proceeding, or any quality warranty.

v0 trust model: because HMAC uses a symmetric key, the receipt proves only that this service signed it (with a known secret key). Third-party-attestation strength is planned for v1 (ed25519 + multi-issuer) and later. See aar/SCHEMA_UPGRADES.md for the schema upgrade path.

Privacy

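  • IPs are hashed with SHA-256 + salt down to 16 characters (raw IPs are not stored). Note: hashed IPs are not used to identify specific individuals.
  • claim / evidence are recorded in private trace logs as untrusted_payload; only aggregate metrics are published.
  • Logs are deleted automatically after 30 days (the operator runs purge_old_logs.sh as a daily cron job).
  • Receipt issuance is stopped for payloads containing digit sequences that look like specified personal information such as My Number, passport-shape strings, or credit-card-shape digits (with Luhn check). Detection is by format only and does not confirm that a number is real.
  • traces/ is chmod 600.

See /privacy and /tos for details.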

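Current limitations

  • stdlib only / rule-based: no LLM-as-judge is implemented. The generic axis is intentionally weak.
  • No sub-claim decomposition: 1 claim → 1 verifier.
  • No cross-trace correlation: each call is judged independently.
  • HMAC (symmetric key) only: multi-issuer / asymmetric support comes in v1 (aar/SCHEMA_UPGRADES.md).
  • No SLA / uptime / rate-limit guarantee on the hosted endpoint: self-hosting is recommended if you need stability.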

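Intended audience

  • Agent harness developers who want to add a sanity check before completion reports
  • Multi-agent pipeline operators who want an integrity boundary between steps
  • Anyone who wants to keep observing whether an agent really did what it said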

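Roadmap

This runs as a 90-day probe; whether to continue, scale down, or withdraw is decided against kill criteria committed in advance. If adoption appears, work starts with schema v1 (ed25519 + multi-issuer).

License and contact

  • License: MIT (see LICENSE)
  • Maintainer: Armada (@Ardev_lab)
  • Issues / questions: GitHub Issues or [email protected]
  • Note: currently provided free of charge (any future paid plan will be announced)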
