verify-action-mcp
When your AI agent says "I deleted user 12345" but the row count didn't change — this catches it. A small third-party verification service for AI agent tool-call evidence: submit (claim, evidence), get back a verdict and an HMAC-attested receipt.
🇯🇵 日本語版は ↓ ページ後半 を参照してください。
English
Why
AI agents commonly assert success when reality didn't match:
- "I deleted user 12345" — but the row count didn't actually change.
- "I added a null check" — but the diff also rewrote 5 unrelated functions.
- "I sent the welcome email to [email protected]" — but the request body actually targeted [email protected].
These silent successes don't show up in benchmarks (which score "did the model say it succeeded?"). They surface when something downstream breaks — sometimes hours or days later.
verify-action-mcp is a small third-party service that catches that drift before downstream tools commit to it. It is a post-action evidence verifier: the receipt proves what was checked, not what is true. Pre-action policy admission control products from major vendors work in a different lane; this one runs after the agent has done the work, using the artifacts that work produced.
Quick start
MCP (Claude Code, Cursor, Cline, Codex, etc.)
    // claude_desktop_config.json or your harness's MCP config
    {
      "mcpServers": {
        "verify-action": {
          "transport": {"type": "http", "url": "https://verify.armadalab.dev/mcp"}
        }
      }
    }
The agent now has a verify_action tool available. It can self-call before reporting completion, or you can invoke it from your harness logic.
REST
    curl -X POST https://verify.armadalab.dev/verify \
      -H 'Content-Type: application/json' \
      -d '{
        "claim": "Deleted user 12345",
        "evidence": {
          "before_count": 100,
          "after_count": 99,
          "operation": "DELETE FROM users WHERE id=12345",
          "affected_rows": 1
        }
      }'
Response (receipt truncated; full shape below):
    {
      "verdict": "ok",
      "aar_verdict": "verified",
      "reasoning": "Row count decreased by exactly 1; SQL operation matches DELETE semantics; user id matches claim.",
      "confidence": 0.92,
      "verifier_used": "db_op_v1",
      "kind_dispatched": "db_op",
      "receipt": {
        "schema": "verify_action_receipt.v0",
        "verdict": "verified",
        "claim_hash": "sha256:<64-hex>",
        "evidence_manifest_hash": "sha256:<64-hex>",
        "kid": "v0-default",
        "issued_by": "aar:reference-impl@v0",
        "signature": "hmac-sha256:<base64>",
        "_full": "(see Receipts section)"
      }
    }
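The same call can be made from Python using only the standard library. This is a minimal sketch, not an official client: the function names `build_verify_request` and `verify_action` are hypothetical, and the request body simply mirrors the curl example above.

```python
import json
import urllib.request

# Hosted endpoint taken from the curl example; swap in your self-hosted URL if needed.
VERIFY_URL = "https://verify.armadalab.dev/verify"


def build_verify_request(claim, evidence, url=VERIFY_URL, kind=None):
    """Build (but do not send) the POST request for /verify."""
    payload = {"claim": claim, "evidence": evidence}
    if kind is not None:
        payload["kind"] = kind  # optional field; omitted means the service infers it
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def verify_action(claim, evidence, timeout=10.0, **kwargs):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_verify_request(claim, evidence, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to inspect or log before anything goes over the network.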
Self-host
git clone https://github.com/Armada735/verify-action-mcp
cd verify-action-mcp
./start.sh # binds 127.0.0.1:8092
./stop.sh
Pure Python stdlib. No pip install. Tested on Linux.
What it verifies
A dispatcher routes by kind (or auto-infers from evidence shape):
| Kind | Evidence shape | Critical signal that forces `mismatch` |
|---|---|---|
| `code_diff` | `{diff: "<unified diff>"}` | All claimed paths absent from diff |
| `db_op` | `{before_count, after_count, operation, affected_rows}` | Claim ID not in SQL ID |
| `file_op` | `{path, exists_before, exists_after, line_count?, size_bytes?}` | Numeric divergence > 50% or > 50 absolute |
| `api_call` | `{request, response_status, response_body}` | Email target mismatch (claim ↔ request body) |
| `generic` | any object | (conservative; usually returns `insufficient_evidence`) |
Each verifier looks at:
- Verb in claim ↔ direction of state change (delete = -1, insert = +1, update = 0)
- Specific identifiers / paths / emails / URLs
- Counts / line counts / sizes
- HTTP status semantics
- "Critical signals" that force `mismatch` regardless of pos/neg balance
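To make the verb-versus-direction idea concrete, here is a rough sketch of how a `db_op` style check could be written. It is illustrative only and is not the project's verifier: the verb patterns, the helper names, and the choice to compare only the sign of the row-count change are all assumptions.

```python
import re

# Verb -> expected direction of the row-count change, as in the list above.
EXPECTED_DIRECTION = {"delete": -1, "insert": +1, "update": 0}

# Hypothetical wording patterns; a real verifier would need a richer lexicon.
VERB_PATTERNS = {
    "delete": r"\b(delet\w*|remov\w*)\b",
    "insert": r"\b(insert\w*|creat\w*)\b",
    "update": r"\b(updat\w*|modif\w*)\b",
}


def claimed_verb(claim):
    """Return the first verb family found in the claim, or None."""
    for verb, pattern in VERB_PATTERNS.items():
        if re.search(pattern, claim, re.IGNORECASE):
            return verb
    return None


def check_db_op(claim, evidence):
    """Return an aar_verdict-style string for a db_op claim."""
    verb = claimed_verb(claim)
    if verb is None:
        return "insufficient_evidence"

    # Critical signal: the claim names an ID that never appears in the SQL.
    claim_ids = re.findall(r"\b\d+\b", claim)
    sql_ids = set(re.findall(r"\b\d+\b", evidence.get("operation", "")))
    if claim_ids and not any(i in sql_ids for i in claim_ids):
        return "contradicted"

    if "before_count" not in evidence or "after_count" not in evidence:
        return "insufficient_evidence"

    delta = evidence["after_count"] - evidence["before_count"]
    direction = (delta > 0) - (delta < 0)  # sign of the change: -1, 0, or +1
    return "verified" if direction == EXPECTED_DIRECTION[verb] else "contradicted"
```

The critical-signal check runs before the count comparison, so an ID mismatch decides the verdict even when the row counts look plausible.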
Verdicts (dual format)
| Field | Values | Notes |
|---|---|---|
| `aar_verdict` | `verified` / `contradicted` / `insufficient_evidence` / `unsafe_to_verify` | 4-value canonical (`verify_action_receipt.v0`) |
| `verdict` | `ok` / `mismatch` / `uncertain` | 3-value legacy alias for backwards compatibility |
unsafe_to_verify is returned when the verifier itself raised an exception (cannot examine evidence) — distinct from insufficient_evidence (evidence examined, ambiguous).
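A harness will usually want to branch on the 4-value `aar_verdict` rather than the legacy alias. The sketch below shows one possible policy; the action names and the `min_confidence` threshold of 0.8 are assumptions chosen for illustration, not values recommended by the service.

```python
def next_step(response, min_confidence=0.8):
    """Map a /verify response dict to a harness action (hypothetical policy)."""
    verdict = response.get("aar_verdict")
    confidence = response.get("confidence", 0.0)

    if verdict == "verified":
        # Only proceed when the verifier is also reasonably confident.
        return "commit" if confidence >= min_confidence else "gather_more_evidence"
    if verdict == "contradicted":
        return "halt_and_escalate"
    if verdict == "insufficient_evidence":
        return "gather_more_evidence"
    # unsafe_to_verify, or any value this harness does not recognise.
    return "retry_or_escalate"
```

Treating unknown values like `unsafe_to_verify` keeps the harness fail-closed if the verdict set grows in a later schema.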
Receipts (verify_action_receipt.v0)
Every /verify call also issues an HMAC-SHA256-attested receipt as a nested receipt field. Full shape:
| Field | Type | Description |
|---|---|---|
| `schema` | string | `"verify_action_receipt.v0"` |
| `kid` | string | Key id; v0 ships with `"v0-default"`. Operators rotate keys with fresh kids. |
| `issued_by` | string | Issuer identifier (this reference impl: `"aar:reference-impl@v0"`) |
| `issued_at` | string | RFC 3339 UTC timestamp |
| `verifier_id` | string | `"verify-action-mcp@<version>"` |
| `verifier_method` | string | `"rule_based.<kind>"` (e.g. `rule_based.db_op`) |
| `claim_hash` | string | `"sha256:<64-hex>"`; content-addressed, raw claim is not stored |
| `evidence_manifest_hash` | string | `"sha256:<64-hex>"`; same as `claim_hash` |
| `verdict` | string | One of the 4 `aar_verdict` values |
| `confidence` | number | 0–1 |
| `reason_codes` | array of strings | Free-form diagnostic codes (v0 unrestricted) |
| `policy_or_oracle_refs` | array of strings | Optional refs to policy / oracle inputs (usually `[]`) |
| `caller_context` | object | Optional `caller_context` echoed back (max 8 keys, 64-char strings) |
| `signature` | string | `"hmac-sha256:<base64-no-padding>"` |
What the receipt asserts: that this specific service issued this specific verdict for this content-addressed (claim, evidence) pair at this time, signed under a known key id (kid).
What the receipt does NOT assert: factual truth of the claim, legal admissibility in any forum, or warranty of any kind.
Trust model in v0: HMAC is symmetric, so the receipt shows only that a secret key under our control signed it, and anyone able to check the signature must hold that same key. It is not a third-party attestation in the cryptographic sense. Treat v0 receipts as a content-addressed log entry from this service. The schema upgrade path for v1 (asymmetric ed25519, multi-issuer) is documented in aar/SCHEMA_UPGRADES.md.
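For self-hosters who hold the signing key, checking a receipt could look roughly like the sketch below. Several details are assumptions, because the exact signing input is not specified here: it assumes the signature covers every receipt field except `signature`, serialized as compact JSON with sorted keys, and that `claim_hash` is the SHA-256 of the UTF-8 claim text. Check the implementation before relying on either.

```python
import base64
import hashlib
import hmac
import json


def content_hash(text):
    """Format a SHA-256 digest like the receipt's hash fields (input form assumed)."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def _canonical_bytes(receipt):
    # ASSUMPTION: all fields except `signature`, compact JSON, sorted keys.
    unsigned = {k: v for k, v in receipt.items() if k != "signature"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_receipt(receipt, key):
    """Return a signature string in the "hmac-sha256:<base64-no-padding>" shape."""
    mac = hmac.new(key, _canonical_bytes(receipt), hashlib.sha256).digest()
    encoded = base64.b64encode(mac).decode("ascii").rstrip("=")
    return "hmac-sha256:" + encoded


def verify_receipt(receipt, key):
    """Recompute the signature and compare it in constant time."""
    expected = sign_receipt(receipt, key)
    return hmac.compare_digest(expected, str(receipt.get("signature", "")))
```

Because the key is symmetric, a successful check says only that a holder of the key produced the receipt, which matches the v0 trust model described above.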
API
| Method | Path | Purpose |
|---|---|---|
| GET | `/`, `/about` | Project description (HTML) |
| GET | `/healthcheck` | Liveness probe |
| GET | `/spec` | Tool schema + verifier kinds (JSON) |
| GET | `/stats` | Aggregate counters since process start |
| GET | `/privacy` | Privacy notice (HTML) |
| GET | `/tos` | Terms of service (HTML) |
| POST | `/verify` | REST: `{claim, evidence, kind?, context?, caller_context?}` → verdict + receipt |
| POST | `/mcp` | MCP JSON-RPC 2.0 endpoint |
MCP methods
- `initialize` → `{protocolVersion: "2024-11-05", capabilities: {tools: {}}, serverInfo: {name, version}}`
- `tools/list` → `{tools: [{name: "verify_action", description, inputSchema}]}`
- `tools/call` (name=`verify_action`) → `{content: [...], isError, _structured_result: {verdict, aar_verdict, reasoning, confidence, receipt, ...}}`
- `notifications/initialized`, `ping` → empty result
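If you drive the `/mcp` endpoint directly rather than through an MCP client, the `tools/call` exchange can be sketched as follows. The helper names are hypothetical, and two things are assumed: that the tool arguments mirror the REST body (`claim`, `evidence`), and that the fields listed above arrive inside the JSON-RPC `result` member.

```python
def build_tools_call(claim, evidence, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request dict for the verify_action tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "verify_action",
            # ASSUMPTION: arguments use the same keys as the REST body.
            "arguments": {"claim": claim, "evidence": evidence},
        },
    }


def extract_verdict(rpc_response):
    """Pull aar_verdict out of a tools/call response, or None on error."""
    if "error" in rpc_response:
        return None  # JSON-RPC level failure
    result = rpc_response.get("result", {})
    if result.get("isError"):
        return None  # tool-level failure
    return result.get("_structured_result", {}).get("aar_verdict")
```

The returned request dict can be serialized with `json.dumps` and posted to the endpoint with any HTTP client.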
Examples
code_diff — coherent
    curl -X POST https://verify.armadalab.dev/verify -H 'Content-Type: application/json' -d '{
      "claim": "Added null check for user.email in src/user.py",
      "evidence": {
        "diff": "--- a/src/user.py\n+++ b/src/user.py\n@@ -10,3 +10,5 @@\n def get_email(user):\n+ if user.email is None:\n+ return None\n return user.email"
      }
    }'
    # → aar_verdict: verified (legacy: ok), confidence ~0.9
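The `code_diff` critical signal ("all claimed paths absent from diff") can be approximated with a short check like the one below. It is a sketch under assumptions, not the shipped verifier: in particular, the regular expression used to spot path-like tokens in the claim is a guess and will also pick up dotted names such as `user.email`.

```python
import re


def paths_in_diff(diff):
    """Collect file paths from the '+++ ' headers of a unified diff."""
    paths = set()
    for line in diff.splitlines():
        match = re.match(r"^\+\+\+ (?:b/)?(\S+)", line)
        if match and match.group(1) != "/dev/null":
            paths.add(match.group(1))
    return paths


def claimed_paths(claim):
    """Hypothetical heuristic: tokens that end in a dot plus an extension."""
    return set(re.findall(r"[\w./-]+\.\w+", claim))


def all_claimed_paths_absent(claim, diff):
    """True only when the claim names paths and none of them is in the diff."""
    claimed = claimed_paths(claim)
    return bool(claimed) and claimed.isdisjoint(paths_in_diff(diff))
```

Requiring that every claimed path be absent, rather than any one, keeps false positives from tokens like `user.email` from forcing a mismatch on their own.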
file_op — line count mismatch
    curl -X POST https://verify.armadalab.dev/verify -H 'Content-Type: application/json' -d '{
      "claim": "Created /tmp/output.txt with 200 lines",
      "evidence": {"path":"/tmp/output.txt","exists_before":false,"exists_after":true,"line_count":50}
    }'
    # → aar_verdict: contradicted (legacy: mismatch); claim said 200 lines, evidence says 50
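The `file_op` threshold ("> 50% or > 50 absolute") can be written as a small predicate. This sketch assumes the relative divergence is measured against the claimed value, which the table above does not specify; the function names are hypothetical.

```python
import re


def claimed_line_count(claim):
    """Extract the first 'N lines' figure from a claim, or None."""
    match = re.search(r"(\d+)\s+lines?\b", claim)
    return int(match.group(1)) if match else None


def diverges(claimed, observed, rel_limit=0.5, abs_limit=50):
    """True when the numbers differ by more than 50% or by more than 50."""
    difference = abs(claimed - observed)
    if difference > abs_limit:
        return True
    if claimed == 0:
        # Relative divergence is undefined; treat any nonzero observation as divergent.
        return observed != 0
    # ASSUMPTION: the percentage is relative to the claimed value.
    return difference / abs(claimed) > rel_limit
```

For the example above, the claim says 200 lines and the evidence says 50, so the difference of 150 exceeds both limits.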
api_call — target email mismatch (critical signal)
    curl -X POST https://verify.armadalab.dev/verify -H 'Content-Type: application/json' -d '{
      "claim": "Sent welcome email to [email protected]",
      "evidence": {
        "request": {"to":"[email protected]","subject":"Welcome!"},
        "response_status": 200, "response_body": "{\"sent\":true}"
      }
    }'
    # → aar_verdict: contradicted; target email differs from claim
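The email-target critical signal amounts to comparing addresses found in the claim with addresses found in the request. A rough sketch follows; the regular expression is a simplified pattern, the helper names are hypothetical, and the `example.com` addresses used to exercise it are placeholders rather than values from this README.

```python
import re

# Simplified address pattern; good enough for a sanity check, not RFC-complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_in(value):
    """Recursively collect lower-cased addresses from strings, dicts, and lists."""
    if isinstance(value, str):
        return {match.lower() for match in EMAIL_RE.findall(value)}
    if isinstance(value, dict):
        value = list(value.values())
    if isinstance(value, (list, tuple)):
        found = set()
        for item in value:
            found |= emails_in(item)
        return found
    return set()


def email_target_mismatch(claim, request):
    """True when both sides name addresses and none of them overlap."""
    claimed = emails_in(claim)
    actual = emails_in(request)
    return bool(claimed) and bool(actual) and claimed.isdisjoint(actual)
```

When either side contains no address at all, the function returns False, leaving that case to the non-critical signals instead of forcing a mismatch.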
Privacy
- IP addresses are SHA-256-hashed with a salt (rotates per server install). Plaintext IPs are never persisted.
- Submitted claims and evidence are written to private trace logs marked `untrusted_payload`. Aggregate findings may be published; individual traces stay private.
- 30-day log retention is enforced by the included `purge_old_logs.sh` script (operator installs as a daily cron; see `monitor/CRON.md` for the entry).
- A PII guard rejects payloads containing JP My Number-shape (12-digit) sequences, passport-shape strings, or credit-card-shape digits (with Luhn check). Detection is structural: the guard does NOT confirm any number is a real personal identifier.
- `traces/` is `chmod 600`.
See /privacy and /tos for the user-facing notice.
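Two of the mechanisms above, salted IP hashing and the Luhn check on card-shaped digits, are easy to illustrate. The sketch below is not the service's code: the salt handling, the 13 to 19 digit window, and the function names are assumptions. The Luhn algorithm itself is the standard one.

```python
import hashlib
import re


def hash_ip(ip, salt):
    """Return the hex SHA-256 of salt + IP; the plaintext IP is not kept."""
    return hashlib.sha256(salt + ip.encode("utf-8")).hexdigest()


def luhn_valid(digits):
    """Standard Luhn checksum over a string of decimal digits."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        number = int(char)
        if index % 2 == 1:  # double every second digit from the right
            number *= 2
            if number > 9:
                number -= 9
        total += number
    return total % 10 == 0


def looks_like_card(text):
    """True if text holds a 13-19 digit run (spaces/dashes allowed) passing Luhn."""
    for match in re.finditer(r"\b\d(?:[ -]?\d){12,18}\b", text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            return True
    return False
```

As the privacy notes say, a check like this is purely structural: passing Luhn shows that a number is card-shaped, not that it belongs to a real account.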
Phase 1 limitations
- Rule-based only: no LLM-as-judge. The 4 specialized verifiers handle their kinds well; the `generic` axis is conservative (often returns `insufficient_evidence`).
- No sub-claim decomposition: 1 claim → 1 verifier.
- No cross-trace correlation: each call is independent.
- HMAC-attested receipts only: symmetric, single-issuer. Asymmetric / multi-issuer path documented in `aar/SCHEMA_UPGRADES.md`.
- No SLA, no rate-limit guarantee, no uptime promise on the hosted endpoint. Self-host (above) for stability.
Who this is for / not for
For:
- Agent harness developers wanting a quick post-action sanity check
- Multi-agent pipeline operators wanting an integrity boundary between steps
- Anyone evaluating "did this agent do what it said it did?" patterns
Not for:
- Security-critical attestation (HMAC v0 is not third-party-strong; wait for v1 ed25519)
- High-throughput production with strict SLA (run self-hosted, expect to maintain it)
- Domain-specific reasoning the rule-based verifiers don't cover (extend by writing a custom verifier kind under `verifiers/`)
Roadmap
- Schema v1: ed25519 + multi-issuer (`aar/SCHEMA_UPGRADES.md`)
- LLM-augmented `generic` verifier (opt-in)
- Sub-claim decomposition for multi-step actions
- Cumulative observation API ("this harness mismatches on file_op X% of the time")
- Custom verifier registration
This is a 90-day probe. If meaningful adoption appears, v1 schema work begins.
License
MIT — see LICENSE.
Contact
Maintained by Armada (@Ardev_lab). Issues / questions: GitHub Issues, or [email protected].
日本語
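What this is
When an AI agent says "I deleted user 12345" but the database row count has not changed, this small third-party verification service catches that kind of silent inconsistency.
It takes a (claim, evidence) pair from the agent and returns a consistency verdict (verdict) plus an HMAC-signed receipt (verify_action_receipt.v0).
Failure patterns it targets (in general terms)
- The agent says "I deleted user 12345", but the database row count has not changed
- The agent says "I added a null check", but the diff also contains rewrites of 5 unrelated functions
- The agent says "I sent the welcome email to [email protected]", but the actual request body is addressed to [email protected]
Benchmarks look at whether the model said it succeeded; whether the actual state was updated consistently with the claim is a separate question. The latter is one of the important concerns when operating agents.
verify-action-mcp is the layer that catches that gap before downstream tools confirm it. It provides post-action evidence verification as a separate layer, independent of existing pre-action admission control (policy admission control, i.e. permission checks before a tool call).
It does not claim to be an industry standard and is positioned as a reference implementation. The receipt schema (verify_action_receipt.v0) is designed to be small enough to fork.
Usage
MCP (Claude Code / Cursor / Cline / Codex, etc.)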
{
"mcpServers": {
"verify-action": {
"transport": {"type": "http", "url": "https://verify.armadalab.dev/mcp"}
}
}
}
verify_action now appears in the agent's tool list. The intended pattern is for the agent to self-call it just before reporting completion.
REST
curl -X POST https://verify.armadalab.dev/verify -H 'Content-Type: application/json' -d '{
"claim": "user 12345 を削除しました",
"evidence": {
"before_count": 100, "after_count": 99,
"operation": "DELETE FROM users WHERE id=12345",
"affected_rows": 1
}
}'
Response (excerpt; see the Receipt section below for the full receipt shape):
{
"verdict": "ok",
"aar_verdict": "verified",
"reasoning": "Row count decreased by exactly 1; SQL operation matches DELETE semantics; user id matches claim.",
"confidence": 0.92,
"receipt": { "schema": "verify_action_receipt.v0", "...": "..." }
}
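4-value verdict (aar_verdict)
| Value | Meaning |
|---|---|
| `verified` | claim and evidence are consistent |
| `contradicted` | claim and evidence show a decisive mismatch |
| `insufficient_evidence` | the evidence was examined, but there is not enough to decide |
| `unsafe_to_verify` | the verifier raised an exception and could not examine the evidence |

The legacy 3 values (ok / mismatch / uncertain) are still returned in the verdict field, so existing clients remain compatible.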
Receipt (HMAC-signed)
The /verify response includes a signed verify_action_receipt.v0 receipt nested under receipt. Main fields:
| Field | Description |
|---|---|
| `schema` | `"verify_action_receipt.v0"` |
| `kid` | Key id (the v0 default is `"v0-default"`; operators issue a new kid on rotation) |
| `issued_by` | Issuer identifier (the reference impl uses `"aar:reference-impl@v0"`) |
| `issued_at` | RFC 3339 UTC timestamp |
| `verifier_id` | `"verify-action-mcp@<version>"` |
| `verifier_method` | `"rule_based.<kind>"` (e.g. `rule_based.db_op`) |
| `claim_hash` | `"sha256:<64-hex>"`; the claim text is not stored |
| `evidence_manifest_hash` | `"sha256:<64-hex>"`; the evidence text is not stored |
| `verdict` | One of the 4 values |
| `confidence` | 0..1 |
| `reason_codes` | Array of free-form diagnostic codes |
| `signature` | `"hmac-sha256:<base64>"` |

What the receipt means: only that this instance issued this verdict, at this time, for this (claim, evidence) pair (referenced by hash). It asserts nothing about the truth of the claim itself, about admissibility as evidence in any legal proceeding, or about any quality warranty.
Trust model in v0: because HMAC uses a symmetric key, the receipt proves only that this service signed it (with a known secret key). Third-party-attestation strength is planned for v1 (ed25519 + multi-issuer) and later. See aar/SCHEMA_UPGRADES.md for the schema upgrade path.
Privacy
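- IPs are hashed with SHA-256 plus a salt, down to 16 characters (raw IPs are not stored). Note: a hashed IP is not used to identify a specific individual.
- claim / evidence are recorded in private trace logs as `untrusted_payload`; only aggregate metrics are published
- Logs are deleted automatically after 30 days (the operator runs `purge_old_logs.sh` as a daily cron)
- Payloads containing digit sequences that look like specified personal information such as a My Number, passport-shape strings, or credit-card-shape digits (with Luhn check) are not issued a receipt (detection is by format only and does not confirm that a number is real).
- `traces/` is `chmod 600`

See /privacy and /tos for details.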
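Current limitations
- stdlib only / rule-based: LLM-as-judge is not implemented. The `generic` axis is intentionally weak
- No sub-claim decomposition: 1 claim → 1 verifier
- No cross-trace correlation: each call is judged independently
- HMAC (symmetric key) only: multi-issuer / asymmetric support comes in v1 (`aar/SCHEMA_UPGRADES.md`)
- The hosted endpoint carries no SLA, uptime, or rate-limit guarantee: self-hosting is recommended if you need stability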
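Intended audience
- Agent harness developers who want to add a sanity check before completion is reported
- Multi-agent pipeline operators who want an integrity boundary between steps
- Anyone who wants to keep observing whether an agent really did what it said it did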
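Roadmap
This runs as a 90-day probe; continuing, scaling down, or withdrawing will be decided against kill criteria committed in advance. If adoption appears, work starts with schema v1 (ed25519 + multi-issuer).
License and contact
- License: MIT (see LICENSE)
- Maintainer: Armada (@Ardev_lab)
- Issues / questions: GitHub Issues or [email protected]
- Note: the service is currently provided free of charge (any future paid offering will be announced)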