jarmstrong158

clark-mcp


Local, offline natural-language interface to the Clark warehouse RL agent — MCP server + fully-local hermes3:8b client. No cloud, no API cost.


Local, offline natural-language interface to Clark (the warehouse workforce RL agent). Plain English in → real staffing decisions out. No cloud, no API cost, no data egress.

Ask "what's the opening plan for the east dock Tuesday, and what happens if two pickers call off?" and a local LLM turns that into real Clark tool calls and explains the result honestly. Nothing leaves the machine.

What it is

you ──text──▶ hermes3:8b (Ollama, local)
                   │  tool calls (MCP, stdio)
                   ▼
              clark-mcp server  ──HTTP──▶  clark serve  (localhost inference API)
                                                 │
                                                 ▼  real Clark RL inference

Three thin layers, each independently testable, no shared process:

  • clark serve (in the clark repo): a minimal localhost inference API with 5 stateless read routes and weights loaded once. Not part of this repo; this repo consumes it.
  • clark_mcp/server.py, a real MCP server (any MCP host can use it) exposing 5 tools: clark_list_facilities, clark_facility_info, clark_get_plan, clark_what_if, clark_explain_decision.
  • clark_mcp/agent.py, a fully-local client: a Hermes-3-8B model in Ollama drives those tools and explains the result. Zero external calls.

Nothing here re-implements inference: every tool delegates to Clark's localhost API over HTTP. clark_explain_decision returns Clark's plan plus the facility's rules as grounding; the explanation is the model's, not Clark policy introspection (Clark is an RL policy: it emits actions, not reasons).
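As a rough illustration of that delegation, the sketch below shows how a tool like clark_explain_decision could bundle a plan with the facility's rules without doing any inference itself. It is not the code in clark_mcp/server.py: the route paths (/plan/<id>, /facilities/<id>), the "rules" key, and the function names are assumptions made for the example. The port follows the --port 8000 used in the run commands.

```python
import json
from typing import Any, Callable
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # assumed: same port as the `clark serve` example


def http_get_json(path: str) -> Any:
    """GET a path on the local Clark API and decode the JSON body."""
    with urlopen(BASE_URL + path, timeout=10) as resp:
        return json.load(resp)


def explain_decision(
    facility_id: str,
    get_json: Callable[[str], Any] = http_get_json,
) -> dict:
    """Return Clark's plan plus the facility rules as grounding for the LLM.

    No inference happens here: both pieces come from the HTTP API.
    The route shapes and the "rules" key are hypothetical.
    """
    fid = quote(facility_id, safe="")
    plan = get_json(f"/plan/{fid}")        # hypothetical route
    info = get_json(f"/facilities/{fid}")  # hypothetical route
    return {
        "plan": plan,
        "rules": info.get("rules", []),
        # Reminder passed along to the model: it interprets, it does not introspect.
        "note": "Explanation is the model's interpretation, not Clark policy introspection.",
    }
```

Passing get_json as a parameter keeps the function testable without a running Clark server, which is the same idea as the injected fake client used in the test suite.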

See docs/ARCHITECTURE.md for the full design, tool contracts, the honesty model, and the fine-tune pipeline.

The honesty model (why this is built the way it is)

This system is designed to be a truthful staffing tool, not a confident one. Three rules are enforced structurally, not by prompt alone:

  1. Never invent a plan. Plans/what-ifs always come from a live tool call; a tool error or unknown facility is reported plainly, never papered over with a plausible-looking fabrication.
  2. Explain, don't introspect. Clark is an RL policy. The system describes what it assigned and interprets it against the facility's rules; it never claims to know why the network chose it.
  3. Opening assignment ≠ outcome. A plan is the start-of-day assignment, not a simulated end-of-day grade. What-ifs compare opening assignments across scenarios, not projected results.

A trained Clark genuinely fails some days (a roster can be too thin for its volume). The tool is meant to surface those failures honestly, not to be tuned until it always "wins."
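Rule 3 can be made concrete with a small sketch. It assumes, purely for illustration, that an opening assignment is a mapping from role name to headcount; the real payload shape returned by Clark may differ. The function name and the "kind" label are hypothetical. The point is that a what-if result is a difference between two opening assignments and is labeled as such, never as a projected outcome.

```python
def compare_opening_assignments(
    baseline: dict[str, int],
    scenario: dict[str, int],
) -> dict:
    """Diff two opening assignments (assumed shape: role -> headcount).

    The result only says how the start-of-day assignment changed.
    It makes no claim about how the day would end.
    """
    roles = sorted(set(baseline) | set(scenario))
    delta = {r: scenario.get(r, 0) - baseline.get(r, 0) for r in roles}
    return {
        "kind": "opening_assignment_delta",  # hypothetical label, not a Clark field
        "delta": {r: d for r, d in delta.items() if d != 0},
    }
```

For example, with a baseline of 6 pickers and a scenario where two call off, the delta contains only the picker role with a value of -2. Roles whose headcount is unchanged are left out.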

Run (all local)

# 1. Clark inference API (from the clark repo, on a stable checkpoint)
clark serve --model <checkpoint.pt> --facilities-dir clark/data/configs --port 8000

# 2. Ollama with the model pulled
ollama pull hermes3:8b

# 3. this:
pip install -e .
python -m clark_mcp.agent          # interactive
# or run the MCP server for any MCP host:
clark-mcp

Tests

pip install -e ".[dev]"
pytest                              # tool layer, against a fake clark client

The pure tool layer (tools.py) is unit-tested with an injected fake client. The agent.py LLM loop and MCP stdio transport are smoke-tested, not regression-covered: exercised once manually (hermes3:8b → MCP → clark-mcp → Clark), not in CI.
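The injected-fake-client pattern might look roughly like the sketch below. The names here (FakeClarkClient, get_plan_tool, the result keys) are hypothetical stand-ins and do not reflect the actual contents of tools.py or the test suite. The second test also exercises honesty rule 1: an unknown facility yields a plain error, not an invented plan.

```python
class FakeClarkClient:
    """Stand-in for the real HTTP client: canned plans, recorded calls."""

    def __init__(self, plans):
        self.plans = plans
        self.calls = []

    def get_plan(self, facility_id):
        self.calls.append(facility_id)
        if facility_id not in self.plans:
            raise KeyError(facility_id)
        return self.plans[facility_id]


def get_plan_tool(client, facility_id):
    """Hypothetical tool function: delegates to the client, adds nothing of its own."""
    try:
        return {"ok": True, "plan": client.get_plan(facility_id)}
    except KeyError:
        return {"ok": False, "error": f"unknown facility: {facility_id}"}


def test_known_facility_returns_client_plan():
    client = FakeClarkClient({"east": {"picker": 6}})
    assert get_plan_tool(client, "east") == {"ok": True, "plan": {"picker": 6}}
    assert client.calls == ["east"]  # exactly one delegated call


def test_unknown_facility_is_reported_not_invented():
    client = FakeClarkClient({})
    result = get_plan_tool(client, "nowhere")
    assert result == {"ok": False, "error": "unknown facility: nowhere"}
    assert "plan" not in result
```

Because the client is passed in, these tests run under pytest with no Clark server, no Ollama, and no network access.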

Status

| Phase | What | State |
|---|---|---|
| 0 | Minimal localhost Clark inference API (clark serve) | Built + hardened. Non-facility configs → clean 422 (not 500); seeded /plan reproducible; pytest-green against real Clark. |
| 1 | MCP server + fully-local Hermes-3 client | Built. Tool layer regression-covered; LLM loop + stdio smoke-tested. |
| 2 | Fine-tune dataset | Built + quality-gated. Real-API generator + 8-example gold bar; 160 curated examples, every taught behavior covered, no category > 30%. See finetune/DATASET.md. |
| 3 | QLoRA domain fine-tune of the local model | Next. Needs an eval harness first (held-out split + measurable scoring) before any train run. |
| 4 | Integration + the staffing-sufficiency what-if | Planned. The headline capability: sweep the trained policy across roster sizes and report the failure-vs-staffing curve ("+2 staff turns ~21% F-days into ~8%"). Requires a new what_if roster-size parameter; it currently only removes workers. |
| 5 | Portfolio write-up | Planned. |
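The Phase 2 balance constraint ("no category > 30%") is simple enough to sketch as a check. This is an illustration only, not the gate described in finetune/DATASET.md: the function name and the category labels in the usage note are made up, and the real gate also compares against the gold set, which is not modeled here.

```python
from collections import Counter


def oversized_categories(
    categories: list[str],
    max_share: float = 0.30,
) -> dict[str, float]:
    """Return each category whose share of the dataset exceeds max_share.

    An empty result means the balance constraint holds.
    """
    total = len(categories)
    if total == 0:
        return {}
    counts = Counter(categories)
    return {c: n / total for c, n in counts.items() if n / total > max_share}
```

With 160 examples, a category may hold at most 48 of them (exactly 30%). A hypothetical "plan" category with 49 examples would be flagged with a share of 49/160.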

Honest scope of the result: the pipeline works end to end; plan quality is only as good as the Clark checkpoint behind it, and answer fluency is base-Hermes until the Phase 3 fine-tune lands.

Project memory

Architectural decisions and constraints are recorded under .context/ (context-keeper): why the runtime is HTTP-decoupled from Clark, why every fine-tune payload must be live-captured, why the dataset is quality-gated against the gold set. Read these before changing the contract.
