GLM-as-Subagent for Claude Code — plug & play

📦 Canonical source: https://github.com/djerok/glm_mcp_claude, created by @djerok. If you found this via a fork, mirror, or an awesome-list, the original lives here. Please ⭐ / file issues / open PRs at the source.

Add GLM (Zhipu / Z.ai) to Claude Code as a cheap, full-capability subagent (~10× cheaper than Opus), with automatic per-task routing between Opus and GLM. Your main agent stays on Opus; GLM does the well-specified, cost-sensitive work, and can read, write, edit, and run your files directly. One command to install.

✅ Works in the Claude Code app on a subscription-based Claude. Your main agent runs on the Claude you already pay for through the Claude Code app (Pro / Max / Team subscription); no separate pay-per-token Anthropic API key is required. Only GLM needs a (cheap) Z.ai key. Opus orchestrates on your subscription; GLM does the heavy lifting for a fraction of the cost.

Figure: The glm subagent (orchestrated by Haiku 4.5, the cheap layer) reading the repo and offloading the heavy work to GLM via the MCP tools: the Opus → Haiku → GLM hybrid in action.

# no clone needed — run straight from GitHub:
npx github:djerok/glm_mcp_claude --key YOUR_ZAI_API_KEY

# or clone and run the installer:
node install.mjs --key YOUR_ZAI_API_KEY

…then restart Claude Code. That's it. (Details below.)

🔑 Your key must be from the Z.ai / Zhipu GLM Coding Plan. Get one at https://z.ai → subscribe to the GLM Coding Plan, then create an API key. A generic/free key without coding-plan access will not work for the coding models used here.

What you get

glm subagent — a full-tool subagent (read/write/edit/bash) powered by GLM.
glm_agent tool — GLM as a real file-editing agent with built-in oversight (diff, dry-run, git revert).
glm_delegate / glm_recommend / glm_status — draft-only delegation, a free routing advisor, and a health check.
Auto-delegation hook — when you spawn a subagent, it injects a GLM-vs-Opus verdict so cheap work goes to GLM automatically. Zero token cost when you're not spawning subagents.
If you explicitly name an agent ("use opus", "use the sonnet agent", "use glm"), the hook stays silent and just routes where you asked.
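
The explicit-override rule above can be pictured as a simple text check. This is only a sketch: the function name `namesExplicitAgent`, the agent list, and the regular expression are illustrative assumptions, not the logic of the actual hook.

```javascript
// Hypothetical sketch: detect prompts that explicitly name an agent.
// In that case the router would stay silent. This is NOT the real hook code.
const KNOWN_AGENTS = ["opus", "sonnet", "haiku", "glm"]; // assumed list

function namesExplicitAgent(prompt) {
  // Matches phrases such as "use opus", "use the sonnet agent", "use glm".
  const pattern = new RegExp(
    `\\buse\\s+(?:the\\s+)?(${KNOWN_AGENTS.join("|")})\\b`,
    "i"
  );
  const match = prompt.match(pattern);
  // Return the named agent in lowercase, or null when none is named.
  return match ? match[1].toLowerCase() : null;
}
```

A router built this way would skip its verdict whenever `namesExplicitAgent(prompt)` returns a non-null value and route to the named agent instead.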

Prerequisites

The Claude Code app (desktop or CLI), signed in with a subscription-based Claude (Pro / Max / Team). Your main agent uses this; no Anthropic API key is needed. The claude CLI should be on your PATH (claude --version).
Node.js ≥ 18 (node -v)
A Z.ai / Zhipu API key with GLM Coding Plan access; get one at https://z.ai. ⚠️ It must be on the GLM Coding Plan (the coding-plan subscription); a generic or free key won't have access to the coding models this uses. This is the only paid key required, and GLM is ~10× cheaper than Opus.
Git (optional, but enables glm_agent's one-command revert)

Install (recommended: global, all projects)

# from this folder:
node install.mjs --key YOUR_ZAI_API_KEY

The installer:

copies the server to ~/.claude/glm-mcp/ and runs npm install,
writes your key into ~/.claude/glm-mcp/.env,
installs the glm subagent (~/.claude/agents/glm.md) and the hook (~/.claude/hooks/),
wires the hook into ~/.claude/settings.json (backs it up first),
adds a short delegation policy to ~/.claude/CLAUDE.md,
registers the MCP server with claude mcp add glm -s user.

Then restart Claude Code and run glm_status — you should see "api_key_loaded": true.

Options: --no-register (skip the CLI step), --skip-npm, --claude-dir PATH (custom config dir). Re-running is safe (idempotent). No key on the command line? Run node install.mjs, then edit ~/.claude/glm-mcp/.env and set GLM_API_KEY=....
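
The "api_key_loaded" check comes down to whether GLM_API_KEY has a non-empty value in the .env file. A minimal sketch of that idea follows, assuming plain KEY=VALUE lines; `parseEnv` and `apiKeyLoaded` are hypothetical helper names, and the real server may load its environment differently.

```javascript
// Minimal sketch: parse KEY=VALUE lines from .env text.
// The parsing rules here are assumptions, not the server's actual loader.
function parseEnv(text) {
  const vars = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    vars[key] = value;
  }
  return vars;
}

// True when the .env text defines a non-empty GLM_API_KEY.
function apiKeyLoaded(envText) {
  const key = parseEnv(envText).GLM_API_KEY;
  return typeof key === "string" && key.length > 0;
}
```

If `apiKeyLoaded` would return false for your file (for example because the line is commented out or the value is empty), glm_status is likely to report the key as missing too.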

Per-project instead of global

Don't want it everywhere? Skip the installer. Copy glm-mcp/ into your project, cd glm-mcp && npm install, copy .mcp.json.example → .mcp.json in the project root, set the key, and (optionally) copy agents/glm.md to .claude/agents/ and the hook into .claude/ + .claude/settings.json.

How it works (the short version)

You ask for something
  → Opus orchestrates
  → wants to delegate a chunk → spawns a subagent
       → [hook fires] "[GLM router] GLM-suitable repo task → use glm_agent (dry_run first)"
              (or "keep on Opus" for hard/sensitive work)
       → Opus runs glm_agent (GLM edits the files, runs tests) — or keeps it on Opus
       → you get a diff + action log + a one-command revert

The routing rules live in glm-mcp/src/router.js and the hook, not in always-on context, so they cost nothing until a subagent is actually spawned.

Routing in one line: GLM is the default (it's ~10× cheaper); Opus is the exception for work where being wrong is expensive: subtle debugging, architecture, large refactors, security, tool-heavy dependent loops, huge context, vision, or anything you mark sensitive.
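
The one-line rule can be sketched as a small decision function. This is a deliberately simplified illustration: the signal names, the `route` function, and its return shape are hypothetical, and it ignores the cost bias and peak-hour logic that the real router in glm-mcp/src/router.js may apply.

```javascript
// Hypothetical sketch of "GLM by default, Opus for expensive-to-get-wrong work".
// Signal names are invented labels for the categories listed above.
const OPUS_SIGNALS = new Set([
  "subtle-debugging",
  "architecture",
  "large-refactor",
  "security",
  "tool-heavy-loop",
  "huge-context",
  "vision",
  "sensitive",
]);

function route(taskSignals) {
  // Keep only the signals that argue for staying on Opus.
  const reasons = taskSignals.filter((s) => OPUS_SIGNALS.has(s));
  return reasons.length > 0
    ? { target: "opus", reasons }
    : { target: "glm", reasons: ["no high-risk signal; GLM is the cheaper default"] };
}
```

For example, `route(["docs"])` would pick GLM, while `route(["docs", "security"])` would keep the task on Opus and report `security` as the reason.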

The tools

Tool	Cost	What it does
`glm_recommend`	free (local)	GLM-or-Opus decision + model pick + reasons.
`glm_status`	free (local)	Peak window, active model, key/config health.
`glm_delegate`	GLM tokens	Text in → text out. GLM drafts; you place it.
`glm_agent`	GLM tokens	GLM works your repo directly (read/write/edit/bash). Returns a diff + action log + git revert; supports `dry_run` (propose, don't write).

Example: directly calling the GLM agent

A real run — asking GLM (via glm_agent) to write a file end-to-end on disk:

Figure: GLM agent writing a 2000-word Shakespearean essay to disk in 18 iterations for about 6 cents.

Prompt: "Using the GLM agent glm_agent, write a 2000-word essay in Shakespearean format about the usefulness of an umbrella, into my Desktop."

GLM did it itself — created the file directly, no round-tripping the content through the main agent:

Output: Umbrella-Essay-Shakespeare.md — ~2,260 words of Early Modern English (thee/thou/thy, doth/hath) with two blank-verse interludes
Work: 18 tool-loop iterations; one file created, nothing existing touched
Cost: ~$0.064 — a fraction of running the same task on Opus

That's the point: the orchestrator stays on Opus while glm_agent does the heavy, file-touching work for cents.

Oversight (how you stay in control of `glm_agent`)

Entry: you/Opus choose when to call it and with which workdir.
dry_run: true: GLM proposes a full diff and writes nothing — approve, then apply.
After a real run: you get the unified diff, an action log, and a one-command git revert (git checkout <baseline> -- .).

Note: file/bash ops inside glm_agent run in the MCP server process (not gated per-edit) and are scoped to the workdir you pass. That's intentional (max autonomy), so point it only at repos you're fine letting it modify.

Configuration (`~/.claude/glm-mcp/.env`)

Var	Default	Meaning
`GLM_API_KEY`	—	Your Z.ai key. Required.
`GLM_BASE_URL`	`https://api.z.ai/api/anthropic`	Anthropic-compatible endpoint.
`GLM_COST_BIAS`	`1.5`	How hard to favor GLM (it's ~10× cheaper). Higher = more GLM; `0` = decide on capability only.
`GLM_CAP`	`off`	Output-token cap. Off by default = generous (up to 131072 per call). Set `on` to enforce `GLM_MAX_TOKENS` and rein in spend.
`GLM_MAX_TOKENS`	`32768`	The hard per-call limit applied only when `GLM_CAP=on`. (`max_tokens` is a ceiling, not a target — you pay for actual output.)
`GLM_MAX_TOKENS_CEILING`	`131072`	The generous default used when the cap is off.
`GLM_MAX_CONCURRENT`	`1`	GLM caps in-flight requests; keep at 1.
`GLM_OFFPEAK_MODEL` / `GLM_PEAK_MODEL`	`glm-5.2` / `glm-5.2`	Model(s) for `auto`. Each can be a comma-separated list (e.g. `glm-5.2,glm-5-turbo`) and the router auto-picks — most capable for hard tasks, cheapest for easy ones. Peak rule: when `auto` lands on a glm-5.x model (3× surcharge) the router routes less work to GLM at peak; if you include a no-surcharge model (e.g. `GLM_PEAK_MODEL=glm-5.2,glm-4.7`) it's preferred at peak and GLM stays fine to use.
`GLM_PEAK_START_CN` / `GLM_PEAK_END_CN`	`14` / `18`	Peak window (China hour, UTC+8).
`GLM_AGENT_MAX_ITERS`	`30`	Max tool-loop turns for `glm_agent`.

Full list with comments: glm-mcp/.env.example.
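
Two of the settings above are easy to misread, so here is a small sketch of how they could be interpreted: the output-token cap (GLM_CAP, GLM_MAX_TOKENS, GLM_MAX_TOKENS_CEILING) and the peak window in China time. The function names are hypothetical, and treating the window as start-inclusive, end-exclusive, and not wrapping past midnight is an assumption; the actual server code may differ.

```javascript
// Sketch: pick the max_tokens value for a call from .env-style settings.
// Defaults mirror the table above.
function resolveMaxTokens(env = {}) {
  const capOn = String(env.GLM_CAP ?? "off").toLowerCase() === "on";
  const hardLimit = Number(env.GLM_MAX_TOKENS ?? 32768);
  const ceiling = Number(env.GLM_MAX_TOKENS_CEILING ?? 131072);
  // Cap on: enforce the hard limit. Cap off: use the generous ceiling.
  return capOn ? hardLimit : ceiling;
}

// Sketch: is `date` inside the peak window, measured in China time (UTC+8)?
// Assumes whole hours, start inclusive, end exclusive, no wrap past midnight.
function isPeakHourCN(date, start = 14, end = 18) {
  const hourCN = (date.getUTCHours() + 8) % 24;
  return hourCN >= start && hourCN < end;
}
```

Under these assumptions, 07:00 UTC is 15:00 in China and counts as peak with the default window, while 10:00 UTC (18:00 in China) does not.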

Uninstall

node uninstall.mjs          # remove agent, hook, settings entry, MCP registration
node uninstall.mjs --purge  # also delete ~/.claude/glm-mcp (and its .env)

Security

Never commit/share your .env or a .mcp.json containing the key. .gitignore excludes them.
GLM routes through servers in China, so don't send secrets/regulated code you wouldn't send to a third-party API. (Routing keeps sensitive-flagged work on Opus, but you decide what to delegate.)

Troubleshooting

Symptom	Fix
`glm_status` missing / tools absent	Restart Claude Code; `claude mcp get glm` to confirm registration.
`api_key_loaded: false`	Set `GLM_API_KEY` in `~/.claude/glm-mcp/.env`.
Server fails to start	`cd ~/.claude/glm-mcp && npm run smoke` to see the real error.
`Too much concurrency`	Expected under load; it auto-retries. Don't fan out parallel GLM calls.
Hook not firing	Check `~/.claude/settings.json` has a `PreToolUse` `Task` matcher pointing at `glm_subagent_router.mjs`.
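
For the "Hook not firing" row, a quick way to reason about the settings file is to check its parsed contents for the expected entry. The sketch below assumes the hook configuration shape used by Claude Code settings (a `hooks.PreToolUse` array whose entries carry a `matcher` string and a nested `hooks` list with `command` strings); `hasRouterHook` is a hypothetical helper, and splitting the matcher on `|` is a simplification of pattern matching.

```javascript
// Sketch: does a parsed settings.json object contain a PreToolUse entry
// that matches "Task" and runs the router script? Shape is an assumption.
function hasRouterHook(settings, script = "glm_subagent_router.mjs") {
  const entries = settings?.hooks?.PreToolUse ?? [];
  return entries.some(
    (entry) =>
      typeof entry.matcher === "string" &&
      entry.matcher.split("|").includes("Task") &&
      (entry.hooks ?? []).some(
        (h) => typeof h.command === "string" && h.command.includes(script)
      )
  );
}
```

To use it, read `~/.claude/settings.json`, pass the result of `JSON.parse` to `hasRouterHook`, and re-run the installer if it returns false.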

More background and the research behind the routing rules: see docs/.

Contributing

PRs and issues welcome; see CONTRIBUTING.md. Good first areas: routing rules (glm-mcp/src/router.js + the hook), provider adapters, and docs. Please never commit secrets/.env.

License

MIT © djerok

Original / canonical repository: https://github.com/djerok/glm_mcp_claude. If you fork, mirror, or redistribute this project, please keep a link back to the source so others can find updates, file issues, and contribute. Built by @djerok.
