GLM (Zhipu/Z.ai) as a cheap, full-capability subagent for the Claude Code app — works on a subscription Claude (no API key for the main agent), auto-routes between Opus and GLM, file-editing agent with diff/dry-run/git-revert, one-command npx install.

GLM-as-Subagent for Claude Code — plug & play

📦 Canonical source: https://github.com/djerok/glm_mcp_claude, created by @djerok. If you found this via a fork, mirror, or an awesome-list, the original lives here. Please ⭐ / file issues / open PRs at the source.

Add GLM (Zhipu / Z.ai) to Claude Code as a cheap, full-capability subagent (~10× cheaper than Opus), with automatic per-task routing between Opus and GLM. Your main agent stays on Opus; GLM does the well-specified, cost-sensitive work and can read, write, edit, and run your files directly. One command to install.

Works in the Claude Code app on a subscription-based Claude. Your main agent runs on the Claude you already pay for through the Claude Code app (Pro / Max / Team subscription); no separate pay-per-token Anthropic API key is required. Only GLM needs a (cheap) Z.ai key. Opus orchestrates on your subscription; GLM does the heavy lifting for a fraction of the cost.

Screenshot: the glm subagent (orchestrated by Haiku 4.5, the cheap layer) reading the repo and offloading the heavy work to GLM via the MCP tools. This is the Opus → Haiku → GLM hybrid in action.

# no clone needed — run straight from GitHub:
npx github:djerok/glm_mcp_claude --key YOUR_ZAI_API_KEY

# or clone and run the installer:
node install.mjs --key YOUR_ZAI_API_KEY

…then restart Claude Code. That's it. (Details below.)

🔑 Your key must be from the Z.ai / Zhipu GLM Coding Plan. Get one at https://z.ai: subscribe to the GLM Coding Plan, then create an API key. A generic/free key without coding-plan access will not work for the coding models used here.

What you get

  • glm subagent — a full-tool subagent (read/write/edit/bash) powered by GLM.
  • glm_agent tool — GLM as a real file-editing agent with built-in oversight (diff, dry-run, git revert).
  • glm_delegate / glm_recommend / glm_status — draft-only delegation, a free routing advisor, and a health check.
  • Auto-delegation hook — when you spawn a subagent, it injects a GLM-vs-Opus verdict so cheap work goes to GLM automatically. Zero token cost when you're not spawning subagents.
  • If you explicitly name an agent ("use opus", "use the sonnet agent", "use glm"), the hook stays silent and just routes where you asked.

Prerequisites

  • The Claude Code app (desktop or CLI), signed in with a subscription-based Claude (Pro / Max / Team). Your main agent uses this, so no Anthropic API key is needed. The claude CLI should be on your PATH (claude --version).
  • Node.js ≥ 18 (node -v)
  • A Z.ai / Zhipu API key with GLM Coding Plan access. Get one at https://z.ai. ⚠️ It must be on the GLM Coding Plan (the coding-plan subscription); a generic or free key won't have access to the coding models this uses. This is the only paid key required, and GLM is ~10× cheaper than Opus.
  • Git (optional, but enables glm_agent's one-command revert)

Install (recommended: global, all projects)

# from this folder:
node install.mjs --key YOUR_ZAI_API_KEY

The installer:

  1. copies the server to ~/.claude/glm-mcp/ and runs npm install,
  2. writes your key into ~/.claude/glm-mcp/.env,
  3. installs the glm subagent (~/.claude/agents/glm.md) and the hook (~/.claude/hooks/),
  4. wires the hook into ~/.claude/settings.json (backs it up first),
  5. adds a short delegation policy to ~/.claude/CLAUDE.md,
  6. registers the MCP server with claude mcp add glm -s user.

Then restart Claude Code and run glm_status — you should see "api_key_loaded": true.

Options: --no-register (skip the CLI step), --skip-npm, --claude-dir PATH (custom config dir). Re-running is safe (idempotent). No key on the command line? Run node install.mjs, then edit ~/.claude/glm-mcp/.env and set GLM_API_KEY=....
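
To make the "re-running is safe" claim concrete, here is a minimal sketch of how a hook entry could be merged into a parsed settings.json object without duplicating it. This is not the installer's actual code: the function name addRouterHook and the command string are hypothetical, and the object shape follows Claude Code's documented hooks layout (a PreToolUse list of matcher groups, each with its own hooks list).

```javascript
// Hypothetical sketch, not the code in install.mjs.
// Merges a PreToolUse "Task" hook into a settings object, idempotently.
function addRouterHook(settings, command) {
  // Work on a copy so the caller's object (and any backup of it) stays untouched.
  const next = structuredClone(settings ?? {});
  next.hooks ??= {};
  next.hooks.PreToolUse ??= [];

  // Find (or create) the matcher group for the Task tool.
  let group = next.hooks.PreToolUse.find((g) => g.matcher === "Task");
  if (!group) {
    group = { matcher: "Task", hooks: [] };
    next.hooks.PreToolUse.push(group);
  }
  group.hooks ??= [];

  // Idempotence: append only when the same command is not registered yet.
  const present = group.hooks.some((h) => h.command === command);
  if (!present) group.hooks.push({ type: "command", command });
  return next;
}

// Assumed command line; the real path and invocation may differ.
const cmd = "node ~/.claude/hooks/glm_subagent_router.mjs";
const once = addRouterHook({}, cmd);
const twice = addRouterHook(once, cmd);
console.log(JSON.stringify(twice, null, 2));
```

Running the merge twice yields the same object as running it once, which is the property that makes a second install harmless.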

Per-project instead of global

Don't want it everywhere? Skip the installer. Copy glm-mcp/ into your project, run cd glm-mcp && npm install, copy .mcp.json.example to .mcp.json in the project root, set the key, and (optionally) copy agents/glm.md to .claude/agents/ and the hook into .claude/ + .claude/settings.json.

How it works (the short version)

You ask for something
  → Opus orchestrates
  → wants to delegate a chunk → spawns a subagent
       → [hook fires] "[GLM router] GLM-suitable repo task → use glm_agent (dry_run first)"
              (or "keep on Opus" for hard/sensitive work)
       → Opus runs glm_agent (GLM edits the files, runs tests) — or keeps it on Opus
       → you get a diff + action log + a one-command revert

The routing rules live in glm-mcp/src/router.js and the hook, not in always-on context, so they cost nothing until a subagent is actually spawned.

Routing in one line: GLM is the default (it's ~10× cheaper); Opus is the exception for work where being wrong is expensive, such as subtle debugging, architecture, large refactors, security, tool-heavy dependent loops, huge context, vision, or anything you mark sensitive.
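
The rule of thumb above can be sketched as a small function. This is illustrative only: the real rules live in glm-mcp/src/router.js and are not reproduced here. The keyword list, the routeTask name, and the return shape are assumptions invented for this sketch; only the verdict text is taken from the flow shown earlier.

```javascript
// Illustrative sketch only; the keywords and return shape are made up,
// and the actual logic in glm-mcp/src/router.js may look nothing like this.
const OPUS_SIGNALS = ["debug", "architecture", "refactor", "security", "vision", "sensitive"];

function routeTask(description) {
  const text = description.toLowerCase();

  // An explicitly named agent wins and the hook stays silent (no verdict).
  const named = text.match(/\buse (?:the )?(opus|sonnet|glm)\b/);
  if (named) return { target: named[1], verdict: null };

  // Opus is the exception: work where being wrong is expensive.
  const hit = OPUS_SIGNALS.find((word) => text.includes(word));
  if (hit) return { target: "opus", verdict: `[GLM router] keep on Opus (${hit})` };

  // GLM is the default.
  return {
    target: "glm",
    verdict: "[GLM router] GLM-suitable repo task → use glm_agent (dry_run first)",
  };
}

console.log(routeTask("Add unit tests for the date parser"));
console.log(routeTask("Audit the login flow for security holes"));
console.log(routeTask("Use the sonnet agent to summarize the changelog"));
```

A keyword match is a crude stand-in for whatever scoring the router actually applies (GLM_COST_BIAS suggests a weighted decision), but it shows the default-to-GLM, escalate-to-Opus shape.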

The tools

| Tool | Cost | What it does |
| --- | --- | --- |
| glm_recommend | free (local) | GLM-or-Opus decision + model pick + reasons. |
| glm_status | free (local) | Peak window, active model, key/config health. |
| glm_delegate | GLM tokens | Text in → text out. GLM drafts; you place it. |
| glm_agent | GLM tokens | GLM works your repo directly (read/write/edit/bash). Returns a diff + action log + git revert; supports dry_run (propose, don't write). |

Example: directly calling the GLM agent

A real run — asking GLM (via glm_agent) to write a file end-to-end on disk:

Prompt: "Using the GLM agent glm_agent, write a 2000-word essay in Shakespearean format about the usefulness of an umbrella, into my Desktop."

GLM did it itself — created the file directly, no round-tripping the content through the main agent:

  • Output: Umbrella-Essay-Shakespeare.md — ~2,260 words of Early Modern English (thee/thou/thy, doth/hath) with two blank-verse interludes
  • Work: 18 tool-loop iterations; one file created, nothing existing touched
  • Cost: ~$0.064 — a fraction of running the same task on Opus

That's the point: the orchestrator stays on Opus while glm_agent does the heavy, file-touching work for cents.

Oversight (how you stay in control of glm_agent)

  • Entry: you/Opus choose when to call it and with which workdir.
  • dry_run: true: GLM proposes a full diff and writes nothing — approve, then apply.
  • After a real run: you get the unified diff, an action log, and a one-command git revert (git checkout <baseline> -- .).

Note: file/bash ops inside glm_agent run in the MCP server process (not gated per-edit) and are scoped to the workdir you pass. That's intentional (max autonomy), so point it only at repos you're fine letting it modify.

Configuration (~/.claude/glm-mcp/.env)

| Var | Default | Meaning |
| --- | --- | --- |
| GLM_API_KEY | | Your Z.ai key. Required. |
| GLM_BASE_URL | https://api.z.ai/api/anthropic | Anthropic-compatible endpoint. |
| GLM_COST_BIAS | 1.5 | How hard to favor GLM (it's ~10× cheaper). Higher = more GLM; 0 = decide on capability only. |
| GLM_CAP | off | Output-token cap. Off by default = generous (up to 131072 per call). Set on to enforce GLM_MAX_TOKENS and rein in spend. |
| GLM_MAX_TOKENS | 32768 | The hard per-call limit applied only when GLM_CAP=on. (max_tokens is a ceiling, not a target; you pay for actual output.) |
| GLM_MAX_TOKENS_CEILING | 131072 | The generous default used when the cap is off. |
| GLM_MAX_CONCURRENT | 1 | GLM caps in-flight requests; keep at 1. |
| GLM_OFFPEAK_MODEL / GLM_PEAK_MODEL | glm-5.2 / glm-5.2 | Model(s) for auto. Each can be a comma-separated list (e.g. glm-5.2,glm-5-turbo) and the router auto-picks: most capable for hard tasks, cheapest for easy ones. Peak rule: when auto lands on a glm-5.x model (3× surcharge) the router routes less work to GLM at peak; if you include a no-surcharge model (e.g. GLM_PEAK_MODEL=glm-5.2,glm-4.7) it's preferred at peak and GLM stays fine to use. |
| GLM_PEAK_START_CN / GLM_PEAK_END_CN | 14 / 18 | Peak window (China hour, UTC+8). |
| GLM_AGENT_MAX_ITERS | 30 | Max tool-loop turns for glm_agent. |
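
The interaction between GLM_CAP, GLM_MAX_TOKENS, and GLM_MAX_TOKENS_CEILING is easy to misread, so here is a minimal sketch of how the per-call limit might be resolved from those three variables. The function name resolveMaxTokens and the fallback behavior for invalid values are assumptions; the defaults are the ones listed above.

```javascript
// Sketch under stated assumptions; the server's actual parsing may differ.
// Picks the max_tokens value to send with a GLM request.
function resolveMaxTokens(env) {
  // GLM_CAP=on enforces the hard limit; anything else uses the generous ceiling.
  const capOn = String(env.GLM_CAP ?? "off").toLowerCase() === "on";
  const key = capOn ? "GLM_MAX_TOKENS" : "GLM_MAX_TOKENS_CEILING";
  const fallback = capOn ? 32768 : 131072;

  // Assumption: missing or non-numeric values fall back to the documented default.
  const parsed = Number.parseInt(env[key] ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

console.log(resolveMaxTokens({}));
console.log(resolveMaxTokens({ GLM_CAP: "on" }));
console.log(resolveMaxTokens({ GLM_CAP: "on", GLM_MAX_TOKENS: "8000" }));
```

Note that setting GLM_MAX_TOKENS alone has no effect in this sketch, which matches the table: the hard limit applies only when GLM_CAP=on.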

Full list with comments: glm-mcp/.env.example.
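
As a rough illustration of the peak settings, the sketch below checks whether a given instant falls inside the China-hour window and then picks a model from a comma-separated list. Several things here are assumptions rather than documented behavior: the window is treated as start-inclusive and end-exclusive, a surcharge model is recognized by the prefix glm-5., and outside peak the first listed model is used (the real router also weighs task difficulty, which this sketch ignores).

```javascript
// Sketch only; names and edge-case handling are assumptions, not the router's code.

// Hour of day in China (UTC+8). There is no daylight saving, so a fixed offset works.
function chinaHour(date) {
  return (date.getUTCHours() + 8) % 24;
}

// Assumption: window is [start, end) and does not wrap past midnight.
function isPeak(date, start = 14, end = 18) {
  const hour = chinaHour(date);
  return hour >= start && hour < end;
}

// Picks a model from a comma-separated list such as "glm-5.2,glm-4.7".
function pickModel(list, peak) {
  const models = list.split(",").map((m) => m.trim()).filter(Boolean);
  if (peak) {
    // Assumption: "glm-5." marks the surcharged family; prefer anything else at peak.
    const noSurcharge = models.find((m) => !/^glm-5\./.test(m));
    if (noSurcharge) return noSurcharge;
  }
  return models[0];
}

const afternoon = new Date("2025-01-01T07:00:00Z"); // 15:00 in UTC+8
console.log(isPeak(afternoon), pickModel("glm-5.2,glm-4.7", isPeak(afternoon)));
```

With only surcharged models in the list, this sketch still returns the first one at peak; the actual router instead routes less work to GLM in that situation.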

Uninstall

node uninstall.mjs          # remove agent, hook, settings entry, MCP registration
node uninstall.mjs --purge  # also delete ~/.claude/glm-mcp (and its .env)

Security

  • Never commit/share your .env or a .mcp.json containing the key. .gitignore excludes them.
  • GLM routes through servers in China, so don't send secrets/regulated code you wouldn't send to a third-party API. (Routing keeps sensitive-flagged work on Opus, but you decide what to delegate.)

Troubleshooting

| Symptom | Fix |
| --- | --- |
| glm_status missing / tools absent | Restart Claude Code; run claude mcp get glm to confirm registration. |
| api_key_loaded: false | Set GLM_API_KEY in ~/.claude/glm-mcp/.env. |
| Server fails to start | Run cd ~/.claude/glm-mcp && npm run smoke to see the real error. |
| Too much concurrency | Expected under load; it auto-retries. Don't fan out parallel GLM calls. |
| Hook not firing | Check that ~/.claude/settings.json has a PreToolUse Task matcher pointing at glm_subagent_router.mjs. |

More background and the research behind the routing rules: see docs/.

Contributing

PRs and issues welcome; see CONTRIBUTING.md. Good first areas: routing rules (glm-mcp/src/router.js + the hook), provider adapters, and docs. Please never commit secrets/.env.

License

MIT © djerok

Original / canonical repository: https://github.com/djerok/glm_mcp_claude. If you fork, mirror, or redistribute this project, please keep a link back to the source so others can find updates, file issues, and contribute. Built by @djerok.
