sandraschi

windows-computer-use-mcp


Windows Computer Use for AI Agents. Both a tool (22 MCP tools for click, type, screenshot, OCR, UI inspection) and an agent (autonomous mission engine, macro recorder, intent-based discovery, event watchers). Built with opencode (DeepSeek V4). Ships as MCP server, web UI, and Tauri desktop app.

A tool for agents, and an agent itself.

| You | It |
| --- | --- |
| Use it as an MCP server | Claude, Cursor, or DeepSeek call `automation_click`, `automation_screenshot`, `automation_ocr` (22 tools in total) |
| Use it as an autonomous agent | Give it a goal: `automation_mission(run="install app, verify UI, screenshot result")`. It plans, executes, retries, and reports |
| Use it as a webapp | `start.ps1` opens a React dashboard at http://127.0.0.1:10788 with HITL, crawler, logging |
| Use it as a desktop app | The NSIS installer bundles everything into one binary: no Python, no uv, no git needed |
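
For readers wiring up their own client, the mission call in the table above travels as a standard MCP `tools/call` JSON-RPC request. The following is a minimal sketch of that payload, assuming `run` is the only argument needed (it is the only one shown here). The helper name `build_tool_call` is hypothetical, and argument schemas for the other tools should be taken from docs/TOOLS.md rather than guessed.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any unique value works


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request (JSON-RPC 2.0) for one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# The goal string is the example from the table above.
request = build_tool_call(
    "automation_mission",
    {"run": "install app, verify UI, screenshot result"},
)
print(json.dumps(request, indent=2))
```

In practice an MCP SDK or host (Claude Desktop, Cursor) performs the `initialize` handshake and sends this request for you; the sketch only shows what goes over the wire.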

Exhibit A: 100 Tauri/NSIS installers, one unattended run, $2 in LLM costs. Install, screenshot, verify, report — zero human intervention. That is what agentic Windows automation looks like at scale.

Built on pywinauto. Read docs/SAFETY.md before production use.

  • Quick Start
  • Features
  • Documentation
  • Ports
  • License

Quick Start

| Method | Command / Config |
| --- | --- |
| MCP stdio (Cursor, Claude Desktop) | `{ "mcpServers": { "windows-computer-use": { "command": "uv", "args": ["--directory", "<PATH>", "run", "windows-computer-use-mcp"] } } }` |
| HTTP streamable (any MCP HTTP client) | `{ "mcpServers": { "windows-computer-use": { "url": "http://127.0.0.1:10789/mcp" } } }` |
| Web operator UI | `.\start.ps1` (opens http://127.0.0.1:10788) |
| Desktop app (NSIS installer) | Download from Releases (zero dependencies) |
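
If you already have other servers configured, the two config shapes above need to be merged into the existing `mcpServers` object rather than pasted over it. Here is a small sketch of that merge using only the standard library. The helper names, the `other-server` entry, and the repository path are placeholders for illustration; substitute the real checkout location for `<PATH>`.

```python
import json


def stdio_entry(repo_path: str) -> dict:
    """Entry for stdio clients; repo_path replaces <PATH> from the table."""
    return {
        "command": "uv",
        "args": ["--directory", repo_path, "run", "windows-computer-use-mcp"],
    }


def http_entry(url: str = "http://127.0.0.1:10789/mcp") -> dict:
    """Entry for streamable-HTTP clients."""
    return {"url": url}


def add_server(config: dict, entry: dict, name: str = "windows-computer-use") -> dict:
    """Return a copy of config with the entry added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# Hypothetical pre-existing client config with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}

# Placeholder path; point this at your own checkout.
config = add_server(existing, stdio_entry(r"C:\path\to\windows-computer-use-mcp"))
print(json.dumps(config, indent=2))
```

Swapping `stdio_entry(...)` for `http_entry()` yields the HTTP variant; the existing entries are left untouched either way.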

See INSTALL.md for detailed setup. Run `just demo` for examples.

Features

  • Window Management — find, activate, maximize, minimize, position, close
  • Mouse & Keyboard — click, drag, type, hotkeys, app shortcuts
  • UI Elements — inspect, click, read text, verify state via UIA / Win32
  • Visual Intelligence — screenshots, OCR, template matching
  • Autonomous Missions — give it a goal, it plans and executes with retry + verification
  • Macro Recording — record any UI sequence, replay, verify outcomes
  • Multi-App Workflows — chain actions across Notepad, Calc, Paint, or any Windows app
  • Telemetry — every action logged to SQLite; query failure patterns by tool
  • Adaptive Location — auto-cascades through title/auto_id/control_id/class/OCR to find elements
  • Face Recognition — optional, off by default
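
To make the Adaptive Location and Telemetry bullets concrete, here is an illustrative sketch of a locator cascade that records every attempt in SQLite and then queries failures per strategy. It is not the project's implementation: the `attempts` table schema, the `locate` function, and the stub strategies are all assumptions standing in for real UIA / Win32 / OCR lookups. Only the strategy order is taken from the list above.

```python
import sqlite3
from typing import Callable, Optional

# A locator takes a query and returns an element handle, or None on a miss.
Locator = Callable[[str], Optional[str]]


def locate(query: str, strategies: list[tuple[str, Locator]],
           db: sqlite3.Connection) -> Optional[str]:
    """Try each strategy in order, log every attempt, return the first hit."""
    for name, strategy in strategies:
        found = strategy(query)
        db.execute(
            "INSERT INTO attempts (strategy, target, ok) VALUES (?, ?, ?)",
            (name, query, int(found is not None)),
        )
        if found is not None:
            return found
    return None


# Hypothetical telemetry schema, kept in memory for the example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE attempts (strategy TEXT, target TEXT, ok INTEGER)")

# Stub strategies in the cascade order title/auto_id/control_id/class/OCR.
strategies = [
    ("title", lambda q: None),
    ("auto_id", lambda q: None),
    ("control_id", lambda q: None),
    ("class", lambda q: f"class:{q}" if q == "Edit" else None),
    ("ocr", lambda q: f"ocr:{q}"),
]

hit_class = locate("Edit", strategies, db)  # resolved by the class stub
hit_ocr = locate("Save", strategies, db)    # falls through to the OCR stub

# Failure counts per strategy, the kind of pattern the telemetry bullet describes.
failures = db.execute(
    "SELECT strategy, COUNT(*) FROM attempts WHERE ok = 0 "
    "GROUP BY strategy ORDER BY strategy"
).fetchall()
print(hit_class, hit_ocr, failures)
```

Ordering the cheap, exact lookups first and OCR last keeps the common case fast, and the logged misses show which strategies rarely succeed for a given application.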

Documentation

| Doc | Content |
| --- | --- |
| INSTALL.md | Setup: desktop app, uv, MCP config |
| docs/README.md | Full documentation hub |
| docs/py-stack.md | Python dependency deep dive |
| docs/composing-with-playwright.md | Browser automation with Playwright MCP |
| docs/ocr.md | OCR system: Tesseract setup, limitations, competition |
| docs/cua-nsis-certification.md | Dogfooding: using the tool to test its own NSIS installer |
| docs/SAFETY.md | HITL, kill switch, opt-in features |
| docs/TOOLS.md | Portmanteau tool reference |
| tests/README.md | Test suite guide and e2e setup |
| examples/README.md | Runnable demos |
| mcpb/README.md | MCPB bundle packaging |
| web_sota/README.md | Operator UI build/dev guide |
| CHANGELOG.md | Release history |

Ports

| Port | Service |
| --- | --- |
| 10788 | Frontend: Vite operator UI |
| 10789 | Backend: FastAPI + FastMCP HTTP |
| stdio | MCP transport (port-free) |
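
A quick way to confirm that the two HTTP services are up is to probe the ports listed above. The sketch below uses only the standard `socket` module and assumes both services bind to 127.0.0.1, as the URLs elsewhere in this README suggest. The `is_listening` helper is hypothetical and not part of the project.

```python
import socket

# Ports and service names from the table above.
PORTS = {
    10788: "Frontend (Vite operator UI)",
    10789: "Backend (FastAPI + FastMCP HTTP)",
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


for port, service in PORTS.items():
    state = "up" if is_listening(port) else "down"
    print(f"{port} {service}: {state}")
```

A successful TCP connect only shows that something is listening; it does not verify that the listener is this server. The stdio transport has no port and cannot be probed this way.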

Related

| Repo | What it does |
| --- | --- |
| autohotkey-mcp | Raw input recording/replay via AHK |
| browser-mcp | Playwright browser control for webapps, HTML DOM, websites |
| virtualization-mcp | Sandbox / VM isolation |
| windows-operations-mcp | Registry, services, accounts |

Browser vs desktop: This server drives Win32 / UI Automation. For HTML/DOM and websites, pair with browser-mcp (Playwright). Both MCPs can run side by side — use one profile that loads both and let the LLM pick the right tool for the target.

Fleet standards: mcp-central-docs.

License

MIT — Copyright (c) 2026 Sandra Schipal.
