# ApplicationController

An MCP server that exposes GUI automation tools to any MCP-compatible client, such as the GitHub Copilot desktop app. It uses PyAutoGUI to control the mouse, keyboard, and screen of the local machine.

## Tools
### Screenshot & observation

| Tool | Description |
|---|---|
| `take_screenshot` | Captures a full screenshot, optionally scaled down to reduce its size |
| `capture_region` | Captures a region given by coordinates or by the bounding box of an automation_id |
| `get_screen_size` | Returns the screen resolution |
| `get_mouse_position` | Returns the current cursor position |
| `list_windows` | Lists visible top-level windows (`hwnd`, `title`, `pid`, `rect`, `isForeground`) |
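When `take_screenshot` is called with a scale below 1, positions read off the reduced image no longer match screen coordinates, so they must be scaled back before being passed to `click`. The sketch below shows that arithmetic, plus a conversion from a `(left, top, right, bottom)` rectangle to an `(x, y, width, height)` region. It assumes the scale is applied uniformly to both axes and that `rect` uses left/top/right/bottom order; the helper names are hypothetical and not part of the server.

```python
def scaled_size(width: int, height: int, scale: float) -> tuple[int, int]:
    """Size of a screenshot after scaling both axes by `scale` (assumed uniform)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return round(width * scale), round(height * scale)


def to_screen(x: float, y: float, scale: float) -> tuple[int, int]:
    """Map a point on a scaled screenshot back to full-resolution screen coordinates."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return round(x / scale), round(y / scale)


def rect_to_region(left: int, top: int, right: int, bottom: int) -> tuple[int, int, int, int]:
    """Convert an assumed (left, top, right, bottom) rect to (x, y, width, height)."""
    return left, top, right - left, bottom - top


# Example: a 1920x1080 screen captured at scale=0.5.
print(scaled_size(1920, 1080, 0.5))       # (960, 540)
print(to_screen(480, 270, 0.5))           # (960, 540)
print(rect_to_region(100, 50, 400, 250))  # (100, 50, 300, 200)
```

Because the scaled image has fewer pixels, a point mapped back this way is only accurate to roughly `1 / scale` screen pixels, which is one reason to prefer the automation_id forms where they are available.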
### Mouse & keyboard

| Tool | Description |
|---|---|
| `click` | Left, right, or middle click at (x, y) or on a control by automation_id |
| `double_click` | Double-click at (x, y) |
| `move_mouse` | Move the cursor without clicking |
| `drag` | Click and drag between two points |
| `scroll` | Scroll up or down at (x, y) |
| `type_text` | Type ASCII text (for Unicode, put the text on the clipboard and paste it with `press_key`, e.g. `['ctrl', 'v']`) |
| `press_key` | Press a key or hotkey combination (e.g. `['ctrl', 'c']`) |
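The hotkey form of `press_key` (e.g. `['ctrl', 'c']`) follows the usual convention, also used by PyAutoGUI's `hotkey()`: keys go down in the order given and come up in reverse order, so modifiers stay held until the main key is released. The sketch below illustrates that ordering against a hypothetical recording backend instead of the real keyboard, so it runs anywhere; `press_combo` is an illustrative name, not the server's code.

```python
from typing import Callable, Sequence


def press_combo(keys: Sequence[str],
                key_down: Callable[[str], None],
                key_up: Callable[[str], None]) -> None:
    """Press a hotkey combination: press keys in order, release them in reverse."""
    if not keys:
        raise ValueError("at least one key is required")
    for key in keys:
        key_down(key)
    for key in reversed(keys):
        key_up(key)


# Hypothetical recording backend standing in for the real keyboard.
events: list[tuple[str, str]] = []
press_combo(["ctrl", "shift", "esc"],
            key_down=lambda k: events.append(("down", k)),
            key_up=lambda k: events.append(("up", k)))
print(events)
# [('down', 'ctrl'), ('down', 'shift'), ('down', 'esc'),
#  ('up', 'esc'), ('up', 'shift'), ('up', 'ctrl')]
```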
### UIAutomation (Tier 1: preferred over pixel clicks)

| Tool | Description |
|---|---|
| `ua_dump_tree` | Dump the full control tree of a window as JSON |
| `ua_find` | Find controls by automationId, name, or control type |
| `ua_invoke` | Invoke (click) a control by automationId; works even when the control is off-screen |
| `ua_set_value` | Set a TextBox or ComboBox value via ValuePattern |
| `ua_get_value` | Read a control's current value |
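`ua_find` looks up controls by automationId, name, or type. A client that already holds the JSON from `ua_dump_tree` can apply the same kind of filter locally, for example to pick an automationId before calling `ua_invoke`. The node shape used here (`automationId`, `name`, `controlType`, `children` keys) and the sample tree are assumptions for illustration; check the actual dump output for the real field names.

```python
import json
from typing import Any, Iterator, Optional


def iter_controls(node: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield a node and all of its descendants, depth-first in document order."""
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        # Push children reversed so the first child is popped (visited) first.
        stack.extend(reversed(current.get("children", [])))


def find_controls(tree: dict[str, Any],
                  automation_id: Optional[str] = None,
                  name: Optional[str] = None,
                  control_type: Optional[str] = None) -> list[dict[str, Any]]:
    """Return every control matching all of the criteria that were given."""
    matches = []
    for ctrl in iter_controls(tree):
        if automation_id is not None and ctrl.get("automationId") != automation_id:
            continue
        if name is not None and ctrl.get("name") != name:
            continue
        if control_type is not None and ctrl.get("controlType") != control_type:
            continue
        matches.append(ctrl)
    return matches


# Hypothetical dump of a small login window.
dump = json.loads("""
{"name": "Login", "controlType": "Window", "children": [
  {"automationId": "txtUser", "name": "User name", "controlType": "Edit"},
  {"automationId": "btnOk", "name": "OK", "controlType": "Button"},
  {"automationId": "btnCancel", "name": "Cancel", "controlType": "Button"}
]}
""")
buttons = find_controls(dump, control_type="Button")
print([b["automationId"] for b in buttons])  # ['btnOk', 'btnCancel']
```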
### Event-driven waiting (Tier 2: replaces fixed sleeps)

| Tool | Description |
|---|---|
| `wait_for_window` | Block until a window appears or disappears |
| `wait_for_element` | Block until a control exists, becomes visible, or becomes enabled |
| `wait_input_idle` | Block until a process is idle and waiting for user input (e.g. after launch) |
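These tools exist so that a sequence does not depend on guessed `sleep` durations. The general idea can be sketched as a condition that is re-checked on a short interval until it holds or a timeout expires. This is an illustration of the pattern only: `wait_until`, its defaults, and the sample condition are hypothetical, and the server's tools may rely on system events rather than polling.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T],
               timeout: float = 10.0,
               interval: float = 0.25) -> T:
    """Call `condition` until it returns a truthy value; raise TimeoutError after `timeout` s."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"condition not met within {timeout} s")
        # Never sleep past the deadline, so the final check happens on time.
        time.sleep(min(interval, remaining))


# Hypothetical condition that becomes true on the third check,
# standing in for "the window exists" or "the button is enabled".
checks = {"count": 0}


def button_enabled() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


print(wait_until(button_enabled, timeout=2.0, interval=0.01))  # True
```

Compared with a fixed `sleep`, the call returns as soon as the condition holds and fails loudly if it never does.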
### Window management (Tier 3)

| Tool | Description |
|---|---|
| `focus_window` | Bring a window to the foreground, restoring it first if minimised |
### Quality-of-life (Tier 4)

| Tool | Description |
|---|---|
| `launch_app` | Launch an .exe by path |
| `sleep` | Pause for N seconds |
| `find_image_on_screen` | Locate a template image on screen (requires `opencv-python`) |
| `auto_dismiss_dialog` | Find a dialog by title and click a button to dismiss it |
| `batch` | Execute multiple tool calls in one round-trip |
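`batch` saves round-trips by running several tool calls in a single request. A minimal sketch of such a dispatcher is below. It assumes each call is a `{"tool": ..., "args": {...}}` object and that the batch stops at the first failure; the request shape, the stop-on-error policy, `run_batch`, and the stand-in tools are all hypothetical and not the server's actual schema.

```python
from typing import Any, Callable

ToolFn = Callable[..., Any]


def run_batch(calls: list[dict[str, Any]],
              registry: dict[str, ToolFn]) -> list[dict[str, Any]]:
    """Run tool calls in order; stop at the first error and report it."""
    results: list[dict[str, Any]] = []
    for index, call in enumerate(calls):
        name = call.get("tool")
        fn = registry.get(name)
        if fn is None:
            results.append({"index": index, "tool": name, "error": f"unknown tool: {name}"})
            break
        try:
            value = fn(**call.get("args", {}))
        except Exception as exc:  # report the failure instead of aborting the whole request
            results.append({"index": index, "tool": name, "error": str(exc)})
            break
        results.append({"index": index, "tool": name, "result": value})
    return results


# Stand-in tools that only record what they were asked to do.
log: list[str] = []
registry: dict[str, ToolFn] = {
    "click": lambda x, y: log.append(f"click {x},{y}"),
    "type_text": lambda text: log.append(f"type {text}"),
}
out = run_batch(
    [
        {"tool": "click", "args": {"x": 100, "y": 200}},
        {"tool": "type_text", "args": {"text": "hello"}},
        {"tool": "no_such_tool"},
        {"tool": "click", "args": {"x": 0, "y": 0}},  # never reached
    ],
    registry,
)
print(log)  # ['click 100,200', 'type hello']
```

Stopping at the first failure keeps later steps from acting on a UI that is no longer in the expected state; a continue-on-error option would be the obvious alternative.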
## Setup

### 1. Install dependencies

```
pip install -r requirements.txt
```

To enable the `find_image_on_screen` tool, also install OpenCV:

```
pip install opencv-python
```

Or install everything at once via the package extras:

```
pip install ".[image-search]"
```
**Note:** `pywinauto` and `pywin32` are included in `requirements.txt`. They enable all UIAutomation tools (`ua_*`), window management (`list_windows`, `focus_window`, `wait_for_window`), and `wait_input_idle`. Without them, the server still starts and the basic mouse and keyboard tools still work; the tools that depend on these packages return a clear error message if called.
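One common way to get this graceful degradation is to attempt the optional imports once at startup and have each dependent tool check a flag before doing any work. The sketch below shows that pattern; it is not the server's code, and the error wording and the `ua_invoke` stub are hypothetical.

```python
from typing import Any

# Try the optional Windows-only packages once, at import time.
try:
    import pywinauto  # noqa: F401  (UIAutomation backend)
    import win32gui  # noqa: F401  (part of pywin32)
    HAS_WIN_DEPS = True
except ImportError:
    HAS_WIN_DEPS = False


def missing_dependency_error(tool: str) -> dict[str, Any]:
    """Build the error returned when a tool's optional packages are absent."""
    return {
        "ok": False,
        "error": f"{tool} requires pywinauto and pywin32; "
                 "install them with: pip install -r requirements.txt",
    }


def ua_invoke(automation_id: str) -> dict[str, Any]:
    """Hypothetical stub: the real tool would locate and invoke the control."""
    if not HAS_WIN_DEPS:
        return missing_dependency_error("ua_invoke")
    # Real UIAutomation logic would go here.
    return {"ok": True, "invoked": automation_id}


print(ua_invoke("btnOk"))
```

Because the check happens per call, the mouse and keyboard tools never touch these modules and keep working on machines where the imports fail.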
### 2. Add to GitHub Copilot desktop app

In your Copilot settings, add an MCP server entry pointing to this server:

```json
{
  "mcpServers": {
    "application-controller": {
      "command": "python",
      "args": ["C:\\path\\to\\ApplicationController\\server.py"]
    }
  }
}
```

Replace `C:\\path\\to\\ApplicationController` with the actual path where you cloned this repo.
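Backslashes in a Windows path must be doubled inside JSON strings, which is why the example writes `C:\\path\\...`. To avoid escaping by hand, a short script can build the entry with Python's standard `json` module, which escapes the path automatically. The script path below is a placeholder, and the `mcpServers` layout simply mirrors the example above.

```python
import json
from pathlib import PureWindowsPath

# Placeholder: point this at server.py in your clone.
server_path = PureWindowsPath(r"C:\path\to\ApplicationController\server.py")

config = {
    "mcpServers": {
        "application-controller": {
            "command": "python",
            "args": [str(server_path)],
        }
    }
}

# json.dumps doubles every backslash in the path.
text = json.dumps(config, indent=2)
print(text)
```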
### 3. Safety note

PyAutoGUI's failsafe is enabled: move your mouse to the top-left corner of the screen at any time to abort an automation sequence. The next PyAutoGUI action then raises `pyautogui.FailSafeException` instead of running.