Computer Control
Let the agent drive your Mac — open apps, click, type, run terminal commands, and operate IDEs.
MeghaOS can take the wheel. The computer_control plugin runs an agentic loop: it takes
a screenshot, decides the next action, performs it, then looks again — repeating until the
task is done. This is how it operates apps, runs terminal commands, and drives AI coding
tools like Cursor or Claude Code.
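The look, decide, act cycle can be sketched in a few lines. This is only an illustration of the loop's shape, not MeghaOS's actual implementation: `Action`, `run_control_loop`, the three callables, and the `max_steps` safety cap are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    tool: str                                  # e.g. "CLICK_AT" or "TYPE_TEXT"
    args: dict = field(default_factory=dict)   # hypothetical argument bag


def run_control_loop(
    take_screenshot: Callable[[], bytes],
    decide: Callable[[str, bytes], Optional[Action]],
    perform: Callable[[Action], None],
    task: str,
    max_steps: int = 25,                       # assumption: cap to avoid runaway loops
) -> int:
    """Observe, decide, act, repeat. Returns the number of actions performed."""
    for step in range(max_steps):
        frame = take_screenshot()              # look at the screen
        action = decide(task, frame)           # choose the next action; None means done
        if action is None:
            return step
        perform(action)                        # click, type, run a command, ...
    return max_steps
```

Passing the screenshot, decision, and execution steps in as callables keeps the loop itself easy to test with stubs before it ever touches a real screen.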
Computer control physically clicks, types, and runs commands on your machine. It's powerful and genuinely useful, but treat each request like handing over your keyboard. Start with small tasks to build trust.
How it works
The loop is exposed at POST /api/computer/execute and can be
halted with POST /api/computer/stop.
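A minimal client sketch for these two endpoints, using only the Python standard library. The base URL and the `{"task": ...}` request body are assumptions made for illustration; this page does not specify the host, port, or payload schema, so check them against your installation.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: wherever MeghaOS is listening


def build_execute_request(task: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/computer/execute."""
    body = json.dumps({"task": task}).encode("utf-8")  # assumption: payload shape
    return urllib.request.Request(
        f"{base_url}/api/computer/execute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_stop_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/computer/stop."""
    return urllib.request.Request(
        f"{base_url}/api/computer/stop", data=b"", method="POST"
    )
```

To send either request, pass it to `urllib.request.urlopen(req, timeout=10)`. Building and sending are kept separate so the request can be inspected first.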
What it can do
| Capability | Tool |
|---|---|
| Open an application | OPEN_APP |
| Click / double-click at a point | CLICK_AT · DOUBLE_CLICK_AT |
| Type text | TYPE_TEXT |
| Press keys / shortcuts | PRESS_KEY |
| Scroll | SCROLL_AT |
| See the screen | SCREENSHOT · GET_SCREEN_SIZE |
| Run a terminal command | RUN_TERMINAL_COMMAND |
| Drive an IDE's AI panel | START_IDE_AGENT |
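If you log or post-process the agent's actions, it can help to treat the tool names in the table as a closed set. The sketch below does that with an enum. The tool names come from the table; the `{"tool": ..., "args": ...}` record shape and the argument names in `REQUIRED_ARGS` are hypothetical, since the real schema is not documented here.

```python
from enum import Enum


class Tool(str, Enum):
    OPEN_APP = "OPEN_APP"
    CLICK_AT = "CLICK_AT"
    DOUBLE_CLICK_AT = "DOUBLE_CLICK_AT"
    TYPE_TEXT = "TYPE_TEXT"
    PRESS_KEY = "PRESS_KEY"
    SCROLL_AT = "SCROLL_AT"
    SCREENSHOT = "SCREENSHOT"
    GET_SCREEN_SIZE = "GET_SCREEN_SIZE"
    RUN_TERMINAL_COMMAND = "RUN_TERMINAL_COMMAND"
    START_IDE_AGENT = "START_IDE_AGENT"


# Hypothetical argument names, for illustration only.
REQUIRED_ARGS = {
    Tool.CLICK_AT: {"x", "y"},
    Tool.DOUBLE_CLICK_AT: {"x", "y"},
    Tool.TYPE_TEXT: {"text"},
}


def parse_action(raw: dict) -> tuple[Tool, dict]:
    """Validate a raw action record; raises ValueError on unknown tools or missing args."""
    tool = Tool(raw["tool"])                 # ValueError if the name is not in the table
    args = raw.get("args", {})
    missing = REQUIRED_ARGS.get(tool, set()) - set(args)
    if missing:
        raise ValueError(f"{tool.value} is missing arguments: {sorted(missing)}")
    return tool, args
```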
Triggering it
The /chat handler detects computer-control intent before treating a request as a
UI-composition query, so build-style phrasings route to the control loop. Recognized cues
include:
- Mentioning an IDE: Cursor, VS Code, Antigravity, Windsurf, Zed
- Saying "claude code" / "use claude to…"
- An action verb + IDE/build keyword: "open…", "build…", "scaffold…"
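The cues above could be approximated with a simple keyword check. This is a rough heuristic sketch, not the actual /chat handler logic: the function name is hypothetical, and `BUILD_KEYWORDS` is an assumed list because this page does not say which IDE/build keywords are recognized.

```python
import re

IDE_NAMES = ("cursor", "vs code", "antigravity", "windsurf", "zed")
CLAUDE_CUES = ("claude code", "use claude to")
ACTION_VERBS = ("open", "build", "scaffold")
BUILD_KEYWORDS = ("ide", "editor", "terminal", "project", "app")  # assumption


def _has_word(text: str, phrase: str) -> bool:
    # Whole-word match so that "zed" does not fire inside "organized".
    return re.search(rf"\b{re.escape(phrase)}\b", text) is not None


def looks_like_computer_control(message: str) -> bool:
    text = message.lower()
    if any(_has_word(text, ide) for ide in IDE_NAMES):
        return True                                   # cue 1: an IDE is mentioned
    if any(cue in text for cue in CLAUDE_CUES):
        return True                                   # cue 2: Claude Code phrasing
    has_verb = any(_has_word(text, verb) for verb in ACTION_VERBS)
    has_keyword = any(_has_word(text, kw) for kw in BUILD_KEYWORDS)
    return has_verb and has_keyword                   # cue 3: verb + keyword
```

A check this naive has obvious blind spots. For instance, it treats any mention of "cursor" as the Cursor IDE, so a sentence about the text cursor would also match. A real detector needs more context than keywords alone.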
Examples:
"Open Cursor in my ~/projects/site folder and use Claude Code to add a contact form."
"Open the terminal and run the test suite."
"Use VS Code to create a new Python project with a virtualenv."
Driving AI coding tools
START_IDE_AGENT opens an IDE, navigates to your project folder, and activates its built-in
AI panel (or a Claude Code terminal), then types your instructions. From there the agent and
the IDE's own assistant collaborate on the task. This is the path behind requests like
"use Cursor to build X."
Screen awareness (without taking control)
If you only want the agent to understand what's on screen — not operate it — use the screen-reader path instead:
- "what's on my screen right now?" → /api/screen/analyze (READ_SCREEN)
- "insert this text where my cursor is" → /api/screen/insert (INSERT_TEXT)
- Read the focused selection → GET_FOCUSED_TEXT
Stopping a run
If a control session goes somewhere you didn't intend, stop it immediately with
POST /api/computer/stop (or the stop control in the UI). The loop checks for cancellation
between actions.
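Checking for cancellation between actions can be pictured with a shared flag that the runner polls before each step. This is a sketch under assumptions, not the product's code: `run_actions` and the use of `threading.Event` as the stop signal are illustrative choices.

```python
import threading
from typing import Callable, Iterable


def run_actions(actions: Iterable[Callable[[], None]], cancelled: threading.Event) -> int:
    """Run actions in order, stopping at the first action boundary after a cancel."""
    done = 0
    for act in actions:
        if cancelled.is_set():   # polled between actions, never in the middle of one
            break
        act()
        done += 1
    return done
```

Under this model, an action that is already in progress still finishes, and the stop takes effect before the next one begins. That is worth keeping in mind for long-running terminal commands.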