Computer Control

Let the agent drive your Mac — open apps, click, type, run terminal commands, and operate IDEs.

MeghaOS can take the wheel. The computer_control plugin runs an agentic loop: it takes a screenshot, decides the next action, performs it, then looks again — repeating until the task is done. This is how it operates apps, runs terminal commands, and drives AI coding tools like Cursor or Claude Code.

Warning

Computer control physically clicks, types, and runs commands on your machine. It's powerful and genuinely useful, but treat each request like handing over your keyboard. Start with small tasks to build trust.

How it works

The loop is exposed at POST /api/computer/execute and can be halted with POST /api/computer/stop.
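The screenshot, decide, act cycle described above can be sketched in a few lines of Python. This sketch only illustrates the control flow and is not MeghaOS's implementation. The names `run_control_loop`, `Action`, `decide`, and `perform` are hypothetical, and `max_steps` is an assumed safety cap. Cancellation is checked at the top of each iteration, so a stop takes effect between actions rather than in the middle of one.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One step chosen by the agent (hypothetical shape)."""
    tool: str                                  # e.g. "CLICK_AT", "TYPE_TEXT"
    args: dict = field(default_factory=dict)


def run_control_loop(
    take_screenshot: Callable[[], bytes],
    decide: Callable[[bytes, List[Action]], Optional[Action]],
    perform: Callable[[Action], None],
    cancel: threading.Event,
    max_steps: int = 50,                       # assumed cap so a confused run cannot loop forever
) -> List[Action]:
    """Screenshot -> decide -> act, repeated until done, cancelled, or out of steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        if cancel.is_set():                    # a stop request is honored between actions
            break
        screen = take_screenshot()             # look at the current state of the screen
        action = decide(screen, history)       # decide() returns None when it judges the task complete (assumed convention)
        if action is None:
            break
        perform(action)                        # physically click, type, run a command, etc.
        history.append(action)
    return history
```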

What it can do

Capability                          Tool
Open an application                 OPEN_APP
Click / double-click at a point     CLICK_AT · DOUBLE_CLICK_AT
Type text                           TYPE_TEXT
Press keys / shortcuts              PRESS_KEY
Scroll                              SCROLL_AT
See the screen                      SCREENSHOT · GET_SCREEN_SIZE
Run a terminal command              RUN_TERMINAL_COMMAND
Drive an IDE's AI panel             START_IDE_AGENT
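The tools in this table can be viewed as a closed vocabulary that the loop chooses from at each step. The sketch below assumes a simple name-to-handler dispatch. `dispatch`, `TOOLS`, and the handler argument names (such as `x` and `y` for `CLICK_AT`) are hypothetical, and the real plugin's argument shapes may differ.

```python
from typing import Any, Callable, Dict

# Tool names taken from the capability table; the argument shapes are assumptions.
TOOLS = {
    "OPEN_APP", "CLICK_AT", "DOUBLE_CLICK_AT", "TYPE_TEXT", "PRESS_KEY",
    "SCROLL_AT", "SCREENSHOT", "GET_SCREEN_SIZE", "RUN_TERMINAL_COMMAND",
    "START_IDE_AGENT",
}


def dispatch(tool: str, args: Dict[str, Any],
             handlers: Dict[str, Callable[..., Any]]) -> Any:
    """Route one chosen action to its handler, rejecting anything outside the vocabulary."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    handler = handlers.get(tool)
    if handler is None:
        raise NotImplementedError(f"no handler registered for {tool}")
    return handler(**args)
```

Rejecting unknown names up front means a model that invents a tool fails loudly instead of triggering some unintended action.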

Triggering it

The /chat handler detects computer-control intent before treating a request as a UI-composition query, so build-style phrasings route to the control loop. Recognized cues include:

  • Mentioning an IDE: Cursor, VS Code, Antigravity, Windsurf, Zed
  • Saying "claude code" / "use claude to…"
  • An action verb such as "open…", "build…", or "scaffold…" combined with an IDE or build keyword
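The cues above can be roughly illustrated with a keyword-based detector. The regular expressions, the extra build keywords (`terminal`, `project`, `app`, `repo`), and the `is_computer_control` name are assumptions for illustration. They are not the /chat handler's actual rules.

```python
import re

# Hypothetical patterns modeled on the listed cues; not MeghaOS's real rules.
IDE_PATTERN = re.compile(r"\b(cursor|vs ?code|antigravity|windsurf|zed)\b", re.IGNORECASE)
CLAUDE_PATTERN = re.compile(r"\bclaude code\b|\buse claude to\b", re.IGNORECASE)
VERB_PATTERN = re.compile(r"\b(open|build|scaffold)\b", re.IGNORECASE)
BUILD_KEYWORD_PATTERN = re.compile(r"\b(terminal|project|app|repo)\b", re.IGNORECASE)


def is_computer_control(message: str) -> bool:
    """Return True if the message should go to the control loop instead of UI composition."""
    # Cue 1 and cue 2: naming an IDE or asking for Claude Code is enough on its own.
    if IDE_PATTERN.search(message) or CLAUDE_PATTERN.search(message):
        return True
    # Cue 3: an action verb together with a build-style keyword.
    return bool(VERB_PATTERN.search(message) and BUILD_KEYWORD_PATTERN.search(message))
```

A matcher this naive would also fire on a phrase like "insert this text where my cursor is", which belongs on the screen-reader path. A real detector needs more context than bare keywords.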

Examples:

"Open Cursor in my ~/projects/site folder and use Claude Code to add a contact form."

"Open the terminal and run the test suite."

"Use VS Code to create a new Python project with a virtualenv."

Driving AI coding tools

START_IDE_AGENT opens an IDE, navigates to your project folder, and activates its built-in AI panel (or a Claude Code terminal), then types your instructions. From there the agent and the IDE's own assistant collaborate on the task. This is the path behind requests like "use Cursor to build X."
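You can picture what START_IDE_AGENT does as an expansion into the primitive tools from the capability table. This is a hedged sketch, not the plugin's code. `plan_ide_agent`, the `path` and `cwd` arguments, and the caller-supplied `panel_shortcut` are hypothetical, and the caller supplies the shortcut because it differs between IDEs. The Claude Code branch assumes the `claude` command-line tool is installed.

```python
from typing import Any, Dict, List, Optional, Tuple

Step = Tuple[str, Dict[str, Any]]  # (tool name, arguments)


def plan_ide_agent(ide: str, project_dir: str, instructions: str,
                   panel_shortcut: Optional[str] = None) -> List[Step]:
    """Expand an IDE-agent request into primitive tool steps (illustrative only)."""
    # 1. Open the IDE on the project folder ("path" is a hypothetical argument).
    steps: List[Step] = [("OPEN_APP", {"name": ide, "path": project_dir})]
    if panel_shortcut is not None:
        # 2a. Activate the IDE's built-in AI panel with its (IDE-specific) shortcut.
        steps.append(("PRESS_KEY", {"keys": panel_shortcut}))
    else:
        # 2b. Otherwise start Claude Code in a terminal rooted at the project.
        steps.append(("RUN_TERMINAL_COMMAND", {"command": "claude", "cwd": project_dir}))
    # 3. Type the user's instructions and submit them to the IDE's assistant.
    steps.append(("TYPE_TEXT", {"text": instructions}))
    steps.append(("PRESS_KEY", {"keys": "return"}))
    return steps
```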

Screen awareness (without taking control)

If you only want the agent to understand what's on screen — not operate it — use the screen-reader path instead:

  • "what's on my screen right now?" → /api/screen/analyze (READ_SCREEN)
  • "insert this text where my cursor is" → /api/screen/insert (INSERT_TEXT)
  • Read the focused selection → GET_FOCUSED_TEXT

Stopping a run

If a control session goes somewhere you didn't intend, stop it immediately with POST /api/computer/stop (or the stop control in the UI). The loop checks for cancellation between actions.
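If you are scripting against the API, both endpoints are plain HTTP POSTs. The sketch below uses only Python's standard library and builds the requests without sending them, so you can inspect them first. `BASE_URL` and the `task` field in the execute body are assumptions. Check your MeghaOS server's address and the execute endpoint's expected payload before relying on them.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Hypothetical server address; replace with wherever your MeghaOS instance listens.
BASE_URL = "http://localhost:8000"


def _post(path: str, payload: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the MeghaOS API."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload or {}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute_request(task: str) -> urllib.request.Request:
    # The "task" field name is an assumption about the execute endpoint's body.
    return _post("/api/computer/execute", {"task": task})


def stop_request() -> urllib.request.Request:
    # Asks the server to halt the current run; the loop honors it before its next action.
    return _post("/api/computer/stop")
```

To actually send a stop, pass the request to `urllib.request.urlopen(stop_request(), timeout=5)`.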