# VisionDev

Live visual + behavior debugging for AI coding assistants.
VisionDev gives Cursor, Claude Code, GitHub Copilot, and Codex a real Chromium browser they can drive while you watch. The agent reads the page, clicks, types, asserts — you see every step happen live, with action markers (click dots, fill highlights, motion trails) and a smooth in-editor video stream of the browser viewport.
Use plain English. The agent picks the right tools:

> "log into localhost:3000 with admin@nextcare.com / 123456, change my phone to 20202020, make sure it saves"
## Install
- Install the extension from the VS Code / Cursor marketplace (or the bundled `.vsix`).
- Run `VisionDev: Connect MCP (Cursor + VS Code Copilot)` from the command palette. This writes `.cursor/mcp.json` (Cursor) and `.vscode/mcp.json` (VS Code + GitHub Copilot MCP).
- (Recommended) Run `VisionDev: Install Agent Guidance (AGENTS.md)` so the agent uses VisionDev from plain-English prompts without you naming any tool.
- Reload the editor window.
That's it. The first time you ask Cursor to verify a UI flow, Chromium will open, install if needed, and start driving itself.
## How it works
VisionDev exposes eight MCP tools (stdio transport, no HTTP), plus a legacy compatibility wrapper:
| Tool | Purpose |
| --- | --- |
| `vision_open(url, device)` | Launch Chromium, navigate, return numbered interactive elements |
| `vision_observe()` | Re-snapshot current page; returns IDs + visible toasts/alerts/errors |
| `vision_act(id, action, value?)` | Click / fill / press / hover / select / clear by element ID |
| `vision_navigate(url)` | Same browser, new URL |
| `vision_wait({kind, value})` | `urlContains` / `textVisible` / `selectorVisible` / `ms` |
| `vision_assert({kind, ...})` | `textVisible` / `urlContains` / `errorVisible` / `toastVisible` / `elementValue` / `elementVisible` |
| `vision_screenshot()` | Push a frame to the panel (no bytes returned to LLM) |
| `vision_close()` | Close the session |
| `vision_check` (legacy) | Compatibility wrapper for the old single-call API |
The browser stays open across tool calls, so subsequent actions take ~50-100ms each. Element IDs come from a real DOM scan (each interactive element is tagged `data-vd-id`), so the agent never guesses CSS selectors.
The panel mirrors the browser viewport at ~15fps via Chromium DevTools Protocol screencast. Frames go directly to the panel via local WebSocket — they never touch the LLM, so they don't cost any tokens.
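The snapshot idea above can be sketched as follows. This is a simplified model, not the extension's actual scanner: the element shape, labeling, and output format are assumptions, and a minimal interface stands in for the DOM so the sketch is self-contained.

```typescript
// Sketch: number each interactive element, tag it with data-vd-id,
// and emit one compact text line per element for the LLM.
// The El interface is a stand-in for real DOM nodes (assumption).
interface El {
  tag: string;
  label: string;
  attrs: Record<string, string>;
}

function snapshot(els: El[]): string[] {
  return els.map((el, i) => {
    el.attrs["data-vd-id"] = String(i + 1); // stable handle for vision_act
    return `[${i + 1}] <${el.tag}> ${el.label}`;
  });
}

const page: El[] = [
  { tag: "input", label: "Email", attrs: {} },
  { tag: "button", label: "Log in", attrs: {} },
];
const lines = snapshot(page); // ["[1] <input> Email", "[2] <button> Log in"]
```

Because `vision_act` addresses elements by these numeric IDs, the agent never has to invent a selector that might not exist.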
## Why it's reliable
- No CSS selectors: the agent picks elements by ID from a fresh observation
- No empty-plan PASS: every failure returns `failureType`, `nextAction`, and `evidence`
- Persistent session: one Chromium launch per debugging conversation
- Element IDs invalidate explicitly, so the agent always re-observes after route changes
- Action markers in the live browser (green dot for clicks, blue for fills, motion trails) so you can follow what the agent is doing
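The failure contract from the list above might look like this in practice. Only the three field names come from the source; the wrapper shape and example values are assumptions for illustration.

```typescript
// Sketch of a failure result. failureType / nextAction / evidence are the
// documented fields; the surrounding shape and values are illustrative.
interface VisionFailure {
  failureType: string; // machine-readable category
  nextAction: string;  // what the agent should try next
  evidence: string;    // what was actually observed on the page
}

const example: VisionFailure = {
  failureType: "assertionFailed",
  nextAction: "re-observe the page; the phone field may not have saved",
  evidence: "elementValue for [12] was '', expected '20202020'",
};
```

Because a failure always carries a suggested `nextAction`, the agent can recover (re-observe, retry, or report) instead of silently declaring success.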
## Commands
| Command | What it does |
| --- | --- |
| `VisionDev: Open Panel` | Show the activity panel and live browser mirror |
| `VisionDev: Connect MCP (Cursor + VS Code Copilot)` | Write `.cursor/mcp.json` and `.vscode/mcp.json` pointing at the bundled MCP server |
| `VisionDev: Install Agent Guidance (AGENTS.md)` | Drop a primer into your repo so Cursor uses VisionDev from plain English |
## Manual MCP config (advanced)
Cursor (`.cursor/mcp.json`) expects a top-level `mcpServers` key with `"type": "stdio"` (not `transport`):

```json
{
  "mcpServers": {
    "visiondev": {
      "type": "stdio",
      "command": "node",
      "args": ["<absolute path to extension>/out/server.js"],
      "env": { "VISIONDEV_WS_PORT": "51051" }
    }
  }
}
```

VS Code + Copilot (`.vscode/mcp.json`) uses the same shape, but with a top-level `servers` key:

```json
{
  "servers": {
    "visiondev": {
      "type": "stdio",
      "command": "node",
      "args": ["<absolute path to extension>/out/server.js"],
      "env": { "VISIONDEV_WS_PORT": "51051" }
    }
  }
}
```
## Costs
VisionDev itself is free and runs locally. The only LLM tokens consumed are the compact text snapshots returned by `vision_observe` (~50-100 tokens per element, ~2-5K per page). Frame streaming and action markers cost zero tokens.
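As a rough budget under the figures above (the element count is an assumed typical form-heavy page, not a measured value):

```typescript
// Back-of-envelope token cost per vision_observe call.
const tokensPerElement = 75; // midpoint of the ~50-100 range above
const elementsOnPage = 40;   // assumption: a fairly dense page
const snapshotTokens = tokensPerElement * elementsOnPage; // 3000
```

That lands inside the stated ~2-5K per-page range, and it is paid only when the agent observes, not while frames stream.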
## License
MIT