Local Agent Screen Viewer
Real-time screen capture with a Vision API for AI agents. It combines GPU-accelerated capture, change detection, and an HTTP server that lets any AI agent see your screen autonomously.
Built for Windows AI automation workflows, with no manual screenshots needed.
Features
Screen Viewer (Webview Panel)
- Live Desktop Stream in a VS Code panel at configurable frame rates (1-10 FPS)
- GPU-Accelerated via DXGI Desktop Duplication (node-screenshots), with minimal CPU usage
- Automatic Fallback to PowerShell-based capture when GPU is unavailable
- Dark Theme HUD with live FPS counter, status indicator, pause/resume controls
- HD Capture button for full-resolution PNG screenshots
Vision Server (HTTP API for AI Agents)
- HTTP server on 127.0.0.1:7899, so any agent or tool can request screen frames
- Change Detection: only returns frames when the screen actually changed (saves bandwidth and tokens)
- 5 endpoints: /frame, /frame/changed, /frame/hd, /status, /capture
- Zero user interaction: agents see the screen autonomously
- Works with any language (Python, Node.js, Rust, etc.) via simple HTTP GET
AI Integration
- AI-Ready Screenshots saved to ~/.local-agent/screenshots/
- Compatible with the Local Agent Python framework for full computer-use automation
- Frame data returned as JPEG/PNG for direct LLM vision input
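Many vision-capable LLM APIs accept images as base64 data URLs. The sketch below shows one way to wrap frame bytes from the Vision Server in that form. The helper names `sniff_mime` and `to_data_url` are hypothetical and not part of the extension, and the exact payload shape your model provider expects may differ.

```python
import base64

# File signatures for the two formats the Vision Server is documented to return.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def sniff_mime(image_bytes: bytes) -> str:
    """Guess the MIME type from the file signature (PNG or JPEG only)."""
    if image_bytes.startswith(PNG_MAGIC):
        return "image/png"
    if image_bytes.startswith(JPEG_MAGIC):
        return "image/jpeg"
    raise ValueError("not a PNG or JPEG image")


def to_data_url(image_bytes: bytes) -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{sniff_mime(image_bytes)};base64,{encoded}"
```

Sniffing the signature rather than trusting the endpoint name means the same helper can handle both /frame (JPEG) and /frame/hd (PNG) responses.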
Getting Started
Basic Usage (Screen Viewer)
- Open the Command Palette (Ctrl+Shift+P)
- Run Local Agent: Iniciar Screen Viewer
- A panel opens showing your live desktop
- Hover to reveal Pause and Capture HD buttons
Vision Server (for AI Agents)
- Open the Command Palette (Ctrl+Shift+P) and run Local Agent: Iniciar Vision Server (API para Agente)
- The server starts on http://127.0.0.1:7899
- Any process can now request frames:
# Get current screen frame
curl http://127.0.0.1:7899/frame -o screen.jpg
# Get frame only if screen changed (304 if no change)
curl http://127.0.0.1:7899/frame/changed -o screen.jpg
# Full-resolution PNG
curl http://127.0.0.1:7899/frame/hd -o screen_hd.png
# Server status
curl http://127.0.0.1:7899/status
# Force immediate capture
curl -X POST http://127.0.0.1:7899/capture -o capture.jpg
# Python example
import httpx

# Fetch the latest frame and save it to disk
r = httpx.get("http://127.0.0.1:7899/frame")
with open("screen.jpg", "wb") as f:
    f.write(r.content)

# Check if screen changed
r = httpx.get("http://127.0.0.1:7899/frame/changed")
if r.status_code == 200:
    print("Screen changed!", len(r.content), "bytes")
elif r.status_code == 304:
    print("No change")
Commands
| Command | Description |
| --- | --- |
| Local Agent: Iniciar Screen Viewer | Opens viewer panel + starts Vision Server |
| Local Agent: Parar Screen Viewer | Stops capture and closes panel |
| Local Agent: Capturar Tela para IA | Saves high-res PNG to ~/.local-agent/screenshots/ |
| Local Agent: Iniciar Vision Server | Starts HTTP API only (no webview panel) |
| Local Agent: Parar Vision Server | Stops the HTTP Vision Server |
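An agent that relies on the Capturar Tela para IA command rather than the HTTP API needs to find the newest saved file. The following is a minimal sketch that assumes the screenshots are written with a `.png` extension; the README does not document the file naming scheme, so the helper picks the newest file by modification time. `latest_screenshot` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

# Directory documented in this README; the "*.png" pattern is an assumption.
SCREENSHOT_DIR = Path.home() / ".local-agent" / "screenshots"


def latest_screenshot(directory: Path = SCREENSHOT_DIR) -> Optional[Path]:
    """Return the most recently modified PNG in the directory, or None."""
    if not directory.is_dir():
        return None
    pngs = [p for p in directory.glob("*.png") if p.is_file()]
    return max(pngs, key=lambda p: p.stat().st_mtime, default=None)
```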
Settings
| Setting | Default | Description |
| --- | --- | --- |
| localAgent.screenViewer.fps | 5 | Frames per second (1-10) |
| localAgent.screenViewer.quality | 70 | JPEG stream quality (1-100) |
| localAgent.screenViewer.scale | 0.5 | Image scale factor (0.1-1.0) |
| localAgent.visionServer.port | 7899 | Vision Server HTTP port |
| localAgent.visionServer.diffThreshold | 0.02 | Change detection threshold (0.001-0.5) |
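If you generate or edit these settings from a script, it can help to check values against the documented ranges first. The sketch below copies the defaults and ranges from the table above; `out_of_range` is a hypothetical helper, and how the extension itself treats out-of-range values is not documented here.

```python
# Defaults and ranges copied from the Settings table.
DEFAULTS = {
    "localAgent.screenViewer.fps": 5,
    "localAgent.screenViewer.quality": 70,
    "localAgent.screenViewer.scale": 0.5,
    "localAgent.visionServer.port": 7899,
    "localAgent.visionServer.diffThreshold": 0.02,
}
RANGES = {
    "localAgent.screenViewer.fps": (1, 10),
    "localAgent.screenViewer.quality": (1, 100),
    "localAgent.screenViewer.scale": (0.1, 1.0),
    "localAgent.visionServer.diffThreshold": (0.001, 0.5),
}


def out_of_range(settings: dict) -> list:
    """Return the names of settings whose values fall outside the documented range."""
    merged = {**DEFAULTS, **settings}  # unspecified settings fall back to defaults
    return [name for name, (lo, hi) in RANGES.items() if not lo <= merged[name] <= hi]
```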
Vision Server API
GET /frame
Returns the latest screen frame as JPEG.
Response Headers:
X-Frame-Timestamp — Unix timestamp (ms) of the frame
X-Frame-Changed — "true" or "false" (change detection result)
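A client can use these headers to decide whether a frame is fresh enough to send to a model. The sketch below is illustrative: `read_frame_headers` is a hypothetical helper, and it assumes the header values are formatted exactly as described above (a millisecond Unix timestamp and the strings "true" or "false").

```python
import time
from typing import Mapping, Optional, Tuple


def read_frame_headers(
    headers: Mapping[str, str], now_ms: Optional[float] = None
) -> Tuple[float, bool]:
    """Return (frame age in seconds, changed flag) from /frame response headers."""
    if now_ms is None:
        now_ms = time.time() * 1000  # current time in milliseconds
    timestamp_ms = float(headers["X-Frame-Timestamp"])
    changed = headers["X-Frame-Changed"].strip().lower() == "true"
    return (now_ms - timestamp_ms) / 1000.0, changed
```

With httpx you can pass `r.headers` directly, since its header mapping is case-insensitive; a plain dict must use the exact header names.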
GET /frame/changed
Returns the frame only if the screen changed since the last request. Returns 304 Not Modified if unchanged.
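A typical agent loop polls this endpoint and only processes frames when the status is 200. The following sketch uses the standard library's `http.client` so it has no dependencies. The function names, the poll count, and the interval are illustrative assumptions, and treating any status other than 200 or 304 as an error is a design choice of this example.

```python
import http.client
import time
from typing import Callable, Iterator, Tuple


def fetch_changed(host: str = "127.0.0.1", port: int = 7899) -> Tuple[int, bytes]:
    """Issue one GET /frame/changed and return (status code, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/frame/changed")
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


def changed_frames(
    fetch: Callable[[], Tuple[int, bytes]] = fetch_changed,
    polls: int = 10,
    interval: float = 0.2,
) -> Iterator[bytes]:
    """Poll up to `polls` times, yielding a frame each time the screen changed."""
    for _ in range(polls):
        status, body = fetch()
        if status == 200:
            yield body  # screen changed: hand the frame to the caller
        elif status != 304:
            raise RuntimeError(f"unexpected status {status}")
        time.sleep(interval)
```

Passing `fetch` as a parameter keeps the loop testable without a running server, for example `changed_frames(fetch=my_fake, interval=0)`.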
GET /frame/hd
Captures and returns a full-resolution PNG (on demand, not from the stream).
GET /status
Returns JSON with server and capture state:
{
  "server": "vision-server",
  "version": "0.2.0",
  "running": true,
  "capture": {
    "active": true,
    "backend": "GPU (DXGI)",
    "hasFrame": true,
    "frameCount": 1234,
    "changeDetected": true
  }
}
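Before requesting frames, an agent can check this payload to confirm that capture is running and at least one frame is available. This is a small sketch based only on the fields shown in the sample response; `capture_ready` is a hypothetical helper, and missing fields are treated as "not ready".

```python
import json


def capture_ready(status_json: str) -> bool:
    """True if the server is running, capture is active, and a frame exists."""
    status = json.loads(status_json)
    capture = status.get("capture", {})
    return bool(
        status.get("running") and capture.get("active") and capture.get("hasFrame")
    )
```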
POST /capture
Forces an immediate capture and returns the frame as JPEG.
Architecture
+------------------------------------------+
|            VS Code Extension             |
|  +----------------+  +----------------+  |
|  | ScreenCapture  |->| ChangeDetector |  |
|  | (DXGI / PS)    |  | (pixel sample) |  |
|  +----------------+  +-------+--------+  |
|                              |           |
|  +---------------------------v---------+ |
|  |         VisionServer :7899          | |
|  | GET /frame | /frame/changed | /hd   | |
|  | GET /status | POST /capture         | |
|  +---------------------------+---------+ |
+------------------------------|-----------+
                               | HTTP
         +---------------------v---------------------+
         |  Any AI Agent (Python, Node, etc.)        |
         |  httpx.get("http://127.0.0.1:7899/frame") |
         +-------------------------------------------+
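The ChangeDetector box is labelled "pixel sample", and the diffThreshold setting defaults to 0.02. One plausible reading is that the detector compares a sparse sample of pixels between consecutive frames and reports a change when the differing fraction reaches the threshold. The sketch below illustrates that idea on raw byte buffers. It is not the extension's actual implementation, and the `stride` and `tolerance` parameters are assumptions made for the example.

```python
def changed_fraction(
    prev: bytes, curr: bytes, stride: int = 64, tolerance: int = 8
) -> float:
    """Fraction of sampled bytes whose value differs by more than `tolerance`."""
    if len(prev) != len(curr):
        return 1.0  # different buffer sizes (e.g. resolution change): fully changed
    positions = range(0, len(curr), stride)  # sample every `stride`-th byte
    if not positions:
        return 0.0
    differing = sum(1 for i in positions if abs(prev[i] - curr[i]) > tolerance)
    return differing / len(positions)


def screen_changed(prev: bytes, curr: bytes, diff_threshold: float = 0.02) -> bool:
    """Report a change when the differing fraction reaches the threshold."""
    return changed_fraction(prev, curr) >= diff_threshold
```

Sampling keeps the comparison cheap at the cost of missing very small changes that fall between sampled positions; a lower threshold makes the detector more sensitive.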
Capture Backends
| Backend | Technology | Performance |
| --- | --- | --- |
| GPU (primary) | DXGI Desktop Duplication via node-screenshots | Fastest, minimal CPU |
| PowerShell (fallback) | System.Drawing screen capture | Works on all Windows |
Image resizing and JPEG encoding use sharp for optimal performance.
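The primary and fallback arrangement in the table can be pictured as trying each backend in order and using the first one that succeeds. The sketch below shows that pattern in general terms. The extension itself runs inside VS Code rather than Python, so the function and the backend callables here are hypothetical stand-ins, not the extension's code.

```python
from typing import Callable, Sequence, Tuple

# A backend is a (name, capture function) pair; the function returns frame bytes.
Backend = Tuple[str, Callable[[], bytes]]


def capture_with_fallback(backends: Sequence[Backend]) -> Tuple[str, bytes]:
    """Try each backend in order and return (backend name, frame) from the first success."""
    errors = []
    for name, capture in backends:
        try:
            return name, capture()
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all capture backends failed: " + "; ".join(errors))
```

Returning the backend name alongside the frame lets the caller report which path was used, much as the /status endpoint reports a `backend` field.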
Requirements
- Windows 10/11 (DXGI + PowerShell capture are Windows-specific)
- VS Code 1.85+
Known Limitations
- Windows only: DXGI and PowerShell capture are Windows-specific APIs
- Primary monitor: captures the primary display only
- Vision Server: binds to 127.0.0.1 (localhost only, not exposed to the network)
Contributing
Contributions are welcome! Please open an issue or pull request on GitHub.
License
MIT - Anderson Belem (Otimiza.pro)