Local Agent Screen Viewer

Real-time screen capture with a Vision API for AI agents: GPU-accelerated capture, change detection, and an HTTP server that lets any AI agent see your screen autonomously.

Built for Windows AI automation workflows — no manual screenshots needed.

Features

Screen Viewer (Webview Panel)

Live Desktop Stream in a VS Code panel at configurable frame rates (1-10 FPS)
GPU-Accelerated via DXGI Desktop Duplication (node-screenshots) — minimal CPU usage
Automatic Fallback to PowerShell-based capture when GPU is unavailable
Dark Theme HUD with a live FPS counter, a status indicator, and pause/resume controls
HD Capture button for full-resolution PNG screenshots

Vision Server (HTTP API for AI Agents)

HTTP server on 127.0.0.1:7899 — any agent or tool can request screen frames
Change Detection returns frames only when the screen has actually changed, saving bandwidth and LLM tokens
5 endpoints: /frame, /frame/changed, /frame/hd, /status, /capture
Zero user interaction — agents see the screen autonomously
Works with any language (Python, Node.js, Rust, etc.) via simple HTTP GET

AI Integration

AI-Ready Screenshots saved to ~/.local-agent/screenshots/
Compatible with the Local Agent Python framework for full computer-use automation
Frame data returned as JPEG/PNG for direct LLM vision input
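Many multimodal LLM APIs accept images as base64-encoded data URLs. The sketch below (standard library only) shows one way to turn a Vision Server frame into such a URL, detecting PNG versus JPEG from the leading bytes. The helper names are hypothetical, and the exact request format for your model provider is outside the scope of this extension.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SOI = b"\xff\xd8"  # JPEG start-of-image marker


def guess_mime(frame: bytes) -> str:
    """Detect PNG (/frame/hd) vs JPEG (/frame, /capture) from the leading bytes."""
    if frame.startswith(PNG_SIGNATURE):
        return "image/png"
    if frame.startswith(JPEG_SOI):
        return "image/jpeg"
    raise ValueError("unrecognized image format")


def frame_to_data_url(frame: bytes) -> str:
    """Wrap raw frame bytes in a base64 data URL for an LLM vision request."""
    encoded = base64.b64encode(frame).decode("ascii")
    return f"data:{guess_mime(frame)};base64,{encoded}"
```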

Getting Started

Basic Usage (Screen Viewer)

Open the Command Palette (Ctrl+Shift+P)
Run `Local Agent: Iniciar Screen Viewer` (Start Screen Viewer)
A panel opens showing your live desktop
Hover over the panel to reveal the Pause and Capture HD buttons

Vision Server (for AI Agents)

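Open the Command Palette (Ctrl+Shift+P) and run `Local Agent: Iniciar Vision Server (API para Agente)` (Start Vision Server, agent API)
The server starts on http://127.0.0.1:7899 (the default port, configurable via `localAgent.visionServer.port`)
Any local process can now request frames: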

```bash
# Get the current screen frame
curl http://127.0.0.1:7899/frame -o screen.jpg

# Get a frame only if the screen changed (HTTP 304 if unchanged)
curl http://127.0.0.1:7899/frame/changed -o screen.jpg

# Full-resolution PNG
curl http://127.0.0.1:7899/frame/hd -o screen_hd.png

# Server status (JSON)
curl http://127.0.0.1:7899/status

# Force an immediate capture
curl -X POST http://127.0.0.1:7899/capture -o capture.jpg
```

Python example:

```python
import httpx

BASE_URL = "http://127.0.0.1:7899"

# Save the current frame
r = httpx.get(f"{BASE_URL}/frame")
r.raise_for_status()
with open("screen.jpg", "wb") as f:
    f.write(r.content)

# Check whether the screen changed
r = httpx.get(f"{BASE_URL}/frame/changed")
if r.status_code == 200:
    print("Screen changed!", len(r.content), "bytes")
elif r.status_code == 304:
    print("No change")
```
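If you would rather avoid the `httpx` dependency, a polling loop can be written with the standard library alone. This is a minimal sketch assuming the default port; note that `urllib` raises `HTTPError` for every non-2xx status, so the 304 from `/frame/changed` has to be caught explicitly. `fetch_if_changed` and `watch` are hypothetical helper names.

```python
from __future__ import annotations

import time
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:7899"  # default localAgent.visionServer.port


def fetch_if_changed(base_url: str = BASE_URL, timeout: float = 5.0) -> bytes | None:
    """Return the new frame, or None when the server answers 304 Not Modified."""
    try:
        with urllib.request.urlopen(f"{base_url}/frame/changed", timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib raises for any non-2xx status, including 304
            return None
        raise


def watch(base_url: str = BASE_URL, interval: float = 1.0, max_polls: int = 10) -> list[bytes]:
    """Poll /frame/changed and collect only the frames that differ."""
    frames = []
    for _ in range(max_polls):
        frame = fetch_if_changed(base_url)
        if frame is not None:
            frames.append(frame)
        time.sleep(interval)
    return frames
```

For example, `watch(interval=0.5, max_polls=20)` polls 20 times, half a second apart, and returns only the frames that changed.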

Commands

| Command | Description |
|---|---|
| `Local Agent: Iniciar Screen Viewer` | Opens the viewer panel and starts the Vision Server |
| `Local Agent: Parar Screen Viewer` | Stops capture and closes the panel |
| `Local Agent: Capturar Tela para IA` | Saves a high-resolution PNG to `~/.local-agent/screenshots/` |
| `Local Agent: Iniciar Vision Server` | Starts the HTTP API only (no webview panel) |
| `Local Agent: Parar Vision Server` | Stops the HTTP Vision Server |
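An agent that relies on `Capturar Tela para IA` can pick up the newest file from the screenshot folder. The file naming scheme is not documented here, so this sketch assumes only that screenshots are `.png` files and selects by modification time; `latest_screenshot` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path

SCREENSHOT_DIR = Path.home() / ".local-agent" / "screenshots"


def latest_screenshot(directory: Path = SCREENSHOT_DIR) -> Path | None:
    """Return the most recently modified .png in `directory`, or None if there is none."""
    if not directory.is_dir():
        return None
    pngs = [p for p in directory.glob("*.png") if p.is_file()]
    return max(pngs, key=lambda p: p.stat().st_mtime, default=None)
```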

Settings

| Setting | Default | Description |
|---|---|---|
| `localAgent.screenViewer.fps` | `5` | Frames per second (1-10) |
| `localAgent.screenViewer.quality` | `70` | JPEG stream quality (1-100) |
| `localAgent.screenViewer.scale` | `0.5` | Image scale factor (0.1-1.0) |
| `localAgent.visionServer.port` | `7899` | Vision Server HTTP port |
| `localAgent.visionServer.diffThreshold` | `0.02` | Change detection threshold (0.001-0.5); lower values report smaller changes |
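As a worked example of how the stream settings interact: with the defaults (`fps = 5`, `scale = 0.5`) on a 1920x1080 display, each streamed frame would be about 960x540, a quarter of the pixels, sent every 200 ms. The sketch below computes this, assuming `scale` applies to both width and height; the extension's exact rounding is not documented, and `stream_parameters` is a hypothetical helper.

```python
def stream_parameters(screen_w: int, screen_h: int, fps: int = 5, scale: float = 0.5) -> dict:
    """Estimate streamed frame size and pacing from the screenViewer settings.

    Assumes `scale` multiplies both dimensions (hypothetical helper, not the extension's code).
    """
    if not 1 <= fps <= 10:
        raise ValueError("fps must be in 1-10")
    if not 0.1 <= scale <= 1.0:
        raise ValueError("scale must be in 0.1-1.0")
    width, height = round(screen_w * scale), round(screen_h * scale)
    return {
        "width": width,
        "height": height,
        "pixel_ratio": (width * height) / (screen_w * screen_h),
        "interval_ms": 1000 / fps,
        "frames_per_minute": fps * 60,
    }


print(stream_parameters(1920, 1080))
# {'width': 960, 'height': 540, 'pixel_ratio': 0.25, 'interval_ms': 200.0, 'frames_per_minute': 300}
```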

Vision Server API

`GET /frame`

Returns the latest screen frame as JPEG.

Response headers:

`X-Frame-Timestamp`: Unix timestamp (ms) of the frame
`X-Frame-Changed`: `"true"` or `"false"` (change detection result)
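Both headers can be decoded in a few lines. This sketch assumes the timestamp is an integer number of milliseconds, as described above; pass `resp.headers` from `urllib` (whose lookups are case-insensitive) or any dict with these exact keys. The helper names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Mapping


def parse_frame_headers(headers: Mapping[str, str]) -> tuple[datetime, bool]:
    """Decode X-Frame-Timestamp (Unix ms) and X-Frame-Changed ("true"/"false")."""
    ts_ms = int(headers["X-Frame-Timestamp"])
    captured_at = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    changed = str(headers.get("X-Frame-Changed", "false")).strip().lower() == "true"
    return captured_at, changed


def frame_age_seconds(captured_at: datetime, now: datetime | None = None) -> float:
    """Seconds elapsed since the frame was captured; useful for discarding stale frames."""
    now = now if now is not None else datetime.now(timezone.utc)
    return (now - captured_at).total_seconds()
```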

`GET /frame/changed`

Returns the frame only if the screen changed since the last request. Returns 304 Not Modified if unchanged.

`GET /frame/hd`

Captures and returns a full-resolution PNG (on demand, not from the stream).
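Because `/frame/hd` captures on demand at full resolution, the response can be larger and slower than a stream frame. A minimal sketch that fetches it with a generous timeout and checks the 8-byte PNG signature before saving; the timeout value, the `hd_<nanoseconds>.png` naming, and the helper names are assumptions.

```python
from __future__ import annotations

import time
import urllib.request
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def fetch_hd(base_url: str = "http://127.0.0.1:7899", timeout: float = 15.0) -> bytes:
    """GET /frame/hd; the timeout is generous because capture happens on demand."""
    with urllib.request.urlopen(f"{base_url}/frame/hd", timeout=timeout) as resp:
        return resp.read()


def save_hd_frame(data: bytes, directory: Path) -> Path:
    """Check the PNG signature, then write the frame under a timestamped name."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("response is not a PNG")
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"hd_{time.time_ns()}.png"
    path.write_bytes(data)
    return path
```

Usage: `save_hd_frame(fetch_hd(), Path("captures"))`.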

`GET /status`

Returns JSON with server and capture state:

```json
{
  "server": "vision-server",
  "version": "0.2.0",
  "running": true,
  "capture": {
    "active": true,
    "backend": "GPU (DXGI)",
    "hasFrame": true,
    "frameCount": 1234,
    "changeDetected": true
  }
}
```
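An agent can poll `/status` until the server is running and a first frame exists before requesting frames. This sketch relies only on the `running`, `capture.active`, and `capture.hasFrame` fields shown above; the polling interval, timeout, and helper names are assumptions.

```python
from __future__ import annotations

import json
import time
import urllib.request


def is_ready(status: dict) -> bool:
    """True when the server runs, capture is active, and at least one frame exists."""
    capture = status.get("capture", {})
    return bool(status.get("running") and capture.get("active") and capture.get("hasFrame"))


def wait_until_ready(base_url: str = "http://127.0.0.1:7899", timeout: float = 10.0) -> dict:
    """Poll /status until is_ready() holds, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/status", timeout=2.0) as resp:
                status = json.load(resp)
            if is_ready(status):
                return status
        except OSError:
            pass  # server not up yet (connection refused, timeout)
        if time.monotonic() >= deadline:
            raise TimeoutError("Vision Server not ready")
        time.sleep(0.25)
```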

`POST /capture`

Forces an immediate capture and returns the frame as JPEG.
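With the standard library, `POST /capture` needs an explicit `method="POST"`. A minimal sketch that sends an empty body and verifies the JPEG start-of-image marker (`FF D8`) in the response; `capture_now` is a hypothetical helper name.

```python
import urllib.request

JPEG_SOI = b"\xff\xd8"  # every JPEG file begins with this marker


def capture_now(base_url: str = "http://127.0.0.1:7899", timeout: float = 10.0) -> bytes:
    """POST /capture with an empty body and return the freshly captured JPEG."""
    req = urllib.request.Request(f"{base_url}/capture", data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
    if not body.startswith(JPEG_SOI):
        raise ValueError("expected a JPEG body")
    return body
```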

Architecture

```
+--------------------------------------------+
|  VS Code Extension                         |
|  +----------------+  +----------------+    |
|  | ScreenCapture  |->| ChangeDetector |    |
|  | (DXGI / PS)    |  | (pixel sample) |    |
|  +----------------+  +-------+--------+    |
|                              |             |
|  +---------------------------v----------+  |
|  | VisionServer :7899                   |  |
|  | GET /frame | /frame/changed          |  |
|  | GET /frame/hd | /status              |  |
|  | POST /capture                        |  |
|  +---------------------------+----------+  |
+------------------------------|-------------+
                               | HTTP
    +--------------------------v-----------------+
    |  Any AI Agent (Python, Node, etc.)         |
    |  httpx.get("http://127.0.0.1:7899/frame")  |
    +--------------------------------------------+
```
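The ChangeDetector is described only as a pixel sampler with a `diffThreshold` (default 0.02). One plausible reading, sketched below, is: sample every Nth pixel of two raw RGBA frames and report a change when the fraction of sampled pixels that differ reaches the threshold. The stride, per-channel tolerance, RGBA layout, and function names are all assumptions, not the extension's actual algorithm.

```python
from __future__ import annotations


def changed_fraction(prev: bytes, curr: bytes, stride: int = 16, tolerance: int = 16) -> float:
    """Fraction of sampled RGBA pixels whose R, G, or B value moved by more than `tolerance`.

    Only every `stride`-th pixel is inspected, which keeps the check cheap.
    """
    if len(prev) != len(curr):
        return 1.0  # size changed (e.g. resolution switch): treat as a full change
    offsets = range(0, len(curr) - 3, 4 * stride)  # byte offset of each sampled pixel
    if len(offsets) == 0:
        return 0.0
    changed = sum(
        1
        for o in offsets
        if max(abs(prev[o + c] - curr[o + c]) for c in range(3)) > tolerance
    )
    return changed / len(offsets)


def has_changed(prev: bytes | None, curr: bytes, threshold: float = 0.02) -> bool:
    """Report a change when there is no previous frame or the sampled fraction reaches the threshold."""
    return prev is None or changed_fraction(prev, curr) >= threshold
```

Sampling keeps the comparison cheap at stream frame rates, at the cost of possibly missing very small changes (for example a blinking cursor) that fall between sampled pixels.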

Capture Backends

| Backend | Technology | Performance |
|---|---|---|
| GPU (primary) | DXGI Desktop Duplication via `node-screenshots` | Fastest, minimal CPU |
| PowerShell (fallback) | `System.Drawing` screen capture | Works on any supported Windows system, no GPU required |

Image resizing and JPEG encoding are handled by the `sharp` library (built on libvips) for fast processing.
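The primary/fallback order in the backend table amounts to "try each backend in turn and keep the first one that succeeds". The extension itself runs on Node.js; this Python sketch only illustrates that ordering, and the backend callables are hypothetical stand-ins for the real DXGI and PowerShell capture code.

```python
from __future__ import annotations

from typing import Callable, Sequence


class CaptureError(RuntimeError):
    """Raised when every capture backend fails."""


def capture_with_fallback(
    backends: Sequence[tuple[str, Callable[[], bytes]]],
) -> tuple[str, bytes]:
    """Try each (name, capture_fn) pair in order and return the first success."""
    errors = []
    for name, capture in backends:
        try:
            return name, capture()
        except Exception as exc:  # e.g. no DXGI support, driver error
            errors.append(f"{name}: {exc}")
    raise CaptureError("all capture backends failed: " + "; ".join(errors))
```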

Requirements

Windows 10/11 (DXGI + PowerShell capture are Windows-specific)
VS Code 1.85+

Known Limitations

Windows only: DXGI and PowerShell capture rely on Windows-specific APIs
Primary monitor only: the extension captures the primary display
Localhost only: the Vision Server binds to 127.0.0.1, so it is not exposed to the network, but any local process can read your screen while it runs

Contributing

Contributions are welcome! Please open an issue or pull request on GitHub.

License

MIT - Anderson Belem (Otimiza.pro)