TripleG3 Windows User
A Visual Studio Code extension that exposes focus-preserving Windows desktop interaction tools to Copilot-compatible language model and MCP flows.
The extension contributes an @windowsUser chat participant, VS Code language model tools, and a local authenticated MCP server provider for:
- Capturing all monitors or a selected monitor as an image.
- Reading monitor layout and cursor position.
- Moving and clicking the mouse as the primary way to navigate visible UI.
- Holding/releasing mouse buttons and dragging for sliders, selection, resizing, and drag-and-drop.
- Pressing keyboard chords such as Ctrl+L, Tab, Enter, and F5 when a shortcut, focused control, or keyboard-only path is necessary.
- Typing text into the focused application.
- Reading or setting text clipboard contents for scoped copy/paste workflows.
- Opening browser/application URIs.
- Listing Microsoft Edge profiles and opening Edge with a named profile such as Crystal.
- Listing, focusing, maximizing, and capturing specific desktop windows.
- Capturing low-cost visual context from a window, screen, or region as downscaled JPEG by default.
- Returning DPI-aware screen/window metadata, image scale, and cursor screen/capture/image coordinates for precise targeting.
- Moving or resizing desktop windows into known positions before capture or interaction.
- Running ordered action sequences in one tool call for deterministic multi-step workflows.
- Scrolling browser pages, grids, and other scrollable panes with the mouse wheel.
- Waiting for UI transitions.
- Tuning Copilot behavior with configurable operating modes, visual-context defaults, URI schemes, browser-profile hints, and extra model instructions.
- Discovering installed capabilities through a VS Code Get started with Windows User walkthrough, Command Palette commands, and AI-focused reference documentation.
- Discovering the same windowsUser_* tool family directly from Copilot/agent-mode MCP tool discovery after the extension activates.
The @windowsUser chat participant uses a monochrome VS Code-style icon at resources/windows-user.svg. It combines TripleG3 branding with a Windows monitor/control motif and is suitable for a future Activity Bar view if one is added.
Publisher
Published by TripleG3 (GitHub). TripleG3 describes its work as software architecture, consulting, and solutions, with public projects and showcases across .NET, C#, Azure, MAUI, Blazor, developer tooling, and Windows-focused automation libraries.
TripleG3's public site tagline is Creative Engineers, supporting a modern world with innovative technology.
Important safety model
This extension can view and control the local Windows desktop. That is powerful and potentially risky, so it is intentionally designed with guardrails:
- The extension does not show its own focus-stealing confirmation popups for tool calls; use it only in trusted workflows where Copilot-compatible automation is expected to act directly.
- VS Code, Copilot, MCP hosts, browsers, Windows, websites, and security surfaces may still show their own approval, login, MFA, CAPTCHA, UAC, or protocol-handler prompts; those should not be bypassed.
- The extension is disabled in untrusted workspaces.
- It does not expose arbitrary shell command execution.
- It cannot bypass authentication, MFA, CAPTCHA, access controls, or site terms.
- Screen captures may include private information; request captures only when appropriate.
- Text entry uses clipboard paste for reliability and restores the previous text clipboard by default.
- Clipboard reads and writes can expose secrets or private data, so keep workflows scoped and observable.
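The clipboard-paste text entry described above can be pictured with a small sketch. This is not the extension's actual code: the TextClipboard interface, the paste callback, and the typeViaClipboard and FakeClipboard names are all hypothetical, and the calls are synchronous only to keep the example short.

```typescript
// Hypothetical clipboard abstraction; the extension's real internals are not shown in this README.
interface TextClipboard {
  read(): string;
  write(text: string): void;
}

// Pastes `text` into the focused application, then restores the previous clipboard text.
function typeViaClipboard(
  clipboard: TextClipboard,
  paste: () => void, // hypothetical callback that sends the paste chord to the focused app
  text: string,
  restoreAfter: boolean = true // mirrors the documented default of windowsUser.restoreClipboardAfterType
): void {
  const previous = clipboard.read();
  clipboard.write(text);
  try {
    paste();
  } finally {
    // Restore even if the paste step throws, so the user's clipboard is not left overwritten.
    if (restoreAfter) {
      clipboard.write(previous);
    }
  }
}

// In-memory stand-in used only to demonstrate the behavior.
class FakeClipboard implements TextClipboard {
  private value: string;
  constructor(initial: string) {
    this.value = initial;
  }
  read(): string {
    return this.value;
  }
  write(text: string): void {
    this.value = text;
  }
}
```

With a FakeClipboard that starts out holding some text, calling typeViaClipboard pastes the new text and leaves the original text on the clipboard afterwards.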
Privacy and data handling
TripleG3 Windows User is designed to run locally inside Visual Studio Code and to collect no information about you.
- The extension itself does not collect, sell, or store user information.
- The extension does not add custom telemetry or background data collection.
- The extension does not proactively push screen captures, clipboard contents, window details, typed text, or other local data anywhere on its own.
- The extension only responds to explicit VS Code extension APIs and language model tool calls that are available inside Visual Studio Code.
- Copilot-compatible AI features can request these tools when you use the @windowsUser participant or related AI workflows. Windows User does not show extension-owned confirmation dialogs, so requested results may be provided back to the active VS Code chat or AI experience as part of that workflow. How that AI service processes, retains, or uses information after it receives the result is controlled by that AI service, its policies, and your account/configuration, not by this extension.
Review prompts and visible UI state carefully before asking the extension to act, especially when private information, secrets, customer data, regulated data, or other sensitive content is visible on screen or present in the clipboard.
Copilot-first operating model
@windowsUser is optimized for Copilot and compatible VS Code language model flows. The participant receives operating instructions that encourage a safe visual automation loop:
- Inspect environment, windows, or a focused visual region.
- Plan one small next action from the latest bounds and cursor metadata.
- Act with URI/window targeting, then mouse-first navigation; use keyboard, clipboard, or wait tools only when they are the right next step.
- Capture visual context again to verify the result.
- Summarize observations, actions, and uncertainty.
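The loop above can be sketched as a small driver. Every name here (DesktopTools, Planner, runLoop) is hypothetical and only illustrates the shape of inspect, plan, act, verify, and summarize. Treating each iteration as one turn and defaulting the cap to 12 (the windowsUser.maxToolTurns default) is an assumption, not the extension's actual accounting.

```typescript
// All names are hypothetical; they illustrate the loop's shape, not the extension's code.
interface Observation {
  description: string;
}

interface Step {
  action: string;
  done: boolean;
}

interface DesktopTools {
  capture(): Observation; // inspect or verify
  act(action: string): void; // perform one small action
}

type Planner = (latest: Observation) => Step;

function runLoop(tools: DesktopTools, plan: Planner, maxTurns: number = 12): string[] {
  const log: string[] = [];
  let latest = tools.capture(); // 1. inspect
  for (let turn = 0; turn < maxTurns; turn++) {
    const step = plan(latest); // 2. plan one small action from the latest observation
    if (step.done) {
      log.push("done");
      return log;
    }
    tools.act(step.action); // 3. act
    latest = tools.capture(); // 4. capture again to verify
    log.push(`${step.action} -> ${latest.description}`);
  }
  log.push("stopped: turn limit reached"); // 5. report uncertainty instead of continuing blindly
  return log;
}
```

The returned log stands in for the final summary. A real participant would also describe what it observed and where it was unsure.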
Mouse-first navigation rule
For AI-driven desktop work, treat the extension like a careful user with eyes and a pointer:
- Prefer visible mouse actions for navigation: click buttons, links, tabs, menus, checkboxes, and list rows; use the mouse wheel inside the intended scrollable pane; drag sliders, splitters, selections, and drop targets.
- Use keyboard actions only when they are necessary or clearly better: entering text into an already-focused field, intentional shortcuts such as Ctrl+L, Ctrl+F, Escape, or F5, activating a known focused control with Enter/Space, or recovering from UI that cannot be reliably targeted by mouse.
- Avoid blind Tab/Enter key-walking through pages or dialogs when a visible control can be clicked. If the click target is uncertain, capture a tighter region, use cursor metadata, or ask for clarification.
- After each meaningful click, scroll, drag, or keyboard action, wait briefly and capture visual context again before deciding the next action.
Use windowsUser.modelBehavior.mode to choose careful, balanced, or fast guidance. Add windowsUser.additionalModelInstructions for local workflow hints such as preferred browser profile, monitors to avoid, application URLs, or organization-specific UI habits. These extra instructions are appended to the Copilot participant prompt but do not override safety rules.
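As a rough illustration of how these two settings could feed the participant prompt, the sketch below normalizes the mode and appends the extra instructions after the base rules. The function names and the prompt layout are assumptions. The README only states that the mode is one of careful, balanced, or fast (default balanced) and that extra instructions are appended.

```typescript
type BehaviorMode = "careful" | "balanced" | "fast";
const MODES: readonly BehaviorMode[] = ["careful", "balanced", "fast"];

// Hypothetical helper: unknown values fall back to the documented default, "balanced".
function resolveMode(raw: unknown): BehaviorMode {
  if (typeof raw === "string" && MODES.indexOf(raw as BehaviorMode) !== -1) {
    return raw as BehaviorMode;
  }
  return "balanced";
}

// Hypothetical prompt assembly: base rules come first, extra instructions are only appended.
function composeInstructions(base: string[], mode: BehaviorMode, extra: string[]): string {
  const cleaned = extra.map((s) => s.trim()).filter((s) => s.length > 0);
  return [...base, `Operating mode: ${mode}`, ...cleaned].join("\n");
}
```

For example, composeInstructions(["Never bypass MFA."], resolveMode("careful"), ["Prefer the Crystal Edge profile."]) yields three lines with the safety rule first.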
For a deeper option guide, prompt patterns, and recommended recipes, see docs/COPILOT_AUTOMATION_GUIDE.md.
For a compact AI-facing capability map with exact tool names, command IDs, coordinate rules, and starter prompts, see docs/AI_TOOL_REFERENCE.md.
Installed VS Code surfaces for AI
When installed, TripleG3 Windows User embeds into VS Code through these discoverable surfaces:
| Surface | Name or ID | How it helps |
| --- | --- | --- |
| Chat participant | @windowsUser | Gives Copilot-compatible models a focused desktop-automation persona and operating instructions. |
| MCP server provider | TripleG3 Windows User / windowsUser_* | Lets VS Code Copilot Chat and agent-mode MCP discovery expose the installed extension as a callable local tool server, so models do not need a separate manually configured MCP file. |
| Language model tools | windowsUser_* / references such as captureVisualContext, runActionSequence, and setWindowBounds | Lets extension chat participants and VS Code language model flows request screen, window, mouse, keyboard, clipboard, browser, wait, and batch actions without Windows User modal popups. |
| Getting Started walkthrough | Get started with Windows User | Guides newly installed users through status checks, AI usage docs, and Copilot Chat prompts. |
| Command Palette | Windows User: Show Status | Verifies activation, monitor layout, cursor position, DPI metadata, and output-channel details. |
| Command Palette | Windows User: Open Microsoft Edge Profile... | Lets users manually validate Edge profile discovery/launching without Copilot. |
| Command Palette | Windows User: Open Copilot Automation Guide | Opens the AI usage guide from the installed extension package. |
| Settings | windowsUser.* | Controls model behavior guidance, visual-context defaults, URI schemes, clipboard limits, and workflow hints. |
The extension is a local UI extension (extensionKind: ui) and is explicitly disabled for untrusted and virtual workspaces because it controls the local Windows desktop.
Use after installation
- Install the Windows-targeted VSIX or Marketplace pre-release package.
- Open a trusted local workspace in VS Code on Windows.
- In VS Code Getting Started, open Get started with Windows User.
- Run Windows User: Show Status to verify local desktop access.
- Open Copilot Chat and either begin prompts with @windowsUser or enable/attach the TripleG3 Windows User MCP tools from the chat tool picker when using agent mode.
- If Copilot says the extension is installed but no tool/API is exposed, reload VS Code once so the MCP provider is activated, then check the chat tool picker for TripleG3 Windows User. The MCP path requires a VS Code/Copilot build that supports extension-provided MCP server definition providers; older builds can still use @windowsUser.
- Start with observation-only prompts before actions that capture pixels, move the mouse, type, read the clipboard, or open URIs.
Windows User does not show its own per-tool confirmation popups. VS Code, Copilot, MCP hosts, browsers, Windows, websites, and security surfaces may still show their own approval, login, MFA, CAPTCHA, UAC, or protocol-handler prompts; those should not be bypassed.
Try it from source
- Open this folder in VS Code.
- Run npm install.
- Run npm run compile.
- Press F5 to launch the Extension Development Host.
- Run Windows User: Show Status from the Command Palette to verify the extension can activate and read monitor/cursor metadata.
- Open Copilot Chat in the development host.
- Ask the participant something like:
@windowsUser open https://example.com, capture the screen, and tell me what is visible
For Jira workflows, set windowsUser.defaultJiraUrl in VS Code settings or include the Jira URL in your prompt:
@windowsUser Open JIRA at https://your-company.atlassian.net, go to the current sprint board, filter to me, and summarize the visible issues.
The participant will use small steps: open URI, wait, capture visual context, click or scroll visible UI with the mouse first, type only after focusing the intended field, then report back.
For browser-heavy workflows, the participant can prefer window-aware and low-cost visual actions over full-desktop guessing: list matching windows, focus the right browser/profile window, capture a downscaled JPEG of only that window or a region, click visible page controls with the mouse, scroll inside the relevant pane, and capture again. Keyboard shortcuts remain available for deliberate actions such as the address bar, search/find, refresh, escape, or focused text entry.
For deterministic setup workflows, the participant can use runActionSequence to reduce round trips. For example, it can open a URI/profile, wait for the app to load, list or focus the matching window, move/resize the window to known bounds, and then capture visual context in a single approved sequence. The participant should still avoid batching steps when it needs to inspect a fresh screenshot before deciding the next action.
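A batched setup like the one described could be modeled as ordered data plus a runner. The Action shape and field names below are hypothetical, since the real runActionSequence input schema is not reproduced in this README, and stopping at the first failure is an assumed design choice for the sketch.

```typescript
// Hypothetical action shape; field names are illustrative only.
type Action =
  | { kind: "openUri"; uri: string }
  | { kind: "wait"; milliseconds: number }
  | { kind: "focusWindow"; titleContains: string }
  | { kind: "setWindowBounds"; x: number; y: number; width: number; height: number }
  | { kind: "captureVisualContext" };

interface SequenceResult {
  completed: string[];
  failedAt?: number;
  error?: string;
}

// Runs actions strictly in order and stops at the first failure (an assumption of this sketch).
function runSequence(actions: Action[], execute: (action: Action) => void): SequenceResult {
  const completed: string[] = [];
  let index = 0;
  for (const action of actions) {
    try {
      execute(action);
      completed.push(action.kind);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { completed, failedAt: index, error: message };
    }
    index++;
  }
  return { completed };
}

// Example: open a page, wait, focus the window, place it at known bounds, then capture once.
const setupSequence: Action[] = [
  { kind: "openUri", uri: "https://example.com" },
  { kind: "wait", milliseconds: 1000 },
  { kind: "focusWindow", titleContains: "Example" },
  { kind: "setWindowBounds", x: 0, y: 0, width: 1280, height: 720 },
  { kind: "captureVisualContext" },
];
```

Ending the sequence with a capture keeps the batch deterministic while still giving the model a fresh image before it decides the next step.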
For UI workflows that require regular pointer behavior, the participant can drag the mouse directly or use explicit mouse-button down/up actions. Prefer dragMouse for most drags because it always releases the button in a finally block.
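The value of releasing in a finally block is easiest to see in a sketch. The Pointer interface and dragWithRelease function below are hypothetical stand-ins rather than the extension's dragMouse implementation. They show that the button is released even when a move fails partway through the drag.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical low-level pointer abstraction.
interface Pointer {
  moveTo(x: number, y: number): void;
  buttonDown(): void;
  buttonUp(): void;
}

// Drags from `from` to `to` in small steps and always releases the button afterwards.
function dragWithRelease(pointer: Pointer, from: Point, to: Point, steps: number = 10): void {
  pointer.moveTo(from.x, from.y);
  pointer.buttonDown();
  try {
    for (let i = 1; i <= steps; i++) {
      const t = i / steps;
      pointer.moveTo(
        Math.round(from.x + (to.x - from.x) * t),
        Math.round(from.y + (to.y - from.y) * t)
      );
    }
  } finally {
    // Runs on success and on error, so the desktop is never left with a held button.
    pointer.buttonUp();
  }
}
```

With separate down/up actions the caller has to guarantee the release itself, which is why the combined drag is the safer default.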
Build and install Windows-targeted VSIX packages
This extension is published as Windows-targeted VSIX packages because the runtime tools depend on the local Windows desktop. The default package script builds both Windows x64 and Windows Arm64 artifacts.
npm install
npm run package
Install the generated package that matches your Windows architecture into VS Code:
code --install-extension .\tripleg3-windows-user-win32-x64-0.0.8.vsix
For Windows on Arm64, install tripleg3-windows-user-win32-arm64-0.0.8.vsix instead.
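If you script the install step, the package name can be derived from the machine architecture. The helper below is a hypothetical convenience, not part of the repository. It follows the two file names shown above and expects an architecture string such as the one Node.js reports in process.arch.

```typescript
// Hypothetical helper: maps an architecture string to the matching VSIX file name.
function vsixFileName(arch: string, version: string): string {
  let target: string;
  if (arch === "x64") {
    target = "win32-x64";
  } else if (arch === "arm64") {
    target = "win32-arm64";
  } else {
    throw new Error(`No Windows-targeted package for architecture: ${arch}`);
  }
  return `tripleg3-windows-user-${target}-${version}.vsix`;
}
```

Calling vsixFileName("arm64", "0.0.8") returns tripleg3-windows-user-win32-arm64-0.0.8.vsix, and any other architecture raises an error instead of guessing.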
Reload VS Code, open Copilot Chat, and use @windowsUser or attach/enable the TripleG3 Windows User MCP tools in agent mode.
The first Marketplace release is planned as a beta using VS Code Marketplace's Pre-Release channel. To build beta/pre-release VSIX artifacts locally, use npm run package:pre-release or the alias npm run package:beta.
For a quick non-Copilot smoke test, run Windows User: Show Status from the Command Palette. It should show a notification with the monitor count and cursor position, plus detailed JSON in the Windows User output channel.
To test named profile launching, run Windows User: Open Microsoft Edge Profile... from the Command Palette, choose a profile such as Crystal, and enter a URL.
Publish checklist
Before publishing to the Visual Studio Marketplace or an internal extension feed:
- Confirm the publisher in package.json matches an existing VS Code Marketplace publisher.
- Complete the Marketplace publisher and GitHub secret setup in docs/MARKETPLACE_PUBLISHING_SETUP.md.
- Run npm run compile.
- Run npm run package:beta and install the generated Windows-targeted .vsix locally.
- Validate screen capture, window capture/focus, URI opening, keyboard, mouse, scrolling, text entry, and clipboard workflows without Windows User confirmation popups.
- Review SECURITY.md for high-trust local automation expectations.
- Update CHANGELOG.md for the release.
- First release strategy: publish as a beta through the VS Code Marketplace Pre-Release channel. Keep preview: true and use npm run publish:pre-release or npm run publish:beta after authenticating vsce with a publisher token. The script publishes win32-x64 and win32-arm64 targets.
- Until the GitHub Actions VSCE_PAT secret is configured, use npm run package:pre-release locally and upload the generated Windows-targeted VSIX artifacts manually through the Marketplace publisher portal.
- The GitHub Actions Marketplace workflow stamps CI builds to a unique patch version before packaging and publishing, so pushes and manual workflow runs do not republish an already-used Marketplace version.
- Do not publish a stable release until the beta/pre-release has been validated. When graduating to stable, update the changelog/version strategy and publish without --pre-release.
- The GitHub Actions workflow builds and packages pull requests, and automatically publishes the beta on pushes to main after repository secret VSCE_PAT is configured.
Additional manual validation steps are in docs/TESTING.md.
Configuration
| Setting | Default | Purpose |
| --- | --- | --- |
| windowsUser.maxToolTurns | 12 | Maximum tool-calling turns for one @windowsUser participant request. |
| windowsUser.modelBehavior.mode | balanced | Inject careful, balanced, or fast operating guidance into the Copilot participant. |
| windowsUser.additionalModelInstructions | [] | Append custom model guidance such as preferred accounts, app URLs, or monitors to avoid. |
| windowsUser.defaultJiraUrl | Empty string | Optional Jira URL for prompts that say "open Jira" without a URL. |
| windowsUser.defaultEdgeProfile | Empty string | Optional Edge profile display name, account, or directory for browser workflows when the user does not name a profile. |
| windowsUser.restoreClipboardAfterType | true | Restore prior text clipboard after text-entry actions. |
| windowsUser.visualContext.defaultSource | window | Default captureVisualContext source when the model omits one. |
| windowsUser.visualContext.defaultFormat | jpeg | Default visual-context image format. JPEG is smaller; PNG is lossless. |
| windowsUser.visualContext.jpegQuality | 60 | Default JPEG quality for visual-context captures. |
| windowsUser.visualContext.maxWidth | 1280 | Default maximum output width for visual-context captures; 0 disables width downscaling. |
| windowsUser.visualContext.maxHeight | 720 | Default maximum output height for visual-context captures; 0 disables height downscaling. |
| windowsUser.visualContext.includeCursor | false | Draw a small cursor crosshair overlay when the cursor is inside visual-context captures. Cursor metadata is always returned. |
| windowsUser.clipboard.maxReadCharacters | 4000 | Default maximum clipboard text returned when the model omits a limit. |
| windowsUser.textEntry.previewCharacters | 80 | Maximum text-entry or clipboard-write characters included in generated tool metadata; 0 hides previews. |
| windowsUser.wait.defaultMilliseconds | 1000 | Default wait duration when the wait tool omits a duration. |
| windowsUser.wait.maximumMilliseconds | 30000 | Maximum accepted wait duration; longer requests are clamped. |
| windowsUser.openUri.allowedSchemes | http, https, file, mailto, ms-edge, msteams, slack, zoommtg | URI schemes the generic openUri tool may launch. |
| windowsUser.openUri.launchMode | auto | auto tries direct Windows launching first and falls back to VS Code external handling; direct avoids VS Code external handling; external keeps the legacy vscode.env.openExternal path. |
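Two of the settings above describe simple rules that a sketch makes concrete: wait durations fall back to a default and are clamped to a maximum, and openUri only launches allowed schemes. The function names are hypothetical, and treating negative waits as 0 and matching schemes case-insensitively are assumptions of this example.

```typescript
// Defaults copied from the table above.
const DEFAULT_ALLOWED_SCHEMES: readonly string[] = [
  "http", "https", "file", "mailto", "ms-edge", "msteams", "slack", "zoommtg",
];

// Hypothetical helper: applies the default when no duration is given, then clamps to the maximum.
function resolveWaitMs(
  requested: number | undefined,
  defaultMs: number = 1000,
  maxMs: number = 30000
): number {
  const value = requested === undefined ? defaultMs : requested;
  return Math.min(Math.max(0, value), maxMs); // negative values become 0 (assumption)
}

// Hypothetical helper: true only when the URI's scheme is on the allow list.
function isSchemeAllowed(
  uri: string,
  allowed: readonly string[] = DEFAULT_ALLOWED_SCHEMES
): boolean {
  const match = /^([a-zA-Z][a-zA-Z0-9+.-]*):/.exec(uri);
  if (!match) {
    return false;
  }
  const scheme = (match[1] ?? "").toLowerCase();
  return allowed.indexOf(scheme) !== -1;
}
```

A request to wait 60000 ms resolves to 30000, and a javascript: URI is rejected because its scheme is not in the list.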
Visual context strategy
Use captureVisualContext for live visual feedback loops. By default, it uses the settings above to prefer:
- foreground or matching window capture instead of the whole desktop;
- JPEG output instead of PNG;
- downscaling to fit within 1280x720;
- optional region capture when only part of a screen or window matters;
- cursor screen coordinates, capture-relative coordinates, image-relative coordinates, and optional cursor crosshair overlay;
- per-monitor DPI metadata and DPI-awareness status to help align screenshots, window bounds, and mouse coordinates on scaled displays;
- metadata-only capture with includeImage: false when the model only needs bounds, cursor, or window details.
Use full captureScreen, lossless PNG, larger dimensions, or smaller high-fidelity regions only when text readability or exact pixel inspection requires it.
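To make the default downscaling concrete, here is a minimal sketch of fitting a capture inside the maxWidth and maxHeight limits. It assumes the aspect ratio is preserved and that images are never upscaled. The fitWithin name and return shape are hypothetical, and a limit of 0 disables that axis as the settings table describes.

```typescript
interface Size {
  width: number;
  height: number;
}

interface FittedSize extends Size {
  scale: number; // output pixels per source pixel
}

// Hypothetical helper: shrinks `source` to fit the limits while keeping its aspect ratio.
function fitWithin(source: Size, maxWidth: number = 1280, maxHeight: number = 720): FittedSize {
  const scaleX = maxWidth > 0 ? maxWidth / source.width : 1; // 0 disables the width limit
  const scaleY = maxHeight > 0 ? maxHeight / source.height : 1; // 0 disables the height limit
  const scale = Math.min(1, scaleX, scaleY); // never upscale (assumption)
  return {
    width: Math.max(1, Math.round(source.width * scale)),
    height: Math.max(1, Math.round(source.height * scale)),
    scale,
  };
}
```

A 2560x1440 window becomes 1280x720 at scale 0.5, while an 800x600 window is left unchanged.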
Limitations
- The participant relies on screen captures and UI automation. Native APIs, browser APIs, or Jira REST APIs are more reliable when available.
- Visual interpretation depends on the selected Copilot language model's image support.
- Some elevated/admin windows may ignore non-elevated input.
- Secure desktop prompts, UAC, password managers, and browser security surfaces may not be controllable.
- Some host-level VS Code/Copilot/MCP tool approval prompts are controlled by the host, not by Windows User.
- Windows display scaling can affect pixel coordinates; the tools request per-monitor DPI awareness and report DPI metadata, physical screen bounds, image scale, and cursor coordinates to help the model target clicks.
- JPEG visual context is intentionally lossy; use PNG or a smaller region when text is too blurry.
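Because captures are usually downscaled, a point picked in the image has to be mapped back to physical screen pixels before clicking. The sketch below shows that arithmetic under stated assumptions: the CaptureInfo field names are hypothetical, and scale means image pixels per captured screen pixel.

```typescript
// Hypothetical subset of capture metadata; field names are illustrative only.
interface CaptureInfo {
  left: number; // capture origin on the virtual screen, in physical pixels
  top: number;
  scale: number; // image pixels per screen pixel, e.g. 0.5 for a half-size image
}

interface ScreenPoint {
  x: number;
  y: number;
}

// Maps a pixel in the returned image back to physical screen coordinates.
function imageToScreen(info: CaptureInfo, imageX: number, imageY: number): ScreenPoint {
  return {
    x: Math.round(info.left + imageX / info.scale),
    y: Math.round(info.top + imageY / info.scale),
  };
}

// Inverse mapping, for example to locate the cursor inside the image.
function screenToImage(info: CaptureInfo, screenX: number, screenY: number): ScreenPoint {
  return {
    x: Math.round((screenX - info.left) * info.scale),
    y: Math.round((screenY - info.top) * info.scale),
  };
}
```

For a capture that starts at (1920, 0) on a second monitor with scale 0.5, image point (320, 180) maps to screen point (2560, 360).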