UAI Copilot Cluster Intelligence (VS Code Extension)
A VS Code extension that connects to the UAI Copilot backend and provides cluster intelligence, diagnostics, and operational guidance directly from within the editor.
Versioning policy: the publish flow always takes the version currently on the Marketplace and increments it by exactly 0.0.1 (one patch level).
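As a rough illustration of that policy, the next version can be derived from the published Marketplace version by bumping only the patch component. The helper name `nextPublishVersion` is hypothetical and not part of the extension; the sketch assumes plain `major.minor.patch` versions with no pre-release suffix.

```typescript
// Hypothetical helper (not part of the extension): derive the next publish
// version from the version currently on the Marketplace by adding one to
// the patch component.
function nextPublishVersion(marketplaceVersion: string): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(marketplaceVersion.trim());
  if (!match) {
    throw new Error(`Unsupported version format: ${marketplaceVersion}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);
  return `${major}.${minor}.${patch + 1}`;
}
```

For example, a Marketplace version of `3.10.0` would yield `3.10.1` under this assumption.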
Shared Backend Contract (UAI-Extension + UAI-Android)
The extension and Android client intentionally share the same backend API surface for health/system/cluster telemetry.
- One backend, two clients.
- Endpoint set is shared by design to keep behavior and diagnostics consistent across desktop and mobile.
- Extension status/tooltip now shows cluster average ping across online nodes and local node ping separately.
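A minimal sketch of how the two ping values might be derived, assuming the cluster average covers every online node (the local node included) and that the local ping is reported on its own. The `NodePing` shape and its field names are assumptions for illustration, not the extension's actual types.

```typescript
// Assumed shape for illustration; the real node model may differ.
interface NodePing {
  host: string;
  online: boolean;
  pingMs: number; // round-trip time in milliseconds
  isLocal: boolean;
}

// Average ping over online nodes only; null when no node is online.
function clusterAveragePing(nodes: NodePing[]): number | null {
  const online = nodes.filter((n) => n.online);
  if (online.length === 0) return null;
  const total = online.reduce((sum, n) => sum + n.pingMs, 0);
  return total / online.length;
}

// Ping of the local node, reported separately from the cluster average.
function localNodePing(nodes: NodePing[]): number | null {
  const local = nodes.find((n) => n.isLocal && n.online);
  return local ? local.pingMs : null;
}
```

Offline nodes are excluded so that a single unreachable machine does not distort the average.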
For operator access, the wallet panel shows an Open Admin Panel button for the UAI-Leader account only.
Getting Started
- Install the extension from the VSIX or via the Marketplace (once published).
- Ensure the backend API is running and accessible (default: http://127.0.0.1:8000).
- Open the extension from the command palette: UAI Copilot: Open.
- For cluster/local LLM chat, open UAI Chat from the activity bar or run UAI Copilot: Open UAI Chat (uses the uaiCopilot.chat.* settings and OpenAI-compatible completions).
Offline-first (no public internet)
The default API URL is loopback (http://127.0.0.1:8000). After container images are on disk, you can run the stack and extension without outbound internet: start the backend from the automation tool repository root with docker compose up -d (see root docker-compose.yml). Wallet/demo auth use VS Code local storage and do not require cloud sign-in. Optional LAN discovery only scans private IPv4 subnets on your network. See modules/cluster_intelligence/README.md for how this package maps to UAI-Cluster-Intelligence and for mesh op...
Configuration
The extension supports configuration under uaiCopilot in VS Code settings. You can also place a .vscode-extension-config.json file in the workspace root to override settings per-project.
uaiCopilot.apiUrl (default: http://127.0.0.1:8000): the base URL of the backend API.
uaiCopilot.enableHealthCheck (default: true): whether to validate connectivity when the extension starts and to run periodic health checks.
uaiCopilot.healthCheckInterval (default: 30000): interval (ms) between periodic health check pings when health checks are enabled.
uaiCopilot.timeout (default: 5000): request timeout (ms) for the health check.
uaiCopilot.retryAttempts (default: 3): number of retry attempts when connecting to the backend.
uaiCopilot.retryDelay (default: 1000): delay (ms) between retry attempts.
uaiCopilot.healthCheckPaths (default: ["/health", "/database/health"]): array of endpoint paths to attempt during health checks.
uaiCopilot.enableDebugLogging (default: false): enable verbose logging for debugging.
uaiCopilot.notificationFailureThreshold (default: 1): consecutive failed checks before showing a disconnection notification.
uaiCopilot.healthCheckHistorySize (default: 25): maximum number of health check results retained in the diagnostics history.
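The interaction between notificationFailureThreshold and healthCheckHistorySize can be sketched as a small tracker. This is an illustration under assumptions, not the extension's code: the `HealthResult` shape and `createHealthTracker` name are hypothetical, and the sketch assumes the notification fires once, at the moment the consecutive-failure count reaches the threshold.

```typescript
// Assumed result shape for illustration.
interface HealthResult {
  ok: boolean;
  latencyMs: number; // -1 when the check failed
  checkedAt: number; // epoch milliseconds
}

// failureThreshold mirrors uaiCopilot.notificationFailureThreshold,
// historySize mirrors uaiCopilot.healthCheckHistorySize.
function createHealthTracker(failureThreshold = 1, historySize = 25) {
  const history: HealthResult[] = [];
  let consecutiveFailures = 0;

  return {
    // Records a result; returns true when a disconnection notification is due.
    record(result: HealthResult): boolean {
      history.push(result);
      while (history.length > historySize) {
        history.shift(); // drop the oldest entry to keep the history bounded
      }
      consecutiveFailures = result.ok ? 0 : consecutiveFailures + 1;
      // Fire exactly once, when the threshold is first reached.
      return consecutiveFailures === failureThreshold;
    },
    getHistory(): readonly HealthResult[] {
      return history;
    },
  };
}
```

Raising the threshold above 1 trades slower notifications for fewer false alarms on a flaky network.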
UAI Chat (uaiCopilot.chat.*): the activity bar (UAI Chat) webview sends messages to uaiCopilot.chat.chatCompletionsUrl, or to {uaiCopilot.apiUrl}/v1/chat/completions when that URL is empty. The optional chat.systemPrompt is prepended as a system message. uaiCopilot.chat.disableCopilotWorkbenchChat (default: true) prevents the extension from calling VS Code's workbench.action.chat.changeModel; use UAI Chat for local/cluster inference instead of Copilot’s Chat UI.
Copilot Auto lock (uaiCopilot.copilotChatAutoLock.*): keeps inline completions on github.copilot.selectedCompletionModel = Auto (Vendor Default) when enabled; the poll timer is inline-only. When disableCopilotWorkbenchChat is false, the same subsystem can re-apply workbench.action.chat.changeModel per reapplyChatIntervalMs (≥ 15000; 0 disables). modelVendor must stay copilot for Copilot Auto chat when you are not using the cluster LLM (auto is normalized to copilot). GitHub can still cap usage server-side, so configure billing if you need higher limits.
Cluster / Copilot-chat steering (uaiCopilot.clusterLlm.*): when disableCopilotWorkbenchChat is false, useForChatAutoLock routes changeModel to clusterLlm.chatVendor / clusterLlm.chatModelId. Ignored while disableCopilotWorkbenchChat is true. Configure VS Code’s BYOK/OpenAI-compatible provider when you use Copilot Chat with BYOK (http://<host>:11089/v1 when running ../../UAI-Cluster-Intelligence/scripts/llm_service.py); see README_LLM.md. Preflight docker-compose step 0c merges github.copilot.chat.byok.ollamaEndpoint, clusterLlm.*, chat.disableCopilotWorkbenchChat, and (when UAI_LLM_GATEWAY_URL is set) chat.chatCompletionsUrl into .vscode/settings.json unless UAI_SKIP_VSCODE_SETTINGS_SYNC=1.
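The UAI Chat URL fallback and system-prompt handling described above can be sketched as follows. The function names and the `ChatMessage` type are hypothetical; only the fallback rule (empty chatCompletionsUrl means {apiUrl}/v1/chat/completions) and the prepended system message come from the settings description.

```typescript
// Message shape used by OpenAI-compatible chat completions endpoints.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Use the explicit chat URL when set; otherwise fall back to the API base URL.
function resolveChatCompletionsUrl(apiUrl: string, chatCompletionsUrl: string): string {
  const explicit = chatCompletionsUrl.trim();
  if (explicit !== "") return explicit;
  // Strip trailing slashes so the joined URL has exactly one separator.
  return `${apiUrl.replace(/\/+$/, "")}/v1/chat/completions`;
}

// Prepend the optional system prompt as a system message.
function buildChatMessages(systemPrompt: string, conversation: ChatMessage[]): ChatMessage[] {
  const prompt = systemPrompt.trim();
  if (prompt === "") return [...conversation];
  const systemMessage: ChatMessage = { role: "system", content: prompt };
  return [systemMessage, ...conversation];
}
```

With the default settings, `resolveChatCompletionsUrl("http://127.0.0.1:8000", "")` resolves to `http://127.0.0.1:8000/v1/chat/completions`.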
Cluster & Routing
The extension can load-balance and fail over across multiple cluster endpoints automatically.
As of 3.10.0, node discovery is dynamic at runtime:
- the extension performs periodic LAN discovery across local private subnets,
- new reachable nodes are merged into the router immediately,
- metadata is collected directly from node health payloads (hostname, CPU, RAM, GPU, TF compute),
- no local state file is required for node onboarding.
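The merge step in the list above might look like the sketch below. The `NodeInfo` fields are guesses based on the metadata named in the list, and `mergeDiscovered` is a hypothetical helper; the actual router internals are not documented here.

```typescript
// Assumed metadata shape, based on the fields named in the list above.
interface NodeInfo {
  url: string;
  hostname: string;
  cpuPercent: number;
  ramGb: number;
  gpu: string | null;
  computeTf: number;
}

// Merge freshly discovered nodes into the in-memory set, keyed by URL.
// Returns the URLs that were not known before this discovery pass.
function mergeDiscovered(known: Map<string, NodeInfo>, discovered: NodeInfo[]): string[] {
  const added: string[] = [];
  for (const node of discovered) {
    if (!known.has(node.url)) {
      added.push(node.url);
    }
    known.set(node.url, node); // also refreshes metadata for existing nodes
  }
  return added;
}
```

Because the set lives only in memory and is rebuilt from health payloads, no state file is needed between sessions.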
uaiCopilot.clusterEndpoints (default: []): list of cluster endpoint URLs to probe in parallel. When non-empty, overrides apiUrl for routing. Example: ["http://127.0.0.1:8000", "http://192.168.1.81:8000"].
uaiCopilot.routingStrategy (default: "round-robin"): how the next endpoint is chosen. Options: round-robin (even distribution), failover (always prefer the first healthy node), fastest (lowest-latency node wins).
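The three strategies can be sketched as a small selector. This is an illustration rather than the extension's router: the `Endpoint` shape and `createRouter` name are assumptions, and the sketch assumes round-robin rotates over healthy endpoints only.

```typescript
type RoutingStrategy = "round-robin" | "failover" | "fastest";

// Assumed endpoint state for illustration.
interface Endpoint {
  url: string;
  healthy: boolean;
  latencyMs: number;
}

// Returns a picker that selects the next endpoint for the given strategy,
// or undefined when no endpoint is healthy.
function createRouter(strategy: RoutingStrategy) {
  let cursor = 0; // round-robin position, kept across calls

  return (endpoints: Endpoint[]): Endpoint | undefined => {
    const healthy = endpoints.filter((e) => e.healthy);
    if (healthy.length === 0) return undefined;

    switch (strategy) {
      case "failover":
        // Always prefer the first healthy node in configured order.
        return healthy[0];
      case "fastest":
        // Lowest-latency healthy node wins.
        return healthy.reduce((best, e) => (e.latencyMs < best.latencyMs ? e : best));
      case "round-robin": {
        // Even distribution across healthy nodes.
        const chosen = healthy[cursor % healthy.length];
        cursor += 1;
        return chosen;
      }
    }
  };
}
```

A caller would create one picker per session, for example `const pick = createRouter("fastest")`, and call `pick(endpoints)` before each request with the latest health data.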
The status bar shows live cluster health: CPU%, RAM, compute label, and a connected/discovered node ratio (e.g. 2/9 nodes). A hardware stats item shows total CPU, GPU/CPU mode, RAM, and estimated Teraflops.
Tooltip behavior now includes:
- consistent connectivity header semantics with status bar,
- explicit LAN and Cloud online ratios,
- per-node scope tags like [LAN] and [Cloud],
- local system indicator (🟢/🟡) derived from local endpoint health.
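One plausible way to derive the scope tags and per-scope online ratios is sketched below. It assumes that a node counts as LAN when its host is a private (RFC 1918) or loopback IPv4 address and as Cloud otherwise, including plain hostnames; the extension's real classification rule is not stated here, and all names are hypothetical.

```typescript
type Scope = "LAN" | "Cloud";

interface NodeStatus {
  host: string; // IPv4 address or hostname
  online: boolean;
}

// True for RFC 1918 private ranges (10/8, 172.16/12, 192.168/16) and loopback (127/8).
function isPrivateOrLoopbackIPv4(host: string): boolean {
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!match) return false;
  const octets = match.slice(1).map(Number);
  if (octets.some((o) => o > 255)) return false;
  const a = octets[0];
  const b = octets[1];
  return a === 10 || a === 127 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

// Assumption: anything that is not a private or loopback IPv4 address is Cloud.
function scopeOf(host: string): Scope {
  return isPrivateOrLoopbackIPv4(host) ? "LAN" : "Cloud";
}

// Online ratio for one scope, formatted like the status bar ratio (online/total).
function onlineRatio(nodes: NodeStatus[], scope: Scope): string {
  const inScope = nodes.filter((n) => scopeOf(n.host) === scope);
  const online = inScope.filter((n) => n.online).length;
  return `${online}/${inScope.length}`;
}
```

A tooltip line for a node could then be prefixed with `[${scopeOf(node.host)}]`.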
Phase 288 (Android API) Configuration
For deployments using the Phase 288 Android API service, configure the extension to query the enriched health endpoints:
Settings (.vscode/settings.json or VS Code settings UI):
{
  "uaiCopilot.apiUrl": "http://192.168.1.98:8080",
  "uaiCopilot.healthCheckPaths": [
    "/health",
    "/system",
    "/clusters"
  ],
  "uaiCopilot.healthCheckInterval": 30000,
  "uaiCopilot.timeout": 5000
}
The Phase 288 /health endpoint includes:
- System hostname and timestamp
- CPU usage percentage
- RAM metrics (used/total GB)
- GPU details and estimated TFLOPS (via nvidia-smi when available)
- GPU compute estimates for RTX, Tesla, and AMD models
The /system endpoint provides additional system status and metrics.
The /clusters endpoint returns live cluster node state.
This ensures the extension properly displays:
- Total Compute TF values in tooltips (e.g., "29.3TF", "2.1PF")
- Per-node GPU utilization and memory
- Accurate hostname resolution (vs. IP duplication)
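The tooltip values quoted above suggest a unit switch from teraflops to petaflops for large clusters. A minimal sketch of such formatting, assuming the switch happens at 1000 TF and values are shown with one decimal place (both are guesses from the two examples, and `formatCompute` is a hypothetical name):

```typescript
// Sum per-node estimates (in teraflops) into a cluster total.
function totalCompute(nodeTeraflops: number[]): number {
  return nodeTeraflops.reduce((sum, tf) => sum + tf, 0);
}

// Format a compute estimate given in teraflops.
// Assumption: values of 1000 TF or more are shown in petaflops.
function formatCompute(teraflops: number): string {
  if (teraflops >= 1000) {
    return `${(teraflops / 1000).toFixed(1)}PF`;
  }
  return `${teraflops.toFixed(1)}TF`;
}
```

Under these assumptions, an input of 29.3 gives "29.3TF" and an input of 2100 gives "2.1PF".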
Diagnostics, Feedback, and Issue Reporting
The diagnostics panel includes a feedback submission form. The extension also captures runtime warning/error context and queues issue reports for submission when the backend is reachable.
Capabilities:
- submit feedback from diagnostics or command palette,
- include/exclude diagnostics payload on feedback submission,
- capture logger warnings/errors,
- capture guarded runtime errors and process-level unhandled failures,
- redact sensitive fields (token, password, apikey, etc.) before queuing/submission.
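Redaction of this kind can be sketched as a recursive walk over the payload. The key pattern below covers the three names listed above; matching `secret` as well is an added assumption, and the `redact` helper and `[REDACTED]` placeholder are illustrative rather than the extension's actual behavior.

```typescript
// Keys whose values are replaced before a report is queued or submitted.
// "token", "password", and "apikey" come from the list above; "secret" is an assumption.
const SENSITIVE_KEY = /token|password|apikey|secret/i;

// Returns a deep copy of the payload with sensitive values replaced.
// Note: this sketch does not guard against circular references.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Matching on key names is case-insensitive and substring-based, so keys such as `authToken` or `apiKey` are caught as well.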
Commands
UAI Copilot: Open (uai-copilot.open): Opens the UAI Copilot extension.
UAI Copilot: Open UAI Chat (uai-copilot.openUaiChat): Focuses the UAI Chat sidebar (OpenAI-compatible chat completions).
UAI Copilot: Reconnect (uai-copilot.reconnect): Forces a re-check of the backend connection.
UAI Copilot: Diagnostics (uai-copilot.diagnostics): Opens a diagnostics panel with connection state and logs.
UAI Copilot: Test Connection (uai-copilot.testConnection): Performs a single connection test and shows results.
UAI Copilot: Report Issue / Send Feedback (uai-copilot.reportIssue): Submits user feedback with optional diagnostics context.
Troubleshooting
- If you see Connection Error / -1ms, verify that the backend's health endpoints are reachable:
http://127.0.0.1:8000/health
http://127.0.0.1:8000/database/health
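To check these URLs outside the extension, a small script along the following lines may help. It is a sketch under assumptions: the helper names are hypothetical, the probe function is injected so the code does not depend on a particular HTTP client, and it assumes a check passes as soon as one configured path responds successfully.

```typescript
// Join the base API URL with each configured health path.
function healthUrls(apiUrl: string, paths: string[]): string[] {
  const base = apiUrl.replace(/\/+$/, "");
  return paths.map((p) => `${base}/${p.replace(/^\/+/, "")}`);
}

// Probe each URL in order and return the first one that responds OK,
// or null when none is reachable.
async function firstReachable(
  urls: string[],
  probe: (url: string) => Promise<boolean>,
): Promise<string | null> {
  for (const url of urls) {
    try {
      if (await probe(url)) return url;
    } catch {
      // A thrown error (connection refused, timeout) counts as unreachable.
    }
  }
  return null;
}
```

On Node 18 or later, the built-in `fetch` can serve as the probe, for example `firstReachable(healthUrls("http://127.0.0.1:8000", ["/health", "/database/health"]), async (u) => (await fetch(u)).ok)`. A `null` result means none of the paths answered, which points to the backend not running or the API URL being wrong.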
Development
This extension is built in TypeScript and packaged with vsce.
cd modules/UAI-Extension/vscode-extension
npm install
npm run compile
npm run package
npm run publish