Skip to content
| Marketplace
Sign in
Visual Studio Code>Programming Languages>DeepSeek Pilot — Vision + Token TrackingNew to Visual Studio Code? Get it now.
DeepSeek Pilot — Vision + Token Tracking

DeepSeek Pilot — Vision + Token Tracking

konstantyn-ganenkov

|
2 installs
| (0) | Free
| Sponsor
DeepSeek V4 Pro & Flash in GitHub Copilot Chat. Vision proxy for image attachments, KV-cache-aware context-window indicator, persistent reasoning cache, live session cost and platform balance in the status bar, a thinking-effort selector per model variant, and one-click wiring as Copilot's utility/u
Installation
Launch VS Code Quick Open (Ctrl+P), paste the following command, and press enter.
Copied to clipboard
More Info

DeepSeek Pilot — VS Code Extension

DeepSeek V4 Pro & Flash models in GitHub Copilot Chat, tuned for long agentic sessions. Built for VS Code 1.120+ (languageModelChatProviders, LanguageModelDataPart, BYOK context-window widget, chat.utilityModel).

  • Four model variants — Pro / Flash × thinking / non-thinking, with a per-variant Thinking Effort (high / max) control in the model picker, grouped under one "DeepSeek V4" row
  • Native Copilot Chat integration — toolCalling: 128, imageInput, per-model detail (pricing hint) and tooltip (context capacity), themed statusIcon for thinking variants, and a LanguageModelDataPart.json(usage) sidecar so VS Code 1.120's built-in chat-view context widget reflects real DeepSeek usage instead of zeros
  • Wire as Copilot's utility model — one-click commands to route Copilot's background flows (titles, summaries, commit messages, intent detection) through DeepSeek Flash via chat.utilityModel / chat.utilitySmallModel
  • Vision proxy — drop images into chat; a vision-capable model describes them so text-only DeepSeek can reason over the content. Descriptions are cached by image hash so the same screenshot doesn't re-cost on every turn
  • Context-window indicator (DeepSeek-aware) — live "% of window used · cache hit %" status bar item with KV-cache-aware compaction guidance. Sits alongside VS Code's chat-view widget and surfaces the DeepSeek-specific advice the host widget can't (cache-hit-rate, "don't compact early" guidance, configurable thresholds)
  • Session cost & balance — token counts, estimated spend in your account currency (USD/CNY auto-detect), and on-demand platform balance refresh. 75% Pro promo discount is opt-in and auto-expires
  • Persistent reasoning cache — reasoning_content from thinking variants is fingerprinted, persisted across VS Code restarts, and replayed on multi-turn agent loops to preserve DeepSeek's KV cache. Includes a one-shot "Clear Reasoning Cache" command for diagnostics
  • Robust request pipeline — schema sanitization, tool-call/tool-result pairing validation, mid-stream truncation detection, retry on transient failures, and debug-only cache-trace snapshots for diagnosing 400s without leaking message content
  • Model discoverability — variants stay visible in the picker before an API key is configured (with a warning status icon); setting the key surfaces them automatically
  • API key validation — probes the configured endpoint before saving, with a fall-through path for proxy tokens that can't be validated upstream

Install

From the Marketplace (recommended once published):

ext install konstantyn-ganenkov.deepseek-pilot

Or open the Extensions view (Ctrl+Shift+X) and search for DeepSeek Pilot.

From a local build:

npm install
npm run package

Then in VS Code: Extensions → ... → Install from VSIX... → pick the newest dist/deepseek-pilot-<version>.vsix.

Or link directly so a npm run watch keeps the installed copy in sync:

# Symlink from your VS Code extensions folder (Windows; mklink requires an admin shell or developer mode)
mklink /D %USERPROFILE%\.vscode\extensions\konstantyn-ganenkov.deepseek-pilot-<version> C:\path\to\deepseek-pilot

Commands

Command Palette
Manage Provider DeepSeek Pilot: Manage Provider
Set API Key DeepSeek Pilot: Set API Key
Clear API Key DeepSeek Pilot: Clear API Key
Set Vision Proxy Model DeepSeek Pilot: Set Vision Proxy Model
Refresh Balance DeepSeek Pilot: Refresh Balance
Clear Session Counter DeepSeek Pilot: Clear Session Counter
Show Context Window Details DeepSeek Pilot: Show Context Window Details
Show Cache Stats DeepSeek Pilot: Show Reasoning Cache Stats
Clear Reasoning Cache DeepSeek Pilot: Clear Reasoning Cache
Use as Copilot Utility Model DeepSeek Pilot: Use as Copilot Utility Model
Use as Copilot Utility Small Model DeepSeek Pilot: Use as Copilot Utility Small Model
Show Logs DeepSeek Pilot: Show Logs

Model Picker

All four variants remain visible in the Copilot Chat model picker.

  • Thinking variants expose a per-model Thinking Effort control with high and max.
  • If the provider is visible but not fully configured yet, use Manage Provider from the picker or command palette.

Status Bar

One combined right-aligned item:

$(sparkle) DeepSeek Pilot · 16% ctx · $0.15  $17.07
  • Leading icon signals context-window state: sparkle (healthy) → history (warn) → warning (critical), with the background colour shifting to yellow/red at the configured thresholds.
  • 16% ctx is the most recent turn's prompt size as a fraction of the model's input window — the actionable "should I compact?" signal.
  • $0.15 is the running session cost.
  • $17.07 is the platform balance after a refresh.
  • Click opens the Manage Provider quick pick (set key, refresh balance, context details, cache stats, logs, settings).
  • Hover shows the full breakdown: model, cache hit %, last turn details, situation-specific compaction advice, session totals, balance, reasoning effort, and the KV-cache primer.

When to compact your chat (DeepSeek-specific)

VS Code's built-in chat-view context-window widget now reads real BYOK usage (as of VS Code 1.120 — fixing the long-standing microsoft/vscode#313458). This extension feeds it via LanguageModelDataPart.json(usage, "usage") alongside DeepSeek's usage chunk — the bundled Copilot Chat BYOK consumer matches on the literal MIME "usage" and expects OpenAI-shape prompt_tokens / completion_tokens / total_tokens (which DeepSeek returns natively). The extension's own status-bar widget stays — it surfaces the DeepSeek-specific signal the built-in widget can't (cache-hit %) and the cache-aware compaction advice.

The reason it matters: DeepSeek caches by prefix. Every request that shares its leading tokens with a recent request gets those tokens served from disk cache, billed at ~10% of the normal price and skipping the prefill step entirely. A long, stable chat accumulates a high cache-hit rate — the conversation gets cheaper and faster as it grows.

Compaction (summarising the chat into a shorter system message) rewrites the prefix, which invalidates the cache. The next 1–3 turns then have to rebuild it: every prompt token is a cache miss, first-token latency spikes by seconds, and the cost-per-turn jumps by an order of magnitude.

The rules of thumb the widget encodes:

Window used Cache hit Recommendation
< 60% high Keep going. Compacting now would force the next several turns into full prefill — slower and more expensive than just letting the cache work.
60–80% any Consider wrapping up the topic. Compact only if the conversation will continue for many more turns.
> 80% any Compact or start a new chat now. Truncation is imminent and the KV-cache penalty is worth it at this saturation.
any < 30% & growing prompt Something is invalidating the prefix — typically editing earlier messages, switching models mid-chat, or a randomised system context. Should self-recover within a few turns.

Adjust thresholds via deepseek-pilot.contextWarnThreshold and deepseek-pilot.contextCriticalThreshold.

Configuration

  • deepseek-pilot.reasoningEffort: default effort for (thinking) variants
  • deepseek-pilot.modelIdOverrides: remap API model IDs for DeepSeek-compatible proxy endpoints
  • deepseek-pilot.baseUrl: switch between DeepSeek and compatible gateways
  • deepseek-pilot.contextWarnThreshold / deepseek-pilot.contextCriticalThreshold: percent thresholds for the context-window indicator
  • deepseek-pilot.applyProDiscount: apply the 75% Pro promo to cost estimates (off by default; auto-expires 2026-05-31)
  • deepseek-pilot.debug: emit verbose diagnostics to the DeepSeek Pilot output channel

Prior art

This extension started life by surveying two earlier MIT-licensed DeepSeek-in-Copilot projects — Vizards/deepseek-v4-for-copilot (vision proxy concept) and Laurent00TT/deepseek-v4-vscode-chat (balance + spend tracking idea). The current codebase has since been substantially rewritten end-to-end: a separate context-window tracker, persistent reasoning cache, hardened request/sanitisation pipeline, vision-description caching, currency-aware billing, KV-cache-aware compaction guidance, and the model variant set are all original to this project. Thanks to both upstreams for the starting direction.

Support the project

Free and MIT-licensed. If it helps:

  • ★ Star the repo on GitHub
  • 💖 Sponsor on GitHub
  • 🐞 File issues — include the DeepSeek Pilot: Show Logs output if you hit something odd

License

MIT — see the LICENSE file.

  • Contact us
  • Jobs
  • Privacy
  • Manage cookies
  • Terms of use
  • Trademarks
© 2026 Microsoft