DeepSeek Pilot — Vision + Token Tracking

konstantyn-ganenkov

230 installs

DeepSeek V4 Pro & Flash in GitHub Copilot Chat. Vision proxy for image attachments, KV-cache-aware context-window indicator, persistent reasoning cache, live session cost and platform balance in the status bar, a thinking-effort selector per model variant, and one-click wiring as Copilot's utility/u

Installation

Launch VS Code Quick Open (Ctrl+P), paste the following command, and press enter.

Copied to clipboard

More Info

DeepSeek Pilot — DeepSeek V4 in GitHub Copilot Chat

Pick DeepSeek V4 Pro or Flash from the Copilot Chat model picker — and keep everything Copilot already gives you. Agent mode, tool calling, MCP servers, custom instructions, skills: all of it keeps working, now running on DeepSeek, with your own API key.

DeepSeek's V4 models are dramatically cheaper than the flagships from OpenAI and Anthropic while staying competitive on coding work — and their prefix-cache pricing means a long, stable chat gets cheaper and faster as it grows. Copilot Chat ships generic BYOK support; DeepSeek Pilot adds the DeepSeek-specific layer on top: live session cost and platform balance in the status bar, KV-cache-aware compaction guidance, a per-model thinking-effort control, vision attachments routed through a describer model, and one-click wiring as Copilot's hidden utility model.

Requires VS Code 1.120+ and the GitHub Copilot Chat extension.

Model rename heads-up: DeepSeek is retiring the legacy deepseek-chat and deepseek-reasoner model names on 2026-07-24; until then they quietly resolve to V4 Flash. DeepSeek Pilot already targets the current deepseek-v4-pro / deepseek-v4-flash models, so it keeps working through the switch — and lets you pick Pro, not just Flash.

Why DeepSeek Pilot

Don't replace Copilot — power it up. No new sidebar, no second chat UI to learn. Just a new model in the picker you already use, so agent mode, tool calling, MCP, .instructions.md, AGENTS.md, and skills all keep working — now on DeepSeek.
See what you're spending. Real per-session cost in your account currency (USD/CNY auto-detected), live DeepSeek platform balance, cache-hit rate, and token counts — right in the status bar, not buried in a log.
Spend the cache, not your money. DeepSeek bills a long, stable chat at the ~90%-cheaper cache-hit rate. The status bar tells you exactly when compacting would cost you more than it saves — guidance a generic BYOK widget can't give.
BYOK, pay DeepSeek directly. Your key, your bill, your rate limits — stored in the OS keychain, never in settings.json or your Git history.

What you get

Everything Copilot does, on DeepSeek — because this registers as a native Copilot model provider, agent mode, tool calling, MCP servers, custom instructions, and skills all work unchanged. Switch models mid-chat without losing history.
Four model variants in the picker — Pro and Flash, each with thinking / non-thinking modes, grouped under one DeepSeek V4 row. Thinking variants expose a per-model Thinking Effort (high / max) control.
Drop images into chat with a text-only model — a vision-capable describer model summarises each attachment so DeepSeek can reason over the content. Descriptions are cached by image hash, so the same screenshot never re-bills.
Live context-window indicator, DeepSeek-aware — status-bar item showing % of window used · cache-hit % with KV-cache-aware guidance ("keep going, your cache is healthy" vs. "compact now"). Configurable thresholds.
Real session cost & platform balance — token counts, running spend in your account currency (USD / CNY auto-detected from the DeepSeek API), and a one-click balance refresh. Pricing matches DeepSeek's permanent V4-Pro 75%-off + 1/10 cache-hit rates (announced 2026-05-22).
Persistent reasoning cache — reasoning traces from thinking variants are fingerprinted, persisted across VS Code restarts, and replayed during multi-turn agent loops so DeepSeek's KV cache stays warm.
Wire as Copilot's utility + utility-small models — two one-click commands route Copilot's background flows (chat titles, summaries, commit messages, intent detection — plus the lightweight fast-path slot Copilot Chat 1.121 exposes) through DeepSeek Flash, where the dollar cost is negligible.
Production-grade request pipeline — schema sanitisation, tool-call / tool-result pairing, mid-stream truncation detection, retry on transient failures, automatic thinking-off for Copilot's background utility requests (titles, commit messages) so reasoning tokens aren't spent on them, and debug-only cache-trace snapshots for diagnosing odd 400s without leaking message content.
Friendly setup — variants stay visible in the picker before an API key is configured (with a warning icon), and key validation probes the configured endpoint before saving, with a fall-through path for proxy tokens that can't be validated upstream.

Compared to the alternatives

	DeepSeek Pilot	Local proxy (e.g. LiteLLM)	Standalone DeepSeek extension
Works inside Copilot Chat	Yes	Yes	No — separate UI
Copilot agent mode, tools, MCP, skills	Yes	Yes	Reimplemented, partial
Vision / image attachments	Yes — proxied	No	No
Live cost, balance & cache-hit in the status bar	Yes	No	Varies
KV-cache-aware compaction guidance	Yes	No	No
Per-session spend in your account currency	Yes	No	No
No extra process to run	Yes	No	Yes
API key in the OS keychain	Yes	No	Varies
One-click install	Yes	No	Yes

Install

From the Marketplace (recommended once published):

ext install konstantyn-ganenkov.deepseek-pilot

Or open the Extensions view (Ctrl+Shift+X) and search for DeepSeek Pilot.

From a local build:

npm install
npm run package

Then in VS Code: Extensions → ... → Install from VSIX... → pick the newest dist/deepseek-pilot-<version>.vsix.

Or link directly so a npm run watch keeps the installed copy in sync:

# Symlink from your VS Code extensions folder (Windows; mklink requires an admin shell or developer mode)
mklink /D %USERPROFILE%\.vscode\extensions\konstantyn-ganenkov.deepseek-pilot-<version> C:\path\to\deepseek-pilot

Commands

Command	Palette
Manage Provider	`DeepSeek Pilot: Manage Provider`
Set API Key	`DeepSeek Pilot: Set API Key`
Clear API Key	`DeepSeek Pilot: Clear API Key`
Set Vision Proxy Model	`DeepSeek Pilot: Set Vision Proxy Model`
Refresh Balance	`DeepSeek Pilot: Refresh Balance`
Clear Session Counter	`DeepSeek Pilot: Clear Session Counter`
Show Context Window Details	`DeepSeek Pilot: Show Context Window Details`
Show Cache Stats	`DeepSeek Pilot: Show Reasoning Cache Stats`
Clear Reasoning Cache	`DeepSeek Pilot: Clear Reasoning Cache`
Use as Copilot Utility Model	`DeepSeek Pilot: Use as Copilot Utility Model`
Use as Copilot Utility Small Model	`DeepSeek Pilot: Use as Copilot Utility Small Model`
Show Logs	`DeepSeek Pilot: Show Logs`

Model Picker

All four variants remain visible in the Copilot Chat model picker.

Thinking variants expose a per-model Thinking Effort control with high and max.
If the provider is visible but not fully configured yet, use Manage Provider from the picker or command palette.

Status Bar

One combined right-aligned item:

$(sparkle) DeepSeek Pilot · 16% ctx · $0.15  $17.07

Leading icon signals context-window state: sparkle (healthy) → history (warn) → warning (critical), with the background colour shifting to yellow/red at the configured thresholds.
16% ctx is the most recent turn's prompt size as a fraction of the model's input window — the actionable "should I compact?" signal.
$0.15 is the running session cost.
$17.07 is the platform balance after a refresh.
Click opens the Manage Provider quick pick (set key, refresh balance, context details, cache stats, logs, settings).
Hover shows the full breakdown: model, cache hit %, last turn details, situation-specific compaction advice, session totals, balance, reasoning effort, and the KV-cache primer.

When to compact your chat (DeepSeek-specific)

VS Code's built-in chat-view context-window widget now reads real BYOK usage (as of VS Code 1.120 — fixing the long-standing microsoft/vscode#313458). This extension feeds it via LanguageModelDataPart.json(usage, "usage") alongside DeepSeek's usage chunk — the bundled Copilot Chat BYOK consumer matches on the literal MIME "usage" and expects OpenAI-shape prompt_tokens / completion_tokens / total_tokens (which DeepSeek returns natively). The extension's own status-bar widget stays — it surfaces the DeepSeek-specific signal the built-in widget can't (cache-hit %) and the cache-aware compaction advice.

The reason it matters: DeepSeek caches by prefix. Every request that shares its leading tokens with a recent request gets those tokens served from disk cache, billed at ~10% of the normal price and skipping the prefill step entirely. A long, stable chat accumulates a high cache-hit rate — the conversation gets cheaper and faster as it grows.

Compaction (summarising the chat into a shorter system message) rewrites the prefix, which invalidates the cache. The next 1–3 turns then have to rebuild it: every prompt token is a cache miss, first-token latency spikes by seconds, and the cost-per-turn jumps by an order of magnitude.

The rules of thumb the widget encodes:

Window used	Cache hit	Recommendation
< 60%	high	Keep going. Compacting now would force the next several turns into full prefill — slower and more expensive than just letting the cache work.
60–80%	any	Consider wrapping up the topic. Compact only if the conversation will continue for many more turns.
> 80%	any	Compact or start a new chat now. Truncation is imminent and the KV-cache penalty is worth it at this saturation.
any	< 30% & growing prompt	Something is invalidating the prefix — typically editing earlier messages, switching models mid-chat, or a randomised system context. Should self-recover within a few turns.

Adjust thresholds via deepseek-pilot.contextWarnThreshold and deepseek-pilot.contextCriticalThreshold.

Configuration

deepseek-pilot.reasoningEffort: default effort for (thinking) variants
deepseek-pilot.optimizeUtilityRequests: auto-disable thinking on Copilot's background utility requests (titles, commit messages) when a thinking variant is active; default on
deepseek-pilot.modelIdOverrides: remap API model IDs for DeepSeek-compatible proxy endpoints
deepseek-pilot.baseUrl: switch between DeepSeek and compatible gateways
deepseek-pilot.contextWarnThreshold / deepseek-pilot.contextCriticalThreshold: percent thresholds for the context-window indicator
deepseek-pilot.debug: emit verbose diagnostics to the DeepSeek Pilot output channel

Prior art

This extension started life by surveying two earlier MIT-licensed DeepSeek-in-Copilot projects — Vizards/deepseek-v4-for-copilot (vision proxy concept) and Laurent00TT/deepseek-v4-vscode-chat (balance + spend tracking idea). The current codebase has since been substantially rewritten end-to-end: a separate context-window tracker, persistent reasoning cache, hardened request/sanitisation pipeline, vision-description caching, currency-aware billing, KV-cache-aware compaction guidance, and the model variant set are all original to this project. Thanks to both upstreams for the starting direction.

Support the project

Free and MIT-licensed. If it helps:

Star the repo on GitHub
Sponsor on GitHub
File an issue — include the DeepSeek Pilot: Show Logs output if you hit something odd

License

MIT — see the LICENSE file.