Copilot Token Awareness
Stop being surprised by your AI credit bill.
This extension gives you a live, transparent estimate of how many tokens GitHub Copilot will consume, and what that costs in AI credits, before you send a single message. Use it to understand where your context budget goes, choose the right model, and get more out of every credit.
Screenshot

The Breakdown panel shows Ask/Agent mode, every context source contributing tokens, a step-by-step cost calculation, and a link to official GitHub pricing.
Why This Exists
GitHub Copilot Chat and Agent mode are billed in AI credits (1 credit = $0.01 USD; see GitHub's Copilot billing documentation). Each request sends a large bundle of context to the model - your active file, snippets from open tabs, a system prompt, custom instructions, and workspace-index retrievals - all before you type a single word of your question.
Most developers have no visibility into this until they see a usage report. This extension surfaces those numbers in real time so you can:
- Understand what is driving your token count
- Choose the most cost-effective model for the task
- Optimise by closing unused tabs or splitting large files
- Budget AI credit usage across a team or project
Important: Code completions and next-edit suggestions are not billed in AI credits and are excluded from all estimates in this extension. Only Ask mode and Agent mode chat interactions are in scope.
Source: GitHub Copilot billing - Code completions
Features
| Feature | Details |
| --- | --- |
| Live status bar | 🔢 [Ask] ~23,129 tokens \| ~$0.0694 - updates as you type, edit files, or switch tabs |
| Ask / Agent mode | Toggle between modes in the status bar or Breakdown panel; each mode uses different estimation assumptions |
| Breakdown panel | Click the status bar to see a per-source token table, step-by-step cost calculation, and all assumptions used |
| Transparent assumptions | Every value used in the estimate is shown with an explanation - nothing is hidden |
| 23 built-in models | All current Copilot models with exact pricing from the official docs |
| Included-model detection | GPT-4.1 and GPT-5 mini are flagged as included models that don't consume credits within your plan allowance |
| Custom instructions detection | .github/copilot-instructions.md and *.instructions.md files are auto-detected and counted |
| Optimisation tips | Rule-based hints: too many tabs open, active file too large, approaching context window limit |
| User-patchable pricing | Override any model's multipliers or add new models without waiting for an extension update |
How the Estimate Is Built
Copilot does not expose what it sends to the model. This extension reconstructs a realistic estimate using Copilot's published architecture and empirical measurement. Every assumption is shown in the Estimation Assumptions section of the Breakdown panel.
Context sources (in order)
| Source | How it is estimated | Notes |
| --- | --- | --- |
| System prompt | Fixed budget per mode (Ask: ~5,000 tokens, Agent: ~14,000 tokens) | Ask mode contains behaviour rules only. Agent mode adds full tool definitions, skill descriptions, and security policies. Override with askSystemPromptTokenBudget or agentSystemPromptTokenBudget. |
| Custom instructions | Actual file content tokenised | Copilot auto-injects .github/copilot-instructions.md, .copilot-instructions.md, and any *.instructions.md file into every request. |
| Active file | Full file content tokenised | Copilot always includes the entire active file in Ask and Agent requests. |
| Selected text | Full selection content tokenised | Included when you have a selection active in the editor. |
| Open tabs | 20% of each tab's tokens (Ask) / 15% (Agent) | Copilot uses a Jaccard-similarity algorithm to extract best-match snippets (~60 lines each), not the full file. The snippet ratio approximates this. Override with tabSnippetRatio. |
| Workspace index retrieval | Fixed overhead: 1,500 tokens (Ask) / 3,000 tokens (Agent) | Copilot's semantic index may pull relevant files that are not currently open as tabs. Override with retrievalOverheadTokens. |
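To make the open-tabs row more concrete, here is a minimal sketch of Jaccard-based snippet selection. It is an illustration only, not Copilot's actual implementation: the word-level tokenisation, the function names (wordSet, jaccard, bestSnippet), and the sliding-window strategy are all assumptions made for this example. Only the similarity measure and the ~60-line window come from the table above.

```typescript
// Illustrative sketch only. This is NOT Copilot's real algorithm; all names
// and the word-level tokenisation are assumptions made for this example.

// Reduce text to a set of lowercase identifier-like words.
function wordSet(text: string): Set<string> {
  return new Set<string>(text.toLowerCase().match(/[a-z_][a-z0-9_]*/g) ?? []);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| (defined here as 0 for two empty sets).
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let intersection = 0;
  a.forEach((w) => {
    if (b.has(w)) intersection++;
  });
  return intersection / (a.size + b.size - intersection);
}

// Slide a fixed-size window over a tab's lines and keep the window that is
// most similar to the reference text (for example, the active file).
function bestSnippet(
  reference: string,
  tabText: string,
  windowLines = 60
): { startLine: number; score: number } {
  const ref = wordSet(reference);
  const lines = tabText.split("\n");
  const lastStart = Math.max(0, lines.length - windowLines);
  let best = { startLine: 0, score: 0 };
  for (let start = 0; start <= lastStart; start++) {
    const windowText = lines.slice(start, start + windowLines).join("\n");
    const score = jaccard(ref, wordSet(windowText));
    if (score > best.score) best = { startLine: start, score };
  }
  return best;
}
```

Because only the best-matching window is sent rather than the whole tab, a flat ratio such as 20% is a rough stand-in: small files may contribute almost everything, while very large files contribute far less than the ratio suggests.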
What is NOT included (by design)
| Item | Why excluded |
| --- | --- |
| Your prompt text | Typically 10–200 tokens; add it mentally for a tighter estimate |
| Conversation history | Only Turn 1 can be estimated before the chat starts; each subsequent turn adds ~500–2,000 tokens |
| Code completions | Not billed in AI credits - separate unlimited quota |
| Cached tokens | Repeat turns benefit from Copilot's prompt cache (10× cheaper); the Turn-1 disclaimer covers this |
Ask mode vs Agent mode
| | Ask mode | Agent mode |
| --- | --- | --- |
| System prompt | ~5,000 tokens | ~14,000 tokens |
| Tab snippet ratio | 20% of file tokens | 15% of file tokens |
| Retrieval overhead | ~1,500 tokens | ~3,000 tokens |
| Estimate variance | ±10% | ±15% |
| Use case | Single-turn Q&A, explain, review | Multi-step tasks, file edits, terminal |
Switch modes using the Ask / Agent toggle buttons at the top of the Breakdown panel, or via the copilotTokenAwareness.chatMode setting.
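The per-mode defaults can be combined into a single Turn-1 token total roughly as follows. This is a minimal sketch under stated assumptions, not the extension's source: the type and function names (ModeDefaults, ContextInput, estimateTotalTokens) are hypothetical, and rounding each tab's contribution to a whole token is a choice made for this example. The numeric defaults are the ones listed in the table above.

```typescript
type ChatMode = "ask" | "agent";

interface ModeDefaults {
  systemPromptTokens: number;
  tabSnippetRatio: number;
  retrievalOverheadTokens: number;
}

// Defaults copied from the Ask mode vs Agent mode table.
const MODE_DEFAULTS: Record<ChatMode, ModeDefaults> = {
  ask: { systemPromptTokens: 5000, tabSnippetRatio: 0.2, retrievalOverheadTokens: 1500 },
  agent: { systemPromptTokens: 14000, tabSnippetRatio: 0.15, retrievalOverheadTokens: 3000 },
};

// Hypothetical input shape: token counts already measured per source.
interface ContextInput {
  customInstructionTokens: number;
  activeFileTokens: number;
  selectionTokens: number;
  openTabTokens: number[]; // full token count of each open tab
}

// Sum every context source; open tabs contribute only their snippet share.
function estimateTotalTokens(mode: ChatMode, ctx: ContextInput): number {
  const d = MODE_DEFAULTS[mode];
  const tabTokens = ctx.openTabTokens.reduce(
    (sum, t) => sum + Math.round(t * d.tabSnippetRatio), // rounding is an assumption
    0
  );
  return (
    d.systemPromptTokens +
    ctx.customInstructionTokens +
    ctx.activeFileTokens +
    ctx.selectionTokens +
    tabTokens +
    d.retrievalOverheadTokens
  );
}

const example: ContextInput = {
  customInstructionTokens: 300,
  activeFileTokens: 4000,
  selectionTokens: 0,
  openTabTokens: [2000, 1000],
};
console.log(estimateTotalTokens("ask", example));   // 11400
console.log(estimateTotalTokens("agent", example)); // 21750
```

With identical files open, Agent mode lands at nearly double the Ask estimate in this example, almost entirely because of the larger system prompt and retrieval overhead.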
Cost Calculation
Input cost = total_tokens × input_multiplier × token_unit_price_usd
Where:
total_tokens = sum of all context sources above
input_multiplier = model-specific value from the GitHub Copilot pricing table ($/M tokens ÷ 10)
token_unit_price_usd = 0.00001 (i.e. 1,000 token units = $0.01 = 1 AI credit)
The Breakdown panel also shows a worst-case output ceiling (expandable) that assumes Copilot fills the entire remaining context window - this almost never happens but shows the absolute maximum exposure.
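As a worked example, the formula reproduces the status-bar figure shown in the Features table (~23,129 tokens on a $3.00/M input model). The sketch below is illustrative: the function names are hypothetical, the worst-case formula is an assumption inferred from the description above (remaining context window × output multiplier × unit price), and the 160K context window is assumed to mean 160,000 tokens.

```typescript
const TOKEN_UNIT_PRICE_USD = 0.00001; // default token unit price from the formula above

// Convert a list price in $ per million tokens into a multiplier ($/M ÷ 10).
function multiplierFromPricePerMillion(usdPerMillion: number): number {
  return usdPerMillion / 10;
}

// Input cost = total_tokens × input_multiplier × token_unit_price_usd
function inputCostUsd(
  totalTokens: number,
  inputMultiplier: number,
  unitPriceUsd: number = TOKEN_UNIT_PRICE_USD
): number {
  return totalTokens * inputMultiplier * unitPriceUsd;
}

// Assumed worst case: the model fills the whole remaining context window with output.
function worstCaseOutputCostUsd(
  totalTokens: number,
  contextWindow: number,
  outputMultiplier: number,
  unitPriceUsd: number = TOKEN_UNIT_PRICE_USD
): number {
  const remaining = Math.max(0, contextWindow - totalTokens);
  return remaining * outputMultiplier * unitPriceUsd;
}

// Claude Sonnet 4.6 from the tables below: $3.00/M input, $15.00/M output, 160K context.
const inputCost = inputCostUsd(23129, multiplierFromPricePerMillion(3.0));
const outputCeiling = worstCaseOutputCostUsd(23129, 160000, multiplierFromPricePerMillion(15.0));

console.log(inputCost.toFixed(4));     // "0.0694"
console.log(outputCeiling.toFixed(2)); // "2.05"
```

At 1 credit = $0.01, the input side of this request costs about 6.9 credits, while the theoretical output ceiling is roughly 30 times larger, which is why the panel treats it as an upper bound rather than an expectation.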
Supported Models
All pricing is sourced directly from the official GitHub Copilot models and pricing page.
Anthropic
| Model | Input $/M | Output $/M | Context |
| --- | --- | --- | --- |
| Claude Haiku 4.5 | $1.00 | $5.00 | 160K |
| Claude Sonnet 4 | $3.00 | $15.00 | 160K |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 160K |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 160K |
| Claude Opus 4.5 | $5.00 | $25.00 | 234K |
| Claude Opus 4.6 | $5.00 | $25.00 | 234K |
| Claude Opus 4.7 | $5.00 | $25.00 | 234K |
| Claude Opus 4.8 | $5.00 | $25.00 | 232K |
Anthropic models include a cache-write cost in addition to cached-input pricing. The extension uses the non-cached rate (accurate for Turn 1).
Google
| Model | Input $/M | Output $/M | Context |
| --- | --- | --- | --- |
| Gemini 2.5 Pro | $1.25 | $10.00 | 173K |
| Gemini 3 Flash (Preview) | $0.50 | $3.00 | 173K |
| Gemini 3.1 Pro (Preview) | $2.00 | $12.00 | 173K |
| Gemini 3.5 Flash | $1.50 | $9.00 | 192K |
OpenAI
| Model | Input $/M | Output $/M | Context | Notes |
| --- | --- | --- | --- | --- |
| GPT-4.1 | $2.00 | $8.00 | 128K | ⭐ Included model |
| GPT-5 mini | $0.25 | $2.00 | 192K | ⭐ Included model |
| GPT-5.2 | $1.75 | $14.00 | 192K | |
| GPT-5.2-Codex | $1.75 | $14.00 | 400K | |
| GPT-5.3-Codex | $1.75 | $14.00 | 400K | |
| GPT-5.4 | $2.50 | $15.00 | 400K | |
| GPT-5.4 mini | $0.75 | $4.50 | 400K | |
| GPT-5.4 nano | $0.20 | $1.25 | 400K | |
| GPT-5.5 | $5.00 | $30.00 | 400K | |
⭐ Included models (GPT-4.1, GPT-5 mini) do not consume AI credits within your plan's monthly allowance. The extension flags these with a green banner and notes that the cost shown is the overage rate only.
Fine-tuned (GitHub) & Microsoft
| Model | Input $/M | Output $/M | Context | Notes |
| --- | --- | --- | --- | --- |
| Raptor mini (Preview) | $0.25 | $2.00 | 264K | Uses GPT-5 mini pricing |
| MAI-Code-1-Flash | $0.75 | $4.50 | 128K | Microsoft |
Settings Reference
| Setting | Default | Description |
| --- | --- | --- |
| copilotTokenAwareness.chatMode | ask | Chat mode to estimate for: ask or agent. Controls system-prompt budget, snippet ratio, and retrieval overhead. |
| copilotTokenAwareness.model | claude-sonnet-4-6 | Model used for cost calculation. Selectable from all 23 built-in models. |
| copilotTokenAwareness.askSystemPromptTokenBudget | 5000 | Override system-prompt token estimate for Ask mode only. 5000 = use mode default (~5,000 tokens). |
| copilotTokenAwareness.agentSystemPromptTokenBudget | 14000 | Override system-prompt token estimate for Agent mode only. 14000 = use mode default (~14,000 tokens). |
| copilotTokenAwareness.systemPromptTokenBudget | 0 | (Deprecated) Legacy single-value override for both modes. Ignored when a mode-specific setting is set. |
| copilotTokenAwareness.tabSnippetRatio | 0 | Fraction of each open tab's tokens included as snippets (0.0–1.0). 0 = use mode default (Ask: 0.20, Agent: 0.15). |
| copilotTokenAwareness.retrievalOverheadTokens | 0 | Fixed token budget for workspace-index retrieval. 0 = use mode default (Ask: 1,500, Agent: 3,000). |
| copilotTokenAwareness.includeOpenTabs | true | Include open editor tabs in the token estimate. |
| copilotTokenAwareness.maxTabsToInclude | 5 | Maximum number of open tabs to include. |
| copilotTokenAwareness.tokenUnitPriceUsd | 0.00001 | Price per token unit in USD. Update if GitHub changes pricing. |
| copilotTokenAwareness.modelOverrides | {} | Override multipliers or context window of any built-in model. |
| copilotTokenAwareness.customModels | [] | Add models not yet built into the extension. |
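Several settings use 0 as a sentinel meaning "use the mode default". The sketch below shows one way that convention could be resolved. It is an assumption-laden illustration rather than the extension's code: the EstimatorSettings shape and the helper names are hypothetical, clamping the ratio to 1.0 follows the documented 0.0–1.0 range, and taking the first N tabs is a guess at how maxTabsToInclude is applied.

```typescript
type ChatMode = "ask" | "agent";

// Hypothetical, already-loaded subset of the settings listed above.
interface EstimatorSettings {
  chatMode: ChatMode;
  tabSnippetRatio: number;         // 0 = use mode default
  retrievalOverheadTokens: number; // 0 = use mode default
  includeOpenTabs: boolean;
  maxTabsToInclude: number;
}

// 0 (or a negative value) falls back to the mode default; values above 1 are clamped.
function resolveTabSnippetRatio(s: EstimatorSettings): number {
  const modeDefault = s.chatMode === "ask" ? 0.2 : 0.15;
  if (s.tabSnippetRatio <= 0) return modeDefault;
  return Math.min(s.tabSnippetRatio, 1);
}

// 0 (or a negative value) falls back to the mode default.
function resolveRetrievalOverhead(s: EstimatorSettings): number {
  const modeDefault = s.chatMode === "ask" ? 1500 : 3000;
  return s.retrievalOverheadTokens > 0 ? s.retrievalOverheadTokens : modeDefault;
}

// Assumption: when tabs are enabled, the first maxTabsToInclude tabs are counted.
function tabsToCount<T>(s: EstimatorSettings, openTabs: T[]): T[] {
  return s.includeOpenTabs ? openTabs.slice(0, Math.max(0, s.maxTabsToInclude)) : [];
}
```

One consequence of the sentinel design is that a literal ratio of 0 cannot be expressed through tabSnippetRatio; to exclude tabs entirely, includeOpenTabs is the setting to change.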
Keeping pricing current
GitHub may update multipliers or add new models at any time. You don't need to wait for an extension update.
Override a built-in model's multiplier:
```jsonc
// settings.json
"copilotTokenAwareness.modelOverrides": {
  "claude-sonnet-4-6": { "inputMultiplier": 0.35 }
}
```
Add a brand-new model:
```jsonc
"copilotTokenAwareness.customModels": [
  {
    "id": "my-new-model",
    "displayName": "My New Model",
    "inputMultiplier": 0.2,
    "outputMultiplier": 1.0,
    "contextWindow": 200000
  }
]
```
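Conceptually, these two settings are layered on top of the built-in catalogue. The following sketch shows one plausible merge order; it is not taken from the extension's source. The ModelPricing shape mirrors the fields used in the JSON examples above, while buildCatalog and the rule that a built-in id wins over a custom model with the same id are assumptions made for illustration.

```typescript
interface ModelPricing {
  id: string;
  displayName: string;
  inputMultiplier: number;
  outputMultiplier: number;
  contextWindow: number;
}

// An override may replace any field except the id.
type ModelOverride = Partial<Omit<ModelPricing, "id">>;

// Hypothetical merge: apply overrides to built-ins, then append unknown custom models.
function buildCatalog(
  builtIn: ModelPricing[],
  overrides: Record<string, ModelOverride>,
  custom: ModelPricing[]
): Record<string, ModelPricing> {
  const catalog: Record<string, ModelPricing> = {};
  for (const model of builtIn) {
    catalog[model.id] = { ...model, ...(overrides[model.id] ?? {}) };
  }
  for (const model of custom) {
    // Assumption: a custom model never shadows a built-in id.
    if (!(model.id in catalog)) catalog[model.id] = model;
  }
  return catalog;
}

const catalog = buildCatalog(
  [{ id: "claude-sonnet-4-6", displayName: "Claude Sonnet 4.6", inputMultiplier: 0.3, outputMultiplier: 1.5, contextWindow: 160000 }],
  { "claude-sonnet-4-6": { inputMultiplier: 0.35 } },
  [{ id: "my-new-model", displayName: "My New Model", inputMultiplier: 0.2, outputMultiplier: 1.0, contextWindow: 200000 }]
);
console.log(catalog["claude-sonnet-4-6"].inputMultiplier); // 0.35
console.log(Object.keys(catalog).length);                  // 2
```

Fields that an override leaves out, such as outputMultiplier here, keep their built-in values.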
Update the base token unit price:
```jsonc
"copilotTokenAwareness.tokenUnitPriceUsd": 0.000012
```
Commands
| Command | Description |
| --- | --- |
| Copilot Token Awareness: Show Breakdown | Open the Breakdown panel |
| Copilot Token Awareness: Reset Session Totals | Clear the session-level cumulative counter |
Token Counting
Uses tiktoken (cl100k_base encoding) running entirely in WebAssembly inside VS Code - no data is sent to any external service. Falls back to a character-based heuristic (~4 chars/token) if the WASM module fails to load; the status bar shows a warning in that case.
Tokenizer note: cl100k_base is OpenAI's tokenizer. Anthropic (Claude) and Google (Gemini) use their own tokenizers. For typical English and source code the counts are within ±5% of each other, which falls inside the estimate variance shown in the UI (±10% Ask, ±15% Agent).
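The fallback behaviour described above can be sketched as follows. To avoid guessing at any real tokenizer library's API, the tokenizer is passed in as a plain function; the Tokenizer type, countTokens, and the usedFallback flag are hypothetical names for this example, and rounding up in the heuristic is an assumption. Only the ~4 characters per token ratio comes from the text.

```typescript
// Any function that returns a token count for a string (e.g. a WASM-backed encoder).
type Tokenizer = (text: string) => number;

// Character-based heuristic: roughly 4 characters per token, rounded up.
function heuristicTokenCount(text: string): number {
  return Math.ceil(text.length / 4);
}

// Use the real tokenizer when it is available and working; otherwise fall back
// to the heuristic and report that so the UI can show a warning.
function countTokens(
  text: string,
  tokenizer?: Tokenizer
): { tokens: number; usedFallback: boolean } {
  if (tokenizer) {
    try {
      return { tokens: tokenizer(text), usedFallback: false };
    } catch {
      // Tokenizer failed (for example, the module did not load); fall through.
    }
  }
  return { tokens: heuristicTokenCount(text), usedFallback: true };
}

console.log(countTokens("abcdefgh")); // { tokens: 2, usedFallback: true }
```

Returning the flag alongside the count keeps the warning logic out of the counting code: the caller decides how to surface the reduced accuracy.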
Accuracy & Disclaimer
This extension is an awareness tool, not a billing meter. Estimates are based on:
- Copilot's published architecture and empirically observed behaviour
- Turn 1 only (conversation history is not pre-knowable)
- Non-cached token rates (cached turns are cheaper; the disclaimer in the panel notes this)
- The assumption that Copilot uses the entire active file and snippets from tabs (actual selection may vary by feature and version)
Expected accuracy: ±10% for Ask mode, ±15% for Agent mode.
For official billing information, plan allowances, and current per-token rates, always refer to:
📄 GitHub Copilot - Models and Pricing
License
MIT © Raj Uppadhyay