Copilot Token Awareness

Stop being surprised by your AI credit bill.
This extension gives you a live, transparent estimate of how many tokens GitHub Copilot will consume - and what that costs in AI credits - before you send a single message. Use it to understand where your context budget goes, choose the right model, and get more out of every credit.

Screenshot

The Breakdown panel shows Ask/Agent mode, every context source contributing tokens, a step-by-step cost calculation, and a link to official GitHub pricing.

Why This Exists

GitHub Copilot Chat and Agent mode are billed in AI credits (1 credit = $0.01 USD, per GitHub's billing documentation). Each request sends a large bundle of context to the model - your active file, snippets from open tabs, a system prompt, custom instructions, and workspace-index retrievals - all before you type a single word of your question.

Most developers have no visibility into this until they see a usage report. This extension surfaces those numbers in real time so you can:

Understand what is driving your token count
Choose the most cost-effective model for the task
Optimise by closing unused tabs or splitting large files
Budget AI credit usage across a team or project

Important: Code completions and next-edit suggestions are not billed in AI credits and are excluded from all estimates in this extension. Only Ask mode and Agent mode chat interactions are in scope.
Source: GitHub Copilot billing - Code completions

Features

Feature	Details
Live status bar	`🔢 [Ask] ~23,129 tokens | ~$0.0694` - updates as you type, edit files, or switch tabs
Ask / Agent mode	Toggle between modes in the status bar or Breakdown panel; each mode uses different estimation assumptions
Breakdown panel	Click the status bar to see a per-source token table, step-by-step cost calculation, and all assumptions used
Transparent assumptions	Every value used in the estimate is shown with an explanation - nothing is hidden
23 built-in models	All current Copilot models with exact pricing from the official docs
Included-model detection	GPT-4.1 and GPT-5 mini are flagged as included models that don't consume credits within your plan allowance
Custom instructions detection	`.github/copilot-instructions.md` and `*.instructions.md` files are auto-detected and counted
Optimisation tips	Rule-based hints: too many tabs open, active file too large, approaching context window limit
User-patchable pricing	Override any model's multipliers or add new models without waiting for an extension update
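
The rule-based optimisation tips listed above could be sketched roughly as follows. This is an illustration only: the thresholds, the `TipInput` shape, and the function name are hypothetical and are not taken from the extension's source.

```typescript
// Hypothetical sketch of rule-based hints. All thresholds are illustrative
// assumptions, not the extension's real values.
interface TipInput {
  openTabCount: number;
  activeFileTokens: number;
  totalTokens: number;
  contextWindow: number;
}

const MAX_COMFORTABLE_TABS = 5; // assumed threshold
const LARGE_FILE_TOKENS = 8000; // assumed threshold
const CONTEXT_WARNING_RATIO = 0.8; // assumed: warn at 80% of the window

function optimisationTips(input: TipInput): string[] {
  const tips: string[] = [];
  if (input.openTabCount > MAX_COMFORTABLE_TABS) {
    tips.push("Many tabs are open; close unused tabs to reduce snippet tokens.");
  }
  if (input.activeFileTokens > LARGE_FILE_TOKENS) {
    tips.push("The active file is large; consider splitting it.");
  }
  if (input.totalTokens > input.contextWindow * CONTEXT_WARNING_RATIO) {
    tips.push("The estimate is approaching the model's context window limit.");
  }
  return tips;
}
```

Each rule is independent, so several tips can be returned for the same editor state.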

How the Estimate Is Built

Copilot does not expose what it sends to the model. This extension reconstructs a realistic estimate using Copilot's published architecture and empirical measurement. Every assumption is shown in the Estimation Assumptions section of the Breakdown panel.

Context sources (in order)

Source	How it is estimated	Notes
System prompt	Fixed budget per mode (Ask: ~5,000 tokens, Agent: ~14,000 tokens)	Ask mode contains behaviour rules only. Agent mode adds full tool definitions, skill descriptions, and security policies. Override with `askSystemPromptTokenBudget` or `agentSystemPromptTokenBudget`.
Custom instructions	Actual file content tokenised	Copilot auto-injects `.github/copilot-instructions.md`, `.copilot-instructions.md`, and any `*.instructions.md` file into every request.
Active file	Full file content tokenised	Copilot always includes the entire active file in Ask and Agent requests.
Selected text	Full selection content tokenised	Included when you have a selection active in the editor.
Open tabs	20% of each tab's tokens (Ask) / 15% (Agent)	Copilot uses a Jaccard-similarity algorithm to extract best-match snippets (~60 lines each), not the full file. The snippet ratio approximates this. Override with `tabSnippetRatio`.
Workspace index retrieval	Fixed overhead: 1,500 tokens (Ask) / 3,000 tokens (Agent)	Copilot's semantic index may pull relevant files that are not currently open as tabs. Override with `retrievalOverheadTokens`.
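
To make the open-tabs row more concrete, here is a minimal sketch of Jaccard-based snippet selection over fixed-size line windows. It is not Copilot's actual implementation: the word-level tokenisation, the function names, and the sliding-window strategy are all assumptions made for illustration.

```typescript
// Illustrative only: scores line windows of an open tab against the active
// file using Jaccard similarity. Not Copilot's real algorithm.

/** Split text into a set of lowercase identifier-like words (assumed tokenisation). */
function wordSet(text: string): Set<string> {
  return new Set<string>(text.toLowerCase().match(/[a-z_][a-z0-9_]*/g) ?? []);
}

/** Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0 when both sets are empty. */
function jaccard(a: Set<string>, b: Set<string>): number {
  let intersection = 0;
  a.forEach((item) => {
    if (b.has(item)) intersection++;
  });
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

/** Return the window of `windowLines` lines in `tabText` most similar to `activeText`. */
function bestSnippet(
  activeText: string,
  tabText: string,
  windowLines = 60
): { startLine: number; score: number; text: string } {
  const reference = wordSet(activeText);
  const lines = tabText.split("\n");
  const windowCount = Math.max(1, lines.length - windowLines + 1);
  let best = { startLine: 0, score: -1, text: "" };
  for (let start = 0; start < windowCount; start++) {
    const text = lines.slice(start, start + windowLines).join("\n");
    const score = jaccard(reference, wordSet(text));
    if (score > best.score) best = { startLine: start, score, text };
  }
  return best;
}
```

Because only the best-matching window is kept, a tab contributes far fewer tokens than its full length, which is the behaviour the snippet ratio approximates.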

What is NOT included (by design)

Item	Why excluded
Your prompt text	Typically 10–200 tokens; add it mentally for a tighter estimate
Conversation history	Only Turn 1 can be estimated before the chat starts; each subsequent turn adds ~500–2,000 tokens
Code completions	Not billed in AI credits - separate unlimited quota
Cached tokens	Repeat turns benefit from Copilot's prompt cache (10× cheaper); estimates use non-cached Turn 1 rates, as the panel's disclaimer notes

Ask mode vs Agent mode

	Ask mode	Agent mode
System prompt	~5,000 tokens	~14,000 tokens
Tab snippet ratio	20% of file tokens	15% of file tokens
Retrieval overhead	~1,500 tokens	~3,000 tokens
Estimate variance	±10%	±15%
Use case	Single-turn Q&A, explain, review	Multi-step tasks, file edits, terminal

Switch modes using the Ask / Agent toggle buttons at the top of the Breakdown panel, or via the copilotTokenAwareness.chatMode setting.
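
The per-mode figures in the table above combine into a Turn 1 total roughly as sketched below. The numeric defaults come from the table; the type names, the `ContextInput` shape, and the rounding of each tab's contribution are assumptions for illustration.

```typescript
type ChatMode = "ask" | "agent";

interface ModeDefaults {
  systemPromptTokens: number;
  tabSnippetRatio: number;
  retrievalOverheadTokens: number;
}

// Values copied from the Ask mode vs Agent mode table.
const MODE_DEFAULTS: Record<ChatMode, ModeDefaults> = {
  ask: { systemPromptTokens: 5000, tabSnippetRatio: 0.2, retrievalOverheadTokens: 1500 },
  agent: { systemPromptTokens: 14000, tabSnippetRatio: 0.15, retrievalOverheadTokens: 3000 },
};

// Hypothetical input shape: token counts already measured per source.
interface ContextInput {
  customInstructionTokens: number;
  activeFileTokens: number;
  selectionTokens: number;
  openTabTokens: number[]; // full token count of each open tab
}

function estimateTotalTokens(mode: ChatMode, ctx: ContextInput): number {
  const d = MODE_DEFAULTS[mode];
  // Each tab contributes only a fraction of its tokens (rounding is an assumption).
  const tabTokens = ctx.openTabTokens
    .map((t) => Math.round(t * d.tabSnippetRatio))
    .reduce((sum, t) => sum + t, 0);
  return (
    d.systemPromptTokens +
    ctx.customInstructionTokens +
    ctx.activeFileTokens +
    ctx.selectionTokens +
    tabTokens +
    d.retrievalOverheadTokens
  );
}
```

For the same editor state, switching from Ask to Agent mode raises the total mainly through the larger system prompt and retrieval overhead, while the tab contribution shrinks slightly.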

Cost Calculation

Input cost = total_tokens × input_multiplier × token_unit_price_usd

Where:

total_tokens = sum of all context sources above
input_multiplier = the model's input price from the GitHub Copilot pricing table, in $/M tokens ÷ 10 (e.g. $3.00/M → 0.3)
token_unit_price_usd = 0.00001 (i.e. 1,000 token units = $0.01 = 1 AI credit)
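
As a worked example, the formula above reproduces the status bar figure from the Features table (~23,129 tokens at the $3.00/M input price listed for Claude Sonnet 4.6). The function names are illustrative and not part of the extension's API.

```typescript
const TOKEN_UNIT_PRICE_USD = 0.00001; // 1,000 token units = $0.01 = 1 AI credit
const USD_PER_CREDIT = 0.01;

/** Convert a published $/M-token price into the multiplier used by the formula. */
function multiplierFromPricePerMillion(usdPerMillion: number): number {
  return usdPerMillion / 10;
}

/** Input cost = total_tokens × input_multiplier × token_unit_price_usd. */
function inputCostUsd(
  totalTokens: number,
  inputMultiplier: number,
  tokenUnitPriceUsd: number = TOKEN_UNIT_PRICE_USD
): number {
  return totalTokens * inputMultiplier * tokenUnitPriceUsd;
}

function usdToCredits(usd: number): number {
  return usd / USD_PER_CREDIT;
}

const cost = inputCostUsd(23129, multiplierFromPricePerMillion(3.0));
console.log(cost.toFixed(4)); // prints "0.0694"
console.log(usdToCredits(cost).toFixed(2)); // prints "6.94"
```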

The Breakdown panel also shows a worst-case output ceiling (expandable) that assumes Copilot fills the entire remaining context window - this almost never happens but shows the absolute maximum exposure.
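
A rough sketch of that ceiling, assuming output cost follows the same shape as the input formula (remaining tokens × output multiplier × token unit price) and that a "160K" context window means 160,000 tokens. Both are assumptions; the extension's exact calculation may differ.

```typescript
/**
 * Worst-case output cost if the model filled the whole remaining context window.
 * Assumes outputMultiplier = output $/M ÷ 10, by analogy with the input multiplier.
 */
function worstCaseOutputCostUsd(
  inputTokens: number,
  contextWindow: number,
  outputMultiplier: number,
  tokenUnitPriceUsd: number = 0.00001
): number {
  const remainingTokens = Math.max(0, contextWindow - inputTokens);
  return remainingTokens * outputMultiplier * tokenUnitPriceUsd;
}

// 23,129 input tokens, 160,000-token window, $15.00/M output price (multiplier 1.5)
const ceiling = worstCaseOutputCostUsd(23129, 160000, 1.5);
console.log(ceiling.toFixed(2)); // prints "2.05"
```

The gap between this ceiling and the input cost illustrates why the panel treats it as a maximum exposure figure and not as an expected cost.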

Supported Models

All pricing is sourced directly from the official GitHub Copilot models and pricing page.

Anthropic

Model	Input $/M	Output $/M	Context
Claude Haiku 4.5	$1.00	$5.00	160K
Claude Sonnet 4	$3.00	$15.00	160K
Claude Sonnet 4.5	$3.00	$15.00	160K
Claude Sonnet 4.6	$3.00	$15.00	160K
Claude Opus 4.5	$5.00	$25.00	234K
Claude Opus 4.6	$5.00	$25.00	234K
Claude Opus 4.7	$5.00	$25.00	234K
Claude Opus 4.8	$5.00	$25.00	232K

Anthropic models include a cache-write cost in addition to cached-input pricing. The extension uses the non-cached rate (accurate for Turn 1).

Google

Model	Input $/M	Output $/M	Context
Gemini 2.5 Pro	$1.25	$10.00	173K
Gemini 3 Flash (Preview)	$0.50	$3.00	173K
Gemini 3.1 Pro (Preview)	$2.00	$12.00	173K
Gemini 3.5 Flash	$1.50	$9.00	192K

OpenAI

Model	Input $/M	Output $/M	Context	Notes
GPT-4.1	$2.00	$8.00	128K	⭐ Included model
GPT-5 mini	$0.25	$2.00	192K	⭐ Included model
GPT-5.2	$1.75	$14.00	192K
GPT-5.2-Codex	$1.75	$14.00	400K
GPT-5.3-Codex	$1.75	$14.00	400K
GPT-5.4	$2.50	$15.00	400K
GPT-5.4 mini	$0.75	$4.50	400K
GPT-5.4 nano	$0.20	$1.25	400K
GPT-5.5	$5.00	$30.00	400K

⭐ Included models (GPT-4.1, GPT-5 mini) do not consume AI credits within your plan's monthly allowance. The extension flags these with a green banner and notes that the cost shown is the overage rate only.

Fine-tuned (GitHub) & Microsoft

Model	Input $/M	Output $/M	Context	Notes
Raptor mini (Preview)	$0.25	$2.00	264K	Uses GPT-5 mini pricing
MAI-Code-1-Flash	$0.75	$4.50	128K	Microsoft

Settings Reference

Setting	Default	Description
`copilotTokenAwareness.chatMode`	`ask`	Chat mode to estimate for: `ask` or `agent`. Controls system-prompt budget, snippet ratio, and retrieval overhead.
`copilotTokenAwareness.model`	`claude-sonnet-4-6`	Model used for cost calculation. Selectable from all 23 built-in models.
`copilotTokenAwareness.askSystemPromptTokenBudget`	`5000`	System-prompt token estimate for Ask mode only. The default, `5000`, is the built-in Ask estimate (~5,000 tokens).
`copilotTokenAwareness.agentSystemPromptTokenBudget`	`14000`	System-prompt token estimate for Agent mode only. The default, `14000`, is the built-in Agent estimate (~14,000 tokens).
`copilotTokenAwareness.systemPromptTokenBudget`	`0`	(Deprecated) Legacy single-value override for both modes. Ignored when a mode-specific setting is set.
`copilotTokenAwareness.tabSnippetRatio`	`0`	Fraction of each open tab's tokens included as snippets (0.0–1.0). `0` = use mode default (Ask: 0.20, Agent: 0.15).
`copilotTokenAwareness.retrievalOverheadTokens`	`0`	Fixed token budget for workspace-index retrieval. `0` = use mode default (Ask: 1,500, Agent: 3,000).
`copilotTokenAwareness.includeOpenTabs`	`true`	Include open editor tabs in the token estimate.
`copilotTokenAwareness.maxTabsToInclude`	`5`	Maximum number of open tabs to include.
`copilotTokenAwareness.tokenUnitPriceUsd`	`0.00001`	Price per token unit in USD. Update if GitHub changes pricing.
`copilotTokenAwareness.modelOverrides`	`{}`	Override multipliers or context window of any built-in model.
`copilotTokenAwareness.customModels`	`[]`	Add models not yet built into the extension.
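
The "`0` = use mode default" convention for `tabSnippetRatio` and `retrievalOverheadTokens` can be resolved along these lines. This is a sketch under assumptions: the `EstimateSettings` shape and the clamping of the ratio to 1.0 are illustrative, not the extension's confirmed behaviour.

```typescript
type ChatMode = "ask" | "agent";

// Hypothetical subset of the settings table above.
interface EstimateSettings {
  chatMode: ChatMode;
  tabSnippetRatio: number; // 0 = use mode default
  retrievalOverheadTokens: number; // 0 = use mode default
}

// Mode defaults as documented in the Settings Reference.
const MODE_DEFAULTS: Record<ChatMode, { tabSnippetRatio: number; retrievalOverheadTokens: number }> = {
  ask: { tabSnippetRatio: 0.2, retrievalOverheadTokens: 1500 },
  agent: { tabSnippetRatio: 0.15, retrievalOverheadTokens: 3000 },
};

function resolveEffective(settings: EstimateSettings): {
  tabSnippetRatio: number;
  retrievalOverheadTokens: number;
} {
  const d = MODE_DEFAULTS[settings.chatMode];
  return {
    // A positive value overrides the mode default; the clamp to 1.0 is an assumption.
    tabSnippetRatio:
      settings.tabSnippetRatio > 0 ? Math.min(settings.tabSnippetRatio, 1) : d.tabSnippetRatio,
    retrievalOverheadTokens:
      settings.retrievalOverheadTokens > 0
        ? settings.retrievalOverheadTokens
        : d.retrievalOverheadTokens,
  };
}
```

In a real extension the raw values would typically be read with `vscode.workspace.getConfiguration("copilotTokenAwareness")` before being passed to a resolver like this one.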

Keeping pricing current

GitHub may update multipliers or add new models at any time. You don't need to wait for an extension update.

Override a built-in model's multiplier:

// settings.json
"copilotTokenAwareness.modelOverrides": {
  "claude-sonnet-4-6": { "inputMultiplier": 0.35 }
}

Add a brand-new model:

"copilotTokenAwareness.customModels": [
  {
    "id": "my-new-model",
    "displayName": "My New Model",
    "inputMultiplier": 0.2,
    "outputMultiplier": 1.0,
    "contextWindow": 200000
  }
]
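
One plausible way the two settings above combine with the built-in table is sketched here. The `ModelPricing` fields mirror the `customModels` example; the merge order, and the choice that a custom model never shadows a built-in id, are assumptions and not documented behaviour.

```typescript
interface ModelPricing {
  id: string;
  displayName: string;
  inputMultiplier: number;
  outputMultiplier: number;
  contextWindow: number;
}

type ModelOverrides = Record<string, Partial<Omit<ModelPricing, "id">>>;

function resolveModels(
  builtIn: ModelPricing[],
  overrides: ModelOverrides,
  customModels: ModelPricing[]
): Map<string, ModelPricing> {
  const models = new Map<string, ModelPricing>();
  // Built-in models first, with any user override fields layered on top.
  builtIn.forEach((m) => models.set(m.id, { ...m, ...(overrides[m.id] ?? {}) }));
  // Custom models are added only when the id is not already taken (assumed policy).
  customModels.forEach((m) => {
    if (!models.has(m.id)) models.set(m.id, m);
  });
  return models;
}
```

Overrides are partial, so a user who changes only `inputMultiplier` keeps the built-in output multiplier and context window.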

Update the base token unit price:

"copilotTokenAwareness.tokenUnitPriceUsd": 0.000012

Commands

Command	Description
`Copilot Token Awareness: Show Breakdown`	Open the Breakdown panel
`Copilot Token Awareness: Reset Session Totals`	Clear the session-level cumulative counter

Token Counting

Token counting uses tiktoken (cl100k_base encoding) running entirely in WebAssembly inside VS Code, so no data is sent to any external service. If the WASM module fails to load, the extension falls back to a character-based heuristic (~4 chars/token) and the status bar shows a warning.

Tokenizer note: cl100k_base is an OpenAI tokenizer. Anthropic (Claude) and Google (Gemini) models use their own tokenizers, so counts for those models are approximate. For typical English and source code the difference is within about ±5%, which the variance percentages shown in the UI (±10% Ask, ±15% Agent) already account for.
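
The fallback behaviour described above can be sketched as follows. The tokenizer is passed in as a plain function so the example stays self-contained; the `Encoder` type, the function names, and the round-up in the heuristic are assumptions, and the real extension's wiring to tiktoken may look different.

```typescript
/** Any function that returns a token count for a string (e.g. a tiktoken wrapper). */
type Encoder = (text: string) => number;

/** Character-based heuristic: roughly 4 characters per token (round-up is assumed). */
function heuristicTokenCount(text: string): number {
  return Math.ceil(text.length / 4);
}

/**
 * Count tokens with the encoder when available; otherwise, or if it throws,
 * fall back to the heuristic and report that the fallback was used.
 */
function countTokens(
  text: string,
  encoder?: Encoder
): { tokens: number; usedFallback: boolean } {
  if (encoder) {
    try {
      return { tokens: encoder(text), usedFallback: false };
    } catch {
      // Encoder failed (for example, the WASM module did not load).
    }
  }
  return { tokens: heuristicTokenCount(text), usedFallback: true };
}
```

The `usedFallback` flag is what a caller would use to decide whether to show the status bar warning.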

Accuracy & Disclaimer

This extension is an awareness tool, not a billing meter. Estimates are based on:

Copilot's published architecture and empirically observed behaviour
Turn 1 only (conversation history is not pre-knowable)
Non-cached token rates (cached turns are cheaper; the disclaimer in the panel notes this)
The assumption that Copilot uses the entire active file and snippets from tabs (actual selection may vary by feature and version)

Expected accuracy: ±10% for Ask mode, ±15% for Agent mode.

For official billing information, plan allowances, and current per-token rates, always refer to:
📄 GitHub Copilot - Models and Pricing

Raj Uppadhyay
