Copilot Context Tracer

Inspect exactly what GitHub Copilot sends to LLMs — and learn how to write more precise, cost-effective prompts.

What it does

Every time Copilot Chat calls an LLM, it builds a context window from many sources. This extension intercepts those calls via OpenTelemetry and shows you:

Per-query grouping — every time you press Send, all resulting LLM calls and tool executions are grouped under that query with aggregated token totals
Token counts — input, cached input, output, and reasoning tokens per call and per query
Cache efficiency — what percentage of your input tokens was served from the Anthropic/OpenAI prompt cache (cheaper re-use)
GitHub Copilot Credits — real credit usage from copilot_chat.copilot_usage_nano_aiu (only shown when real data is available from the API; never estimated)
Prompt source breakdown — which part of the context is system prompt vs. your message vs. file content vs. prior turns vs. tool definitions
Prompt block classifier — each block is semantically labelled (User Input, Current File, Workspace Context, Agent Instructions, Tool Result, etc.)
Actual text content — read the exact strings sent to the model (requires "github.copilot.chat.otel.captureContent": true)
Tool executions — see execute_tool spans alongside LLM spans in the same timeline, with expandable arguments and results
Model-level aggregation — total tokens and credits per model with avg latency
Request options — temperature, max tokens, reasoning effort, response API shape
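
Credits are reported by the API in nano-AIU, and the changelog gives the conversion as 1 AIU = 1 × 10⁹ nano-AIU. A minimal sketch of that arithmetic, assuming the raw copilot_chat.copilot_usage_nano_aiu value arrives as an integer per span; the function names are hypothetical and not part of the extension:

```python
NANO_AIU_PER_AIU = 1_000_000_000  # 1 AIU = 1 x 10^9 nano-AIU (from the changelog)


def nano_aiu_to_aiu(nano_aiu: int) -> float:
    """Convert a raw nano-AIU value into AIU."""
    return nano_aiu / NANO_AIU_PER_AIU


def total_aiu(span_values) -> float:
    """Sum per-span nano-AIU values as integers, then convert once.

    Summing before dividing avoids accumulating float rounding error.
    """
    return nano_aiu_to_aiu(sum(span_values))
```

For example, two spans reporting 500,000,000 and 1,500,000,000 nano-AIU add up to 2 AIU for the query.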

Why this matters

The main cost driver in Copilot is input tokens. Most of them come from:

System instructions you can't change
Tool definitions — often 20 KB+ of JSON schemas (large, and not visible in the chat UI)
File context — open files Copilot injects automatically
Prior conversation turns — accumulate fast in long chats

Understanding this lets you:

Keep conversations short and focused
Avoid opening large files unnecessarily
Know when cached tokens are doing the heavy lifting (much cheaper)
Track real credit cost per query

Setup

1. Install and start

The extension auto-starts a local OTLP collector on port 4318 when VS Code opens.
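
If the dashboard stays empty, it can help to confirm that something is actually listening on the collector port. The following is a rough sketch using only the Python standard library; it checks that a TCP connection succeeds and does not verify that the listener is this extension's collector. The function name is hypothetical.

```python
import socket


def collector_is_listening(host: str = "127.0.0.1", port: int = 4318,
                           timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing accepts connections on that port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling collector_is_listening() with the defaults probes 127.0.0.1:4318, the default collector port listed under Extension Settings.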

2. Point Copilot at it

Add to settings.json:

"github.copilot.chat.otel.enabled": true,
"github.copilot.chat.otel.exporterType": "otlp-http",
"github.copilot.chat.otel.otlpEndpoint": "http://127.0.0.1:4318"

The extension writes these settings automatically on start; add them by hand only if they are missing.

3. Enable content capture (optional but recommended)

To see the actual text inside each prompt source section (not just character counts):

"github.copilot.chat.otel.captureContent": true

4. Open the dashboard

Click the status bar item (🔢 N tok · M calls) or run:

Copilot Context Tracer: Show Dashboard

UI Guide

Query groups

Each time you press Send in Copilot Chat, a new query group appears at the top of the dashboard. Each group shows:

Aggregated input / cached / output / reasoning token counts across all LLM calls in that query
Real Copilot credit cost (when available from the API)
Your user request preview
All sub-calls (LLM calls + tool executions) collapsed underneath
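
The aggregation behind a query group can be pictured as a sum over every LLM call that shares the same query. This is an illustrative sketch only: the LlmCall record, its field names, and the query_id key are assumptions made for the example, not the extension's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LlmCall:
    query_id: str          # hypothetical key linking a call to one "Send"
    input_tokens: int
    cached_tokens: int
    output_tokens: int
    reasoning_tokens: int = 0


def aggregate_by_query(calls):
    """Sum token counts over all calls that belong to the same query."""
    totals = defaultdict(lambda: {"input": 0, "cached": 0, "output": 0,
                                  "reasoning": 0, "calls": 0})
    for call in calls:
        group = totals[call.query_id]
        group["input"] += call.input_tokens
        group["cached"] += call.cached_tokens
        group["output"] += call.output_tokens
        group["reasoning"] += call.reasoning_tokens
        group["calls"] += 1
    return dict(totals)
```

An agentic query that triggers three LLM calls therefore shows one set of totals at the top, with the three individual calls listed underneath.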

Call cards

Each LLM call inside a query can be expanded. Token pills:

Blue pill = fresh input tokens (billed normally)
Purple pill = cached tokens + cache hit % (cheaper re-use)
Green pill = output tokens
Amber pill = reasoning tokens (thinking models only)
Credit chip = real GitHub Copilot credits (only shown when reported by the API)
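
The cache hit percentage on the purple pill can be reasoned about as cached tokens divided by all input tokens. The sketch below assumes that the fresh (blue) and cached (purple) counts are disjoint, so the total input is their sum; if the API reports cached tokens as a subset of the input count, the denominator would be the input count alone. The function name is hypothetical.

```python
def cache_hit_percent(fresh_input_tokens: int, cached_tokens: int) -> float:
    """Percentage of input tokens served from cache.

    Assumes fresh and cached counts do not overlap.
    """
    total = fresh_input_tokens + cached_tokens
    if total == 0:
        return 0.0  # no input at all: avoid dividing by zero
    return 100.0 * cached_tokens / total
```

With 2,000 fresh and 8,000 cached tokens, 80% of the input came from cache.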

Prompt Source breakdown table

Inside an expanded LLM call, the table shows how the context window is divided:

Column	Meaning
Prompt Source	Which semantic category (System Instructions, Prompt Sources, Tool Results, etc.)
Blocks	Number of message segments in this category
Chars	Character count
Tokens	As reported by the API (or "Not reported by API")
Share	% of total context characters
Inspect ↗	Open modal to read the actual text
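
The Share column is a percentage of total context characters, not of tokens. A small sketch of that calculation, with hypothetical category names and counts, and rounding to one decimal place chosen only for readability:

```python
from typing import Dict


def share_by_source(char_counts: Dict[str, int]) -> Dict[str, float]:
    """Return each prompt source's share of total characters, in percent."""
    total = sum(char_counts.values())
    if total == 0:
        return {name: 0.0 for name in char_counts}
    return {name: round(100.0 * count / total, 1)
            for name, count in char_counts.items()}
```

For a call with 6,000 characters of system instructions, 3,000 of tool definitions, and 1,000 of user input, the shares are 60%, 30%, and 10%.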

Inspect modal

Click Inspect ↗ on any row to open a full-screen modal showing the exact text sent in each block. The modal supports:

Search — filter segments by keyword with highlighted matches
Expand / Collapse all — show or hide all content at once
Maximize / Restore — toggle full-screen view
Escape — close the modal
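
Keyword search with highlighting can be approximated as follows. This is a sketch of the general technique rather than the extension's implementation: it assumes case-insensitive matching and uses hypothetical [[ ]] markers in place of visual highlighting.

```python
import re


def search_segments(segments, keyword, mark=("[[", "]]")):
    """Keep only segments containing keyword, wrapping each match in markers."""
    if not keyword:
        return list(segments)  # empty search shows everything
    # re.escape treats the keyword as literal text, not as a regex
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    results = []
    for text in segments:
        if pattern.search(text):
            highlighted = pattern.sub(
                lambda m: f"{mark[0]}{m.group(0)}{mark[1]}", text)
            results.append(highlighted)
    return results
```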

Each block is automatically labelled by the prompt classifier (e.g. User Input, Current File, Workspace Context, Agent Instructions, Prior Copilot Response).

Tool spans

Tool execution rows are shown with an orange border inside each query. They display:

Tool name and type
Expandable Arguments and Result sections
Execution duration

Commands

Command	Description
`Copilot Context Tracer: Show Dashboard`	Open the dashboard panel
`Copilot Context Tracer: Start Collector`	Start the local OTLP collector
`Copilot Context Tracer: Stop Collector`	Stop the local OTLP collector
`Copilot Context Tracer: Reset Session`	Clear all captured spans for this session
`Copilot Context Tracer: Open Settings`	Open extension settings directly

Extension Settings

Setting	Default	Description
`copilotContextTracer.collectorPort`	`4318`	Port for the local OTLP collector
`copilotContextTracer.autoStart`	`true`	Auto-start collector on VS Code open
`copilotContextTracer.maxStoredSpans`	`100`	Max spans to keep in session
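
A cap such as maxStoredSpans is commonly implemented as a bounded buffer. The sketch below assumes that the oldest spans are dropped first once the limit is reached; the README does not state the eviction order, and the SpanStore class is hypothetical.

```python
from collections import deque


class SpanStore:
    """Keeps at most max_stored_spans items, discarding the oldest first."""

    def __init__(self, max_stored_spans: int = 100):
        # A deque with maxlen drops items from the left when it is full
        self._spans = deque(maxlen=max_stored_spans)

    def add(self, span) -> None:
        self._spans.append(span)

    def reset(self) -> None:
        """Clear all captured spans, as Reset Session does for the session."""
        self._spans.clear()

    def spans(self) -> list:
        return list(self._spans)
```

Under this assumption, long agentic sessions can push early calls out of the dashboard, so raising the limit may be worthwhile when inspecting a long conversation.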

Changelog

v2.4.8

New: Query-level grouping — each Copilot "Send" creates a collapsible query card with aggregated token and credit totals
New: Active query tracking — listens to Copilot submit/stop commands to stamp spans with the correct query boundary
New: Real GitHub Copilot credits from copilot_chat.copilot_usage_nano_aiu (1 AIU = 1 × 10⁹ nano-AIU); shown per span, per query, and in the status bar; never estimated from tokens
New: Prompt block classifier — each context segment is automatically labelled (User Input, Current File, Workspace Context, Agent Instructions, Tool Result, Prior Copilot Response, etc.)
New: Open Settings command for quick access to extension configuration
New: Inspect modal — full-screen viewer for prompt content with keyword search (highlighted matches), expand/collapse all, and maximize/restore
New: Inline reset confirmation bar (replaces blocked confirm() dialog in webviews)
New: Live data push — dashboard increments without full re-render; only re-renders when new spans arrive
Changed: Context breakdown table column "Context Type" → "Prompt Source", "Segs" → "Blocks", "Est. Tokens" → "Tokens" (reports API value or "Not reported by API")
Changed: Tool span cards now have expandable Arguments and Result sections

v2.0.0

Fixed: Expanded rows no longer auto-collapse every 5-6 seconds. The dashboard now only fully re-renders when new spans arrive; otherwise it uses a push-update channel.
New: Context breakdown shown as a proper table (not a list) with sortable columns
New: Each context table row expands in-place (no layout shift)
New: Tool execution spans (execute_tool) shown with their own card style
New: Cache hit % shown inline on the cached token pill
New: Temperature, top-p, request options, and request shape in metadata
New: Model table now shows avg call duration
New: Better parsing of gen_ai.system_instructions and parts[] message format
New: Export CSV now includes cache ratio and tool name columns
Fixed: User request preview no longer truncates large JSON payloads incorrectly

Anant Agarwal