Copilot Context Tracer v2
Inspect exactly what GitHub Copilot sends to LLMs — and learn how to write more precise, cost-effective prompts.
What it does
Every time Copilot Chat calls an LLM, it builds a context window from many sources. This extension intercepts those calls via OpenTelemetry and shows you:
- Token counts: input, cached input, output, and reasoning tokens per call
- Cache efficiency: what % of your input tokens came from the Anthropic/OpenAI prompt cache (discounted re-use)
- Context breakdown table: which part of the context is system prompt vs. your message vs. file content vs. prior turns vs. tool definitions
- Actual text content: read the exact strings sent to the model (requires captureContent: true)
- Tool executions: see execute_tool spans alongside LLM spans in the same timeline
- Model-level aggregation: total tokens per model with average latency
- Request options: temperature, max tokens, reasoning effort, response API shape
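The model-level aggregation in the list above amounts to a group-by over recorded calls. A minimal Python sketch, assuming each call is a dict with the hypothetical keys `model`, `input_tokens`, `output_tokens`, and `duration_ms` (illustrative names, not the extension's internal schema):

```python
from collections import defaultdict


def aggregate_by_model(calls):
    """Sum tokens and average latency per model.

    Each call is a dict with hypothetical keys: model, input_tokens,
    output_tokens, duration_ms. These names are assumptions for illustration.
    """
    totals = defaultdict(lambda: {"calls": 0, "tokens": 0, "duration_ms": 0.0})
    for call in calls:
        row = totals[call["model"]]
        row["calls"] += 1
        row["tokens"] += call["input_tokens"] + call["output_tokens"]
        row["duration_ms"] += call["duration_ms"]

    # Convert running sums into the per-model summary rows.
    return {
        model: {
            "calls": row["calls"],
            "total_tokens": row["tokens"],
            "avg_latency_ms": row["duration_ms"] / row["calls"],
        }
        for model, row in totals.items()
    }
```

Grouping by model name keeps the summary stable even when a single chat session switches models between calls.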
Why this matters
The main cost driver in Copilot is input tokens. Most of them are:
- System instructions you can't change
- Tool definitions: often 20 KB+ of JSON schemas, large and not shown in the chat UI
- File context: open files Copilot injects automatically
- Prior conversation turns: accumulate fast in long chats
Understanding this lets you:
- Keep conversations short and focused
- Avoid opening large files unnecessarily
- Know when cached tokens are doing the heavy lifting (much cheaper)
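To see why cached tokens matter, it can help to put rough numbers on a single call. The sketch below is illustrative only: `price_per_million` and `cached_discount` are placeholder parameters, not real provider rates, so substitute the figures from your provider's pricing page.

```python
def estimate_input_cost(fresh_tokens, cached_tokens, price_per_million, cached_discount):
    """Estimate the input cost of one call.

    price_per_million: price per 1M fresh input tokens (placeholder value).
    cached_discount: fraction of the fresh price charged for cached tokens,
        e.g. 0.1 means cached tokens cost 10% of fresh ones. This ratio is
        an assumption; the real one depends on the provider and model.
    """
    fresh_cost = fresh_tokens / 1_000_000 * price_per_million
    cached_cost = cached_tokens / 1_000_000 * price_per_million * cached_discount
    return fresh_cost + cached_cost
```

Comparing the result for the same total input with and without cache hits shows how much of a long conversation's cost the cache absorbs.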
Setup
1. Install and start
The extension auto-starts a local OTLP collector on port 4318 when VS Code opens.
2. Point Copilot at it
Add to settings.json:
"github.copilot.chat.otel.enabled": true,
"github.copilot.chat.otel.exporterType": "otlp-http",
"github.copilot.chat.otel.otlpEndpoint": "http://127.0.0.1:4318"
The extension writes these settings automatically on start, so add them manually only if they are missing.
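If no spans show up, a quick first check is whether anything is listening on the collector port. This is a minimal sketch using only the Python standard library; it confirms that a TCP listener exists on the port, not that the listener is this extension's collector.

```python
import socket


def collector_is_listening(host="127.0.0.1", port=4318, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Calling `collector_is_listening()` with the defaults checks the endpoint from the settings above; pass a different `port` if you changed copilotContextTracer.collectorPort.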
3. Enable content capture (optional but recommended)
To see the actual text inside each context section (not just token counts):
"github.copilot.chat.otel.captureContent": true
4. Open the dashboard
Click the status bar item (🔢 N tok · M calls) or run:
Copilot Context Tracer: Show Dashboard
UI Guide
Call list
Each row is one LLM call. Click to expand:
- Blue pill = fresh input tokens (billed normally)
- Purple pill = cached tokens + cache hit % (discounted re-use)
- Green pill = output tokens
- Amber pill = reasoning tokens (thinking models only)
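The cache hit % on the purple pill can be reproduced from the two input counts. A minimal sketch, assuming total input is the sum of the fresh (blue) and cached (purple) token counts; the function name is illustrative:

```python
def cache_hit_percent(fresh_input_tokens, cached_input_tokens):
    """Percentage of all input tokens served from cache.

    Assumes total input = fresh + cached, matching the blue and purple pills.
    """
    total = fresh_input_tokens + cached_input_tokens
    if total == 0:
        return 0.0  # No input at all: report 0% rather than dividing by zero.
    return 100.0 * cached_input_tokens / total
```

For example, a call with 2,000 fresh and 8,000 cached input tokens has a cache hit rate of 80%.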
Context Breakdown table
Inside an expanded call, the table shows how input tokens are divided:
| Column | Meaning |
| --- | --- |
| Context Type | Which part of the context window |
| Segs | How many message segments |
| Chars | Character count |
| Est. Tokens | Rough token estimate (chars ÷ 4) |
| Share | % of total context |
| Inspect | Expand to read the actual text |
Click Inspect ↓ on any row to read the actual content sent.
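The columns in the breakdown table can be derived from the raw message segments. The sketch below assumes segments arrive as `(context_type, text)` pairs, that Share is computed from character counts, and that the chars ÷ 4 estimate rounds down; all three are assumptions for illustration, and a real tokenizer will give different counts.

```python
def context_breakdown(segments):
    """Build breakdown rows from (context_type, text) pairs.

    Uses the chars / 4 heuristic; this is a rough estimate, not a tokenizer.
    """
    rows = {}
    for context_type, text in segments:
        row = rows.setdefault(context_type, {"segs": 0, "chars": 0})
        row["segs"] += 1
        row["chars"] += len(text)

    total_chars = sum(row["chars"] for row in rows.values())
    table = []
    for context_type, row in rows.items():
        share = 100.0 * row["chars"] / total_chars if total_chars else 0.0
        table.append({
            "context_type": context_type,
            "segs": row["segs"],
            "chars": row["chars"],
            "est_tokens": row["chars"] // 4,
            "share_pct": round(share, 1),
        })
    # Largest contributors first.
    table.sort(key=lambda r: r["chars"], reverse=True)
    return table
```

Sorting by size puts the biggest contributor, often tool definitions or file content, at the top of the table.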
Orange-bordered rows are execute_tool spans (non-LLM operations like todo lists, file reads). They show the tool name, arguments, and result.
Extension Settings
| Setting | Default | Description |
| --- | --- | --- |
| copilotContextTracer.collectorPort | 4318 | Port for the local OTLP collector |
| copilotContextTracer.autoStart | true | Auto-start collector on VS Code open |
| copilotContextTracer.maxStoredSpans | 100 | Max spans to keep in session |
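A cap such as copilotContextTracer.maxStoredSpans is commonly implemented as a bounded buffer. The sketch below assumes the oldest spans are dropped first once the limit is reached; the README does not state the eviction order, and the `SpanStore` class is hypothetical.

```python
from collections import deque


class SpanStore:
    """Keep only the most recent max_spans spans (oldest are dropped first)."""

    def __init__(self, max_spans=100):
        # A deque with maxlen discards items from the left when it is full.
        self._spans = deque(maxlen=max_spans)

    def add(self, span):
        self._spans.append(span)

    def all(self):
        return list(self._spans)
```

With a limit of 100, the 101st span pushes out the first one, so long sessions keep a sliding window of recent calls rather than growing without bound.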
Changelog
v2.0.0
- Fixed: Expanded rows no longer auto-collapse every 5-6 seconds. The dashboard now only fully re-renders when new spans arrive; otherwise it uses a push-update channel.
- New: Context breakdown shown as a proper table (not a list) with sortable columns
- New: Each context table row expands in-place (no layout shift)
- New: Tool execution spans (execute_tool) shown with their own card style
- New: Cache hit % shown inline on the cached token pill
- New: Temperature, top-p, request options, and request shape in metadata
- New: Model table now shows avg call duration
- New: Better parsing of gen_ai.system_instructions and parts[] message format
- New: Export CSV now includes cache ratio and tool name columns
- Fixed: User request preview no longer truncates large JSON payloads incorrectly