Copilot Context Tracer

Anant Agarwal

Inspect the exact context Copilot sends to LLMs — token breakdown, system prompts, file context — via OpenTelemetry.

Copilot Context Tracer v2

Inspect exactly what GitHub Copilot sends to LLMs — and learn how to write more precise, cost-effective prompts.

What it does

Every time Copilot Chat calls an LLM, it builds a context window from many sources. This extension intercepts those calls via OpenTelemetry and shows you:

  • Token counts — input, cached input, output, and reasoning tokens per call
  • Cache efficiency — what % of your input tokens were served from the Anthropic/OpenAI prompt cache (cheaper re-use)
  • Context breakdown table — which part of the context is system prompt vs. your message vs. file content vs. prior turns vs. tool definitions
  • Actual text content — read the exact strings sent to the model (requires captureContent: true)
  • Tool executions — see execute_tool spans alongside LLM spans in the same timeline
  • Model-level aggregation — total tokens per model with avg latency
  • Request options — temperature, max tokens, reasoning effort, response API shape
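The cache-efficiency and model-level figures above can be sketched as follows. This is a minimal illustration, not the extension's code: the `LlmCall` record, its field names, and the model names are hypothetical, and it assumes the input token count includes the cached tokens.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LlmCall:
    # Hypothetical record shape; the extension's internal span model is not shown in this README.
    model: str
    input_tokens: int    # assumed to include cached tokens
    cached_tokens: int
    output_tokens: int
    duration_ms: float


def cache_hit_percent(call: LlmCall) -> float:
    """Share of input tokens served from the provider cache, as a percentage."""
    if call.input_tokens == 0:
        return 0.0
    return 100.0 * call.cached_tokens / call.input_tokens


def aggregate_by_model(calls):
    """Total tokens (input + output) and average latency per model."""
    buckets = defaultdict(list)
    for call in calls:
        buckets[call.model].append(call)
    summary = {}
    for model, group in buckets.items():
        summary[model] = {
            "calls": len(group),
            "total_tokens": sum(c.input_tokens + c.output_tokens for c in group),
            "avg_latency_ms": sum(c.duration_ms for c in group) / len(group),
        }
    return summary


# Made-up sample data for illustration only.
calls = [
    LlmCall("model-a", 12000, 9000, 400, 1800.0),
    LlmCall("model-a", 8000, 2000, 600, 2200.0),
    LlmCall("model-b", 5000, 0, 300, 900.0),
]
summary = aggregate_by_model(calls)
```

If the provider reports cached tokens separately from input tokens instead, the denominator in `cache_hit_percent` would need to be the sum of both.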

Why this matters

Input tokens are the main cost driver in Copilot. Most of them come from:

  1. System instructions you can't change
  2. Tool definitions — often 20KB+ of JSON schemas (big and hidden)
  3. File context — open files Copilot injects automatically
  4. Prior conversation turns — accumulate fast in long chats

Understanding this lets you:

  • Keep conversations short and focused
  • Avoid opening large files unnecessarily
  • Know when cached tokens are doing the heavy lifting (much cheaper)

Setup

1. Install and start

The extension auto-starts a local OTLP collector on port 4318 when VS Code opens.
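If the dashboard stays empty, one quick check is whether anything is listening on the collector port. The sketch below only tests that a TCP connection can be opened; the function name is hypothetical, and it does not verify that the listener is actually this extension's collector.

```python
import socket


def collector_is_listening(host="127.0.0.1", port=4318, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Calling `collector_is_listening()` with VS Code open should return True when the collector started on the default port. Pass `port=` explicitly if you changed copilotContextTracer.collectorPort.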

2. Point Copilot at it

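The extension writes these settings automatically on start. To set them manually, add to settings.json:

"github.copilot.chat.otel.enabled": true,
"github.copilot.chat.otel.exporterType": "otlp-http",
"github.copilot.chat.otel.otlpEndpoint": "http://127.0.0.1:4318"

If you change copilotContextTracer.collectorPort, the port in otlpEndpoint must match it.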

3. Enable content capture (optional but recommended)

To see the actual text inside each context section (not just token counts):

"github.copilot.chat.otel.captureContent": true

4. Open the dashboard

Click the status bar item (🔢 N tok · M calls) or run:

Copilot Context Tracer: Show Dashboard

UI Guide

Call list

Each row is one LLM call. Click to expand:

  • Blue pill = fresh input tokens (billed normally)
  • Purple pill = cached tokens + cache hit % (cheaper re-use)
  • Green pill = output tokens
  • Amber pill = reasoning tokens (thinking models only)

Context Breakdown table

Inside an expanded call, the table shows how input tokens are divided:

| Column | Meaning |
| --- | --- |
| Context Type | Which part of the context window |
| Segs | How many message segments |
| Chars | Character count |
| Est. Tokens | Rough token estimate (chars ÷ 4) |
| Share | % of total context |
| Inspect | Expand to read the actual text |

Click Inspect ↓ on any row to read the actual content sent.
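The table's arithmetic can be approximated with the sketch below. It groups message segments by context type, counts characters, and applies the chars ÷ 4 estimate. The function name, the row keys, rounding down, and computing Share from character counts are all assumptions made for illustration; the extension may round or weight differently.

```python
def context_breakdown(segments):
    """Build breakdown rows from (context_type, text) pairs, largest first."""
    totals = {}
    for ctx_type, text in segments:
        row = totals.setdefault(ctx_type, {"segs": 0, "chars": 0})
        row["segs"] += 1
        row["chars"] += len(text)

    total_chars = sum(r["chars"] for r in totals.values())
    rows = []
    for ctx_type, r in totals.items():
        share = 100.0 * r["chars"] / total_chars if total_chars else 0.0
        rows.append({
            "context_type": ctx_type,
            "segs": r["segs"],
            "chars": r["chars"],
            "est_tokens": r["chars"] // 4,  # rough estimate: chars / 4, rounded down
            "share_pct": round(share, 1),
        })
    rows.sort(key=lambda r: r["chars"], reverse=True)
    return rows


# Made-up segments for illustration only.
segments = [
    ("system prompt", "x" * 4000),
    ("tool definitions", "y" * 20000),
    ("user message", "z" * 400),
    ("tool definitions", "y" * 4000),
]
rows = context_breakdown(segments)
```

With this sample input, tool definitions dominate the context, which matches the point that tool schemas are often the largest hidden contributor.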

Tool spans

Orange-bordered rows are execute_tool spans (non-LLM operations like todo lists, file reads). They show the tool name, arguments, and result.
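One way to tell the two kinds of rows apart is sketched below. It assumes each span carries a `gen_ai.operation.name` attribute with the value `execute_tool` for tool spans, following the OpenTelemetry GenAI semantic conventions. Whether Copilot emits exactly these attributes is an assumption, and the span dict shape (`attributes`, `start_ns`) and the sample tool name are hypothetical.

```python
def split_spans(spans):
    """Separate tool execution spans from all other (assumed LLM) spans."""
    llm_spans, tool_spans = [], []
    for span in spans:
        attrs = span.get("attributes", {})
        if attrs.get("gen_ai.operation.name") == "execute_tool":
            tool_spans.append(span)
        else:
            # Anything that is not a tool execution is treated as an LLM call here.
            llm_spans.append(span)
    return llm_spans, tool_spans


def timeline(spans):
    """Order spans by start time so tool and LLM rows interleave."""
    return sorted(spans, key=lambda s: s["start_ns"])


# Made-up spans for illustration only.
spans = [
    {"start_ns": 300, "attributes": {"gen_ai.operation.name": "chat"}},
    {"start_ns": 200, "attributes": {"gen_ai.operation.name": "execute_tool",
                                     "gen_ai.tool.name": "read_file"}},
    {"start_ns": 100, "attributes": {"gen_ai.operation.name": "chat"}},
]
llm_spans, tool_spans = split_spans(spans)
ordered = timeline(spans)
```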

Extension Settings

| Setting | Default | Description |
| --- | --- | --- |
| copilotContextTracer.collectorPort | 4318 | Port for the local OTLP collector |
| copilotContextTracer.autoStart | true | Auto-start collector on VS Code open |
| copilotContextTracer.maxStoredSpans | 100 | Max spans to keep in session |
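To picture what a span cap like maxStoredSpans implies, here is a minimal bounded store. The `SpanStore` class is hypothetical, and dropping the oldest span first is an assumption; the README does not say which spans are discarded once the limit is reached.

```python
from collections import deque


class SpanStore:
    """Keeps at most max_stored_spans items, discarding the oldest first."""

    def __init__(self, max_stored_spans=100):
        # deque with maxlen drops items from the left when full.
        self._spans = deque(maxlen=max_stored_spans)

    def add(self, span):
        self._spans.append(span)

    def __len__(self):
        return len(self._spans)

    def all(self):
        """Return stored spans, oldest first."""
        return list(self._spans)


store = SpanStore(max_stored_spans=100)
for i in range(150):
    store.add({"id": i})
```

Under this eviction policy, a long session only shows the most recent calls, so raising the limit trades memory for a longer history.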

Changelog

v2.0.0

  • Fixed: Expanded rows no longer auto-collapse every 5-6 seconds. The dashboard now only fully re-renders when new spans arrive; otherwise it uses a push-update channel.
  • New: Context breakdown shown as a proper table (not a list) with sortable columns
  • New: Each context table row expands in-place (no layout shift)
  • New: Tool execution spans (execute_tool) shown with their own card style
  • New: Cache hit % shown inline on the cached token pill
  • New: Temperature, top-p, request options, and request shape in metadata
  • New: Model table now shows avg call duration
  • New: Better parsing of gen_ai.system_instructions and parts[] message format
  • New: Export CSV now includes cache ratio and tool name columns
  • Fixed: User request preview no longer truncates large JSON payloads incorrectly