# Token Proctor
An open-source layer on top of GitHub Copilot (and any token-priced LLM plan) that answers the three questions every team eventually asks: which model should handle this prompt, will the prompt succeed as written, and what will it cost?
It ships as two surfaces from one core: a VS Code chat participant (`@proctor`) and an MCP server.
100% local. No network calls. We don't proxy prompts anywhere. Full design doc: docs/ANALYSIS.md.

**What's new in v0.4:** turns-aware cost projection, plan-aware allowance %, the `optimizeFor` knob, and Copilot agent hand-off.
## Prerequisites
## Quick start
### Try the chat participant
Sample output:
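The following is purely illustrative: the numbers, model choice, and line layout are made up to show the shape of a run, using the `tokens=`, `plan=`, and completeness fields described later in this README:

```
@proctor refactor src/core/modelRouter.ts to support per-task weights

task=code_large   routed: o4-mini (0.33x premium)
turns ~6   tokens=exact   est. cost ~$0.004   plan=squad 3.1%
verdict: completeness 72/100 (pass, threshold 60)
```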
Click **🚀 Accept & hand off to Copilot** to have Copilot's default agent take over with file/terminal tools. Token Proctor tries to flip the chat model dropdown automatically (a handful of best-effort command ids); if your Copilot build doesn't expose any of them, you'll see a toast asking you to pick the model manually.

### Slash commands
### Try the MCP server
Register it with your MCP client.
For Copilot CLI / Claude Desktop:
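A sketch of the usual `mcpServers` registration shape; the server key and the compiled entry-point path are assumptions, so adjust them to your build output:

```json
{
  "mcpServers": {
    "token-proctor": {
      "command": "node",
      "args": ["/absolute/path/to/token-proctor/dist/mcp-server.js"]
    }
  }
}
```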
Tools exposed:
## Configuration

### VS Code settings

| Mode | Weights | Best for |
|---|---|---|
| `tokens` (default) | prioritize $/M token price | One-shot Q&A, docs, creative |
| `turns` | prioritize low premium × turns burn | Agent loops (`code_large`, `agentic`) |
| `balanced` | weighted compromise | Mixed workloads |
**Why it matters:** Claude Sonnet has a 1× premium multiplier. On an agentic prompt predicted to run 20 turns, that's 20 premium requests. An o4-mini at 0.33× would burn ~6.6, about 70% less of your monthly bucket. `"optimizeFor": "turns"` surfaces that trade-off.
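The arithmetic above can be sketched in a few lines (an illustration of the model, not the extension's actual API):

```typescript
// Illustrative: a model's premium-request burn for an agentic task is its
// premium multiplier times the predicted number of turns.
function premiumBurn(premiumMultiplier: number, predictedTurns: number): number {
  return premiumMultiplier * predictedTurns;
}

const sonnetBurn = premiumBurn(1.0, 20);  // 20 premium requests
const o4MiniBurn = premiumBurn(0.33, 20); // ~6.6 premium requests
const saving = 1 - o4MiniBurn / sonnetBurn; // ~0.67, i.e. "about 70% less"
```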
## Policy file — `.token-proctor.json`
Drop at workspace root (or `~/.token-proctor/config.json`):
```json
{
  "allowModels": ["gpt-4o-mini", "gpt-4o", "claude-sonnet-4", "o4-mini", "gemini-flash"],
  "denyModels": ["claude-opus"],
  "premiumModelsAllowedFor": ["code_large", "reasoning"],
  "optimizeFor": "balanced",
  "preferCheap": true,
  "completenessThreshold": 60,
  "redact": {
    "builtins": true,
    "patterns": ["CORP-[A-Z0-9]{12}"],
    "blockOnMatch": false
  },
  "audit": {
    "enabled": true,
    "path": ".token-proctor/audit.jsonl"
  },
  "llmJudge": {
    "enabled": true,
    "confidenceThreshold": 0.85
  },
  "plan": {
    "name": "squad",
    "monthlyTokenAllowance": 10000000,
    "overageUsdPerM": 5.0
  }
}
```
Block reference:
- `allowModels` / `denyModels` / `premiumModelsAllowedFor` — gate the model pool the router can pick from.
- `redact` — built-in detectors cover AWS access/secret keys, GitHub/Slack/OpenAI/Stripe tokens, JWTs, PEM private keys, and Google API keys. Matches are replaced with `[REDACTED:kind]` before anything leaves the pure-function core; the forwarded prompt never contains raw secrets.
- `audit` — opt-in JSONL log of every decision (task, model, cost, redactions, verdict). Local file; no network.
- `llmJudge` — when rule-based confidence falls below `confidenceThreshold`, call the cheapest available `vscode.lm` model to classify the task, estimate output tokens, and estimate turns.
- `plan` — token-based plan context. When `monthlyTokenAllowance` is set, the summary shows `plan=name X.Y%`.
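With auditing enabled, each decision lands as one JSON line. A hypothetical entry built from the fields listed above; the actual schema may differ:

```json
{"task":"code_large","model":"o4-mini","costUsd":0.004,"redactions":["aws-secret-key"],"verdict":"pass"}
```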
## Exact token counting
`js-tiktoken` is a regular dependency; token counts are exact (`o200k_base`) on every run, and the summary tags `tokens=exact`. If the dependency fails to load for some reason, Token Proctor falls back to a chars/4 + punctuation heuristic and tags `tokens=heuristic`.
## Model catalog
Prices and premium multipliers are plain data in `src/data/pricing.ts`. Fork it, tune it for your org's real rates, ship it.
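A hypothetical sketch of what one catalog entry might look like; the interface name, field names, and dollar figures here are illustrative assumptions, not the file's actual contents:

```typescript
// Illustrative shape for a pricing entry; see src/data/pricing.ts for the real one.
interface ModelPricing {
  id: string;
  inputUsdPerMTok: number;   // $ per million input tokens
  outputUsdPerMTok: number;  // $ per million output tokens
  premiumMultiplier: number; // 0 = included in plan, 1 = one premium request per turn
}

const catalog: ModelPricing[] = [
  { id: "gpt-4o-mini",     inputUsdPerMTok: 0.15, outputUsdPerMTok: 0.60,  premiumMultiplier: 0 },
  { id: "o4-mini",         inputUsdPerMTok: 1.10, outputUsdPerMTok: 4.40,  premiumMultiplier: 0.33 },
  { id: "claude-sonnet-4", inputUsdPerMTok: 3.00, outputUsdPerMTok: 15.00, premiumMultiplier: 1 },
];
```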
## Architecture
```
┌────────────────────────────────────────┐
│  VS Code Chat Participant (@proctor)   │ ◄── primary UX
│  src/participant.ts                    │
└────────────┬───────────────────────────┘
             │
             │            ┌─────────────────────────┐
             ▼            ▼                         │
┌────────────────────────────────────┐              │
│  Core (src/core/)                  │      src/mcp-server.ts
│  taskClassifier • promptValidator  │ ◄── same core over MCP
│  modelRouter • costEstimator       │
│  llmJudge • redactor • policy      │
│  tokens (js-tiktoken) • audit      │
└────────────────────────────────────┘
                  ▲
                  │
       ┌──────────┴──────────┐
       │ src/data/pricing.ts │ ← model catalog, override per org
       └─────────────────────┘
```
Core modules are pure functions (except policy and audit, which touch the filesystem). No globals, no network. Trivial to unit-test, easy to swap any piece.
## Why this exists
Copilot Business/Enterprise (and most token-priced LLM plans) bill per premium request or per token. In practice, most overspend comes from:
- Users defaulting to the most powerful model for trivial edits.
- Vague prompts that require many expensive round-trips to finish.
- Agent mode running 10–30 turns of a 1× premium model on what could have been a 2-turn job on a 0× model.
Token Proctor surfaces all three before the call. It's the cheapest lever an org can pull on LLM spend — and it composes with, rather than replaces, whatever the underlying chat or agent does next.
## Roadmap
- [x] v0.1 — classifier, validator, router, cost, chat participant, MCP server.
- [x] v0.2 — `.token-proctor.json` policy (allow/deny/premium-gating) + secret redaction + JSONL audit log.
- [x] v0.3 — LLM judge fallback classifier + `js-tiktoken` for exact counts.
- [x] v0.4 — turns-aware cost projection, plan-aware allowance %, `optimizeFor` knob, Copilot agent hand-off, rename to Token Proctor.
- [ ] v0.5 — `vscode.lm.registerTool` so Copilot agent mode can call Proctor directly mid-turn.
- [ ] v0.6 — server-side GitHub Copilot Extension for centralized org routing and fleet-wide budget enforcement.
## License
MIT. See LICENSE.