# Model Router

A VS Code chat participant (`@router`) that classifies your prompt and routes it to the best model for the job. It uses the native `vscode.lm` API to reach models registered by other extensions (e.g. GitHub Copilot Chat), and can also call OpenAI-compatible HTTP endpoints with a user-supplied API key.
## What it does

- Classifies the prompt (task: explain/generate/refactor/debug/…; complexity: trivial/moderate/complex).
- Routes via user rules, falling back to sensible defaults.
- Streams the response into the Chat view (see the sketch after this list).
- Falls back across models and tiers if a provider fails.
- Shows the selected model and tier before the routed answer starts.
- Tracks estimated routing savings in a local dashboard.
- Can use a local Ollama/OpenAI-compatible model to classify prompts before routing, with a heuristic fallback.
- Reads attached files: when the user adds `#file:foo.ts` or a selection, the participant inlines the actual content into the prompt.
- Registers agent-style tools: `router_readFile`, `router_writeFile`, `router_listDir`, `router_searchWorkspace`, `router_runCommand`. The model can call these to read/edit files and run shell commands; writes and command runs prompt the user for confirmation.
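For orientation, reaching a registered model through `vscode.lm` and streaming its reply looks roughly like this sketch (the selector and helper name are illustrative; the real routing logic lives in `src/`):

```typescript
import * as vscode from 'vscode';

// Sketch: pick the first matching chat model and stream its reply into the
// Chat view, as the participant does once the router has chosen a model.
async function streamFromModel(
  prompt: string,
  stream: vscode.ChatResponseStream,
  token: vscode.CancellationToken
): Promise<void> {
  const [model] = await vscode.lm.selectChatModels({ vendor: 'copilot' });
  if (!model) {
    stream.markdown('No chat models available.');
    return;
  }
  const messages = [vscode.LanguageModelChatMessage.User(prompt)];
  const response = await model.sendRequest(messages, {}, token);
  for await (const chunk of response.text) {
    stream.markdown(chunk); // tokens appear in the Chat view as they arrive
  }
}
```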
## What it does not do

- It does not intercept or redirect Copilot Chat or any other extension's requests (VS Code exposes no supported API for that).
- It does not patch, hack, or inject into other extensions.
- It only routes requests sent to `@router`.
## Usage

- Install and run the extension (see Test locally below).
- Open the Chat view and type `@router <your prompt>`.
- Optional slash commands: `/fast`, `/balanced`, `/deep`, `/explain`, `/refactor`, `/debug`, `/why`, etc.
- Set `modelRouter.debug: true` to see routing details inline, or use `@router /why` to inspect the last decision.
- Run Model Router: Open Savings Dashboard to see the estimated cost saved by routing to cheaper tiers.
## Savings dashboard

The dashboard estimates savings by comparing each routed request against a baseline tier (default: deep). Because VS Code does not expose real billing data from model providers, these numbers are estimates based on rough token counts and configurable per-tier rates.

Tune the estimate in settings:

```json
{
  "modelRouter.costBaselineTier": "deep",
  "modelRouter.tierCostRates": {
    "fast": 0.00015,
    "balanced": 0.0025,
    "deep": 0.01
  }
}
```
Use Model Router: Reset Savings Dashboard to clear the workspace metrics.
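As a rough sketch of what that estimate amounts to (an assumption on my part: savings per request are the rate difference between baseline and routed tier times an estimated token count):

```typescript
// Sketch: estimated savings for one routed request. Units follow whatever
// modelRouter.tierCostRates is configured in; the real bookkeeping may differ.
type Tier = 'fast' | 'balanced' | 'deep';

function estimatedSavings(
  estimatedTokens: number,        // rough prompt + response size
  routedTier: Tier,
  baselineTier: Tier,             // modelRouter.costBaselineTier
  rates: Record<Tier, number>     // modelRouter.tierCostRates
): number {
  const delta = rates[baselineTier] - rates[routedTier];
  return Math.max(0, delta * estimatedTokens); // assumption: clamped at zero
}
```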
## Local prompt classifier

By default, `modelRouter.classifierMode` is `auto`: Model Router tries the configured local classifier first, then falls back to the built-in heuristic classifier if the local model is unavailable or times out.

For Ollama:

```bash
ollama pull llama3.2:3b
ollama serve
```

```json
{
  "modelRouter.classifierMode": "auto",
  "modelRouter.localClassifierProtocol": "ollama",
  "modelRouter.localClassifierEndpoint": "http://127.0.0.1:11434/api/chat",
  "modelRouter.localClassifierModel": "llama3.2:3b",
  "modelRouter.localClassifierTimeoutMs": 2500
}
```

For LM Studio or another local OpenAI-compatible server, set `modelRouter.localClassifierProtocol` to `openai-compatible` and point `modelRouter.localClassifierEndpoint` at the local chat-completions URL.
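Conceptually, `auto` mode behaves like the following sketch (the interface and function names are illustrative, not the extension's real ones):

```typescript
// Sketch of the "auto" flow: race the local classifier against a timeout,
// fall back to the heuristic on any failure.
interface Classification {
  task: string;                                 // explain, generate, refactor, …
  complexity: 'trivial' | 'moderate' | 'complex';
}

async function classifyAuto(
  prompt: string,
  local: (p: string) => Promise<Classification>,  // Ollama / OpenAI-compatible call
  heuristic: (p: string) => Classification,       // built-in fallback
  timeoutMs: number                               // modelRouter.localClassifierTimeoutMs
): Promise<Classification> {
  try {
    const timeout = new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error('classifier timeout')), timeoutMs)
    );
    return await Promise.race([local(prompt), timeout]);
  } catch {
    return heuristic(prompt); // local model unavailable or too slow
  }
}
```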
When tools are enabled, Model Router instructs the selected model to inspect the workspace and write files directly for setup, implementation, refactor, debug, and test tasks instead of pasting whole project files into chat. File writes and shell commands still ask for confirmation before they run.
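For reference, a write tool that asks before touching disk can be registered through the stable `LanguageModelTool` API roughly as follows (a sketch: tools must also be declared under `contributes.languageModelTools` in `package.json`, and the real `router_writeFile` may differ):

```typescript
import * as vscode from 'vscode';

interface WriteFileInput { path: string; content: string; }

// Sketch: a write tool that asks for confirmation before touching disk.
const writeFileTool: vscode.LanguageModelTool<WriteFileInput> = {
  prepareInvocation(options) {
    return {
      confirmationMessages: {
        title: 'Write file',
        message: `Write ${options.input.path}?`, // shown before invoke runs
      },
    };
  },
  async invoke(options) {
    const root = vscode.workspace.workspaceFolders?.[0];
    if (!root) { throw new Error('No workspace folder open'); }
    const uri = vscode.Uri.joinPath(root.uri, options.input.path);
    await vscode.workspace.fs.writeFile(
      uri,
      new TextEncoder().encode(options.input.content)
    );
    return new vscode.LanguageModelToolResult([
      new vscode.LanguageModelTextPart(`Wrote ${options.input.path}`),
    ]);
  },
};

export function registerTools(context: vscode.ExtensionContext): void {
  context.subscriptions.push(
    vscode.lm.registerTool('router_writeFile', writeFileTool)
  );
}
```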
## Test locally

1. Install dependencies:

   ```bash
   cd model-router
   npm install
   ```

2. Run the unit tests (no VS Code required):

   ```bash
   npm test
   ```

   Expected: all tests in `classifier.test.ts` and `router.test.ts` pass.

3. Compile:

   ```bash
   npm run compile
   ```

   Expected: no TypeScript errors; `out/extension.js` is produced.

4. Launch the Extension Development Host: open the `model-router/` folder in VS Code and press `F5` (or Run → Start Debugging). A second VS Code window labelled **[Extension Development Host]** opens with the extension loaded.
5. Drive the participant. In the Extension Development Host window:

   - Install GitHub Copilot Chat (or any other extension that registers models with `vscode.lm`) and sign in.
   - Open the Chat view (the speech-bubble icon in the Activity Bar, or `Ctrl/Cmd+Alt+I`).
   - Type `@router explain closures in JavaScript` — it should stream from the fast tier.
   - Type `@router /deep design a rate limiter for a distributed API` — it should route to the deep tier.
   - Type `@router /why` — it shows the last routing decision (task, complexity, tier, model, fallbacks).
   - Toggle `modelRouter.debug` on in Settings to see the router banner inline.
6. Verify fallback. In the **[Extension Development Host]** window, add a broken HTTP model to settings to force a fallback:

   ```json
   "modelRouter.models": [
     {
       "id": "broken-test",
       "provider": "http",
       "tier": "deep",
       "endpoint": "http://127.0.0.1:1/nope",
       "httpModel": "x"
     }
   ],
   "modelRouter.routingRules": [
     { "name": "force-broken", "when": {}, "tier": "deep", "prefer": ["broken-test"] }
   ]
   ```

   Send any prompt to `@router`. You should see an "unavailable — trying fallback…" notice, then a real response from a working model. Check Output → Model Router for the underlying error.
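The fallback pass you just exercised boils down to something like this sketch (names and types are illustrative stand-ins for the real provider plumbing):

```typescript
// Sketch: try each candidate model in order, surface a notice on failure,
// and keep the last error for the Output channel.
async function sendWithFallback<T>(
  candidates: string[],                    // model ids, best first
  send: (modelId: string) => Promise<T>,   // provider call
  notify: (msg: string) => void            // inline chat notice
): Promise<T> {
  let lastError: unknown;
  for (const id of candidates) {
    try {
      return await send(id);
    } catch (err) {
      lastError = err;
      notify(`${id} unavailable — trying fallback…`);
    }
  }
  throw lastError ?? new Error('no models configured');
}
```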
## Configuration

```json
{
  "modelRouter.defaultTier": "balanced",
  "modelRouter.forcedTier": "auto",
  "modelRouter.routingRules": [
    { "name": "big-prompts", "when": { "minPromptLength": 800 }, "tier": "deep" },
    { "name": "prefer-sonnet", "when": { "task": ["review"] }, "tier": "balanced", "prefer": ["copilot-sonnet-35"] }
  ],
  "modelRouter.models": [
    {
      "id": "copilot-o1",
      "provider": "vscode-lm",
      "vendor": "copilot",
      "family": "o1",
      "tier": "deep"
    },
    {
      "id": "openrouter-sonnet",
      "provider": "http",
      "tier": "balanced",
      "endpoint": "https://openrouter.ai/api/v1/chat/completions",
      "httpModel": "anthropic/claude-3.5-sonnet",
      "apiKeySecret": "openrouter.apiKey"
    }
  ]
}
```
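For orientation, the routing rules above map onto shapes like the following, selected first-match-wins (a sketch; the field names mirror the JSON, the matching logic is an assumption):

```typescript
// Sketch: first-match rule selection over the settings shapes above.
interface RuleWhen { minPromptLength?: number; task?: string[]; }
interface RoutingRule { name: string; when: RuleWhen; tier: string; prefer?: string[]; }

function pickRule(
  rules: RoutingRule[],
  prompt: string,
  task: string
): RoutingRule | undefined {
  return rules.find(r =>
    (r.when.minPromptLength === undefined || prompt.length >= r.when.minPromptLength) &&
    (r.when.task === undefined || r.when.task.includes(task))
  ); // undefined → fall through to the router's defaults
}
```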
Store API keys via the Command Palette → Model Router: Store API Key.
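Keys stored this way land in VS Code's `SecretStorage`, roughly like this sketch (the exact command implementation may differ; `openrouter.apiKey` matches the `apiKeySecret` value above):

```typescript
import * as vscode from 'vscode';

// Sketch: what "Model Router: Store API Key" amounts to.
async function storeApiKey(context: vscode.ExtensionContext): Promise<void> {
  const key = await vscode.window.showInputBox({
    prompt: 'API key for openrouter.apiKey',
    password: true,        // mask the input
    ignoreFocusOut: true,  // keep the box open if focus moves
  });
  if (key) {
    await context.secrets.store('openrouter.apiKey', key);
  }
}

// The HTTP provider can later read it back:
// const apiKey = await context.secrets.get('openrouter.apiKey');
```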
## Extending

- New model — add an entry to `modelRouter.models`. No code change.
- New routing rule — add an entry to `modelRouter.routingRules`. First match wins; defaults follow.
- New provider — implement `ModelProvider` in `src/models/` and register it in `ModelRegistry`'s constructor (one line); see the sketch after this list.
- New classifier — implement `Classifier` and swap it in `createParticipant`.
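A sketch of what a new provider might look like (the real `ModelProvider` interface in `src/models/` may differ; treat these method names as assumptions):

```typescript
// Sketch of a custom provider; the actual interface lives in src/models/.
interface ModelProvider {
  readonly id: string;
  send(prompt: string, onChunk: (text: string) => void): Promise<void>;
}

class EchoProvider implements ModelProvider {
  readonly id = 'echo';
  async send(prompt: string, onChunk: (text: string) => void): Promise<void> {
    onChunk(`echo: ${prompt}`); // replace with a real backend call
  }
}

// Then register it in ModelRegistry's constructor, e.g.:
// this.providers.push(new EchoProvider());  // hypothetical registration line
```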
## Project layout

```
src/
  extension.ts    activation entry
  participant.ts  chat participant handler
  config.ts       settings helpers
  logger.ts       output channel logger
  classifier/     task + complexity classifier
  router/         rule engine + defaults
  models/         registry, vscode-lm provider, http provider
  commands/       command palette entries
  test/           unit tests
```