# Model Router

A VS Code chat participant (`@router`) that classifies your prompt and routes it to the best model for the job. It uses the native `vscode.lm` API to reach models registered by other extensions (e.g. GitHub Copilot Chat), and can also call OpenAI-compatible HTTP endpoints with a user-supplied API key.
## What it does

- Classifies the prompt (task: explain/generate/refactor/debug/…; complexity: trivial/moderate/complex).
- Routes via user rules, falling back to sensible defaults.
- Streams the response into the Chat view (see the sketch after this list).
- Falls back across models and tiers if a provider fails.
- Shows the selected model and tier before the routed answer starts.
- Tracks estimated routing savings in a local dashboard.
- Can use a local Ollama/OpenAI-compatible model to classify prompts before routing, with a heuristic fallback.
- Reads attached files: when the user adds `#file:foo.ts` or a selection, the participant inlines the actual content into the prompt.
- Registers agent-style tools: `router_readFile`, `router_writeFile`, `router_listDir`, `router_searchWorkspace`, `router_runCommand`. The model can call these to read/edit files and run shell commands; writes and command runs prompt the user for confirmation.
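For orientation, reaching a registered model through `vscode.lm` and streaming its reply looks roughly like this sketch (the selector and helper name are illustrative; the real routing logic lives in `src/`):

```typescript
import * as vscode from 'vscode';

// Sketch: pick the first matching chat model and stream its reply into the
// Chat view, as the participant does once the router has chosen a model.
async function streamFromModel(
  prompt: string,
  stream: vscode.ChatResponseStream,
  token: vscode.CancellationToken
): Promise<void> {
  const [model] = await vscode.lm.selectChatModels({ vendor: 'copilot' });
  if (!model) {
    stream.markdown('No chat models available.');
    return;
  }
  const messages = [vscode.LanguageModelChatMessage.User(prompt)];
  const response = await model.sendRequest(messages, {}, token);
  for await (const chunk of response.text) {
    stream.markdown(chunk); // tokens appear in the Chat view as they arrive
  }
}
```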
## What it does not do

- It does not intercept or redirect Copilot Chat or any other extension's requests (VS Code exposes no supported API for that).
- It does not patch, hack, or inject into other extensions.
- It only routes requests sent to `@router`.
## Usage

- Install and run the extension (see Test locally below).
- Open the Chat view and type `@router <your prompt>`.
- Optional slash commands: `/fast`, `/balanced`, `/deep`, `/explain`, `/refactor`, `/debug`, `/why`, etc.
- Set `modelRouter.debug: true` to see routing details inline, or use `@router /why` to inspect the last decision.
- Run Model Router: Open Savings Dashboard to see the estimated cost saved by routing to cheaper tiers.
## Savings dashboard

The dashboard estimates savings by comparing each routed request against a baseline tier (default: deep). Because VS Code does not expose real billing data from model providers, these numbers are estimates based on rough token counts and configurable per-tier rates.

Tune the estimate in settings:

```json
{
  "modelRouter.costBaselineTier": "deep",
  "modelRouter.tierCostRates": {
    "fast": 0.00015,
    "balanced": 0.0025,
    "deep": 0.01
  }
}
```
Use Model Router: Reset Savings Dashboard to clear the workspace metrics.
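As a rough sketch of what that estimate amounts to (an assumption on my part: savings per request are the rate difference between baseline and routed tier times an estimated token count):

```typescript
// Sketch: estimated savings for one routed request. Units follow whatever
// modelRouter.tierCostRates is configured in; the real bookkeeping may differ.
type Tier = 'fast' | 'balanced' | 'deep';

function estimatedSavings(
  estimatedTokens: number,        // rough prompt + response size
  routedTier: Tier,
  baselineTier: Tier,             // modelRouter.costBaselineTier
  rates: Record<Tier, number>     // modelRouter.tierCostRates
): number {
  const delta = rates[baselineTier] - rates[routedTier];
  return Math.max(0, delta * estimatedTokens); // assumption: clamped at zero
}
```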
## Local prompt classifier

By default, `modelRouter.classifierMode` is `auto`: Model Router tries the configured local classifier first, then falls back to the built-in heuristic classifier if the local model is unavailable or times out.

For Ollama:

```bash
ollama pull llama3.2:3b
ollama serve
```

```json
{
  "modelRouter.classifierMode": "auto",
  "modelRouter.localClassifierProtocol": "ollama",
  "modelRouter.localClassifierEndpoint": "http://127.0.0.1:11434/api/chat",
  "modelRouter.localClassifierModel": "llama3.2:3b",
  "modelRouter.localClassifierTimeoutMs": 2500
}
```

For LM Studio or another local OpenAI-compatible server, set `modelRouter.localClassifierProtocol` to `openai-compatible` and point `modelRouter.localClassifierEndpoint` at the local chat-completions URL.
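Conceptually, `auto` mode behaves like the following sketch (the interface and function names are illustrative, not the extension's real ones):

```typescript
// Sketch of the "auto" flow: race the local classifier against a timeout,
// fall back to the heuristic on any failure.
interface Classification {
  task: string;                                 // explain, generate, refactor, …
  complexity: 'trivial' | 'moderate' | 'complex';
}

async function classifyAuto(
  prompt: string,
  local: (p: string) => Promise<Classification>,  // Ollama / OpenAI-compatible call
  heuristic: (p: string) => Classification,       // built-in fallback
  timeoutMs: number                               // modelRouter.localClassifierTimeoutMs
): Promise<Classification> {
  try {
    const timeout = new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error('classifier timeout')), timeoutMs)
    );
    return await Promise.race([local(prompt), timeout]);
  } catch {
    return heuristic(prompt); // local model unavailable or too slow
  }
}
```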
When tools are enabled, Model Router instructs the selected model to inspect the workspace and write files directly for setup, implementation, refactor, debug, and test tasks instead of pasting whole project files into chat. File writes and shell commands still ask for confirmation before they run.
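For reference, a write tool that asks before touching disk can be registered through the stable `LanguageModelTool` API roughly as follows (a sketch: tools must also be declared under `contributes.languageModelTools` in `package.json`, and the real `router_writeFile` may differ):

```typescript
import * as vscode from 'vscode';

interface WriteFileInput { path: string; content: string; }

// Sketch: a write tool that asks for confirmation before touching disk.
const writeFileTool: vscode.LanguageModelTool<WriteFileInput> = {
  prepareInvocation(options) {
    return {
      confirmationMessages: {
        title: 'Write file',
        message: `Write ${options.input.path}?`, // shown before invoke runs
      },
    };
  },
  async invoke(options) {
    const root = vscode.workspace.workspaceFolders?.[0];
    if (!root) { throw new Error('No workspace folder open'); }
    const uri = vscode.Uri.joinPath(root.uri, options.input.path);
    await vscode.workspace.fs.writeFile(
      uri,
      new TextEncoder().encode(options.input.content)
    );
    return new vscode.LanguageModelToolResult([
      new vscode.LanguageModelTextPart(`Wrote ${options.input.path}`),
    ]);
  },
};

export function registerTools(context: vscode.ExtensionContext): void {
  context.subscriptions.push(
    vscode.lm.registerTool('router_writeFile', writeFileTool)
  );
}
```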
## Test locally

1. Install dependencies:

   ```bash
   cd model-router
   npm install
   ```

2. Run the unit tests (no VS Code required):

   ```bash
   npm test
   ```

   Expected: all tests in `classifier.test.ts` and `router.test.ts` pass.

3. Compile:

   ```bash
   npm run compile
   ```

   Expected: no TypeScript errors; `out/extension.js` is produced.

4. Launch the Extension Development Host: open the `model-router/` folder in VS Code and press `F5` (or Run → Start Debugging). A second VS Code window labelled **[Extension Development Host]** opens with the extension loaded.
5. Drive the participant. In the Extension Development Host window:

   - Install GitHub Copilot Chat (or any other extension that registers models with `vscode.lm`) and sign in.
   - Open the Chat view (the speech-bubble icon in the Activity Bar, or `Ctrl/Cmd+Alt+I`).
   - Type `@router explain closures in JavaScript` — it should stream from the fast tier.
   - Type `@router /deep design a rate limiter for a distributed API` — it should route to the deep tier.
   - Type `@router /why` — it shows the last routing decision (task, complexity, tier, model, fallbacks).
   - Toggle `modelRouter.debug` on in Settings to see the router banner inline.
6. Verify fallback. In the **[Extension Development Host]** window, add a broken HTTP model to settings to force a fallback:

   ```json
   "modelRouter.models": [
     {
       "id": "broken-test",
       "provider": "http",
       "tier": "deep",
       "endpoint": "http://127.0.0.1:1/nope",
       "httpModel": "x"
     }
   ],
   "modelRouter.routingRules": [
     { "name": "force-broken", "when": {}, "tier": "deep", "prefer": ["broken-test"] }
   ]
   ```

   Send any prompt to `@router`. You should see an "unavailable — trying fallback…" notice, then a real response from a working model. Check Output → Model Router for the underlying error.
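The fallback pass you just exercised boils down to something like this sketch (names and types are illustrative stand-ins for the real provider plumbing):

```typescript
// Sketch: try each candidate model in order, surface a notice on failure,
// and keep the last error for the Output channel.
async function sendWithFallback<T>(
  candidates: string[],                    // model ids, best first
  send: (modelId: string) => Promise<T>,   // provider call
  notify: (msg: string) => void            // inline chat notice
): Promise<T> {
  let lastError: unknown;
  for (const id of candidates) {
    try {
      return await send(id);
    } catch (err) {
      lastError = err;
      notify(`${id} unavailable — trying fallback…`);
    }
  }
  throw lastError ?? new Error('no models configured');
}
```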
## Configuration

```json
{
  "modelRouter.defaultTier": "balanced",
  "modelRouter.forcedTier": "auto",
  "modelRouter.routingRules": [
    { "name": "big-prompts", "when": { "minPromptLength": 800 }, "tier": "deep" },
    { "name": "prefer-sonnet", "when": { "task": ["review"] }, "tier": "balanced", "prefer": ["copilot-sonnet-35"] }
  ],
  "modelRouter.models": [
    {
      "id": "copilot-o1",
      "provider": "vscode-lm",
      "vendor": "copilot",
      "family": "o1",
      "tier": "deep"
    },
    {
      "id": "openrouter-sonnet",
      "provider": "http",
      "tier": "balanced",
      "endpoint": "https://openrouter.ai/api/v1/chat/completions",
      "httpModel": "anthropic/claude-3.5-sonnet",
      "apiKeySecret": "openrouter.apiKey"
    }
  ]
}
```
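For orientation, the routing rules above map onto shapes like the following, selected first-match-wins (a sketch; the field names mirror the JSON, the matching logic is an assumption):

```typescript
// Sketch: first-match rule selection over the settings shapes above.
interface RuleWhen { minPromptLength?: number; task?: string[]; }
interface RoutingRule { name: string; when: RuleWhen; tier: string; prefer?: string[]; }

function pickRule(
  rules: RoutingRule[],
  prompt: string,
  task: string
): RoutingRule | undefined {
  return rules.find(r =>
    (r.when.minPromptLength === undefined || prompt.length >= r.when.minPromptLength) &&
    (r.when.task === undefined || r.when.task.includes(task))
  ); // undefined → fall through to the router's defaults
}
```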
Store API keys via the Command Palette → Model Router: Store API Key.
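Keys stored this way land in VS Code's `SecretStorage`, roughly like this sketch (the exact command implementation may differ; `openrouter.apiKey` matches the `apiKeySecret` value above):

```typescript
import * as vscode from 'vscode';

// Sketch: what "Model Router: Store API Key" amounts to.
async function storeApiKey(context: vscode.ExtensionContext): Promise<void> {
  const key = await vscode.window.showInputBox({
    prompt: 'API key for openrouter.apiKey',
    password: true,        // mask the input
    ignoreFocusOut: true,  // keep the box open if focus moves
  });
  if (key) {
    await context.secrets.store('openrouter.apiKey', key);
  }
}

// The HTTP provider can later read it back:
// const apiKey = await context.secrets.get('openrouter.apiKey');
```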
## Extending

- New model — add an entry to `modelRouter.models`. No code change.
- New routing rule — add an entry to `modelRouter.routingRules`. First match wins; defaults follow.
- New provider — implement `ModelProvider` in `src/models/` and register it in `ModelRegistry`'s constructor (one line); see the sketch after this list.
- New classifier — implement `Classifier` and swap it in `createParticipant`.
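A sketch of what a new provider might look like (the real `ModelProvider` interface in `src/models/` may differ; treat these method names as assumptions):

```typescript
// Sketch of a custom provider; the actual interface lives in src/models/.
interface ModelProvider {
  readonly id: string;
  send(prompt: string, onChunk: (text: string) => void): Promise<void>;
}

class EchoProvider implements ModelProvider {
  readonly id = 'echo';
  async send(prompt: string, onChunk: (text: string) => void): Promise<void> {
    onChunk(`echo: ${prompt}`); // replace with a real backend call
  }
}

// Then register it in ModelRegistry's constructor, e.g.:
// this.providers.push(new EchoProvider());  // hypothetical registration line
```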
## Project layout

```
src/
  extension.ts    activation entry
  participant.ts  chat participant handler
  config.ts       settings helpers
  logger.ts       output channel logger
  classifier/     task + complexity classifier
  router/         rule engine + defaults
  models/         registry, vscode-lm provider, http provider
  commands/       command palette entries
  test/           unit tests
```