LLM Token Surgeon for VS Code

Cut your LLM API bill by 30-70% without changing a line of your app logic.

The extension scans your Python files for prompt strings, shows you exactly how many tokens each one uses, and lets you optimize them in one click.

Requirements

The extension shells out to the llm-surgeon CLI. Install it first:

pip install llm-token-surgeon

Make sure llm-surgeon is on your PATH:

llm-surgeon --help   # should print usage
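To script the same check (for example in CI), you can use Python's standard library to locate the CLI and run `--help`. The sketch below makes some assumptions. `check_cli` is a hypothetical helper, not part of the extension. Passing a full path works the same way as the `llmSurgeon.cliPath` setting.

```python
import shutil
import subprocess


def check_cli(cli: str = "llm-surgeon") -> bool:
    """Return True if `cli` resolves to an executable that exits 0 on --help.

    Hypothetical helper: `cli` may be a bare name (looked up on PATH) or a
    full path to the executable.
    """
    resolved = shutil.which(cli)  # None if not found or not executable
    if resolved is None:
        return False
    try:
        result = subprocess.run(
            [resolved, "--help"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("llm-surgeon OK" if check_cli() else "llm-surgeon not found on PATH")
```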

Features

Inline token counts

Open any Python file and token usage appears right next to your prompt variables — no action needed.

system_prompt = "You are a helpful assistant..."   # 🔪 58 tokens → 19 (-67%)
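The percentage in the hint is the relative change from the original token count to the optimized one. Here is a minimal sketch of that arithmetic. It assumes the extension rounds to the nearest whole percent. `percent_change` and `format_hint` are hypothetical names, not extension APIs.

```python
def percent_change(before: int, after: int) -> int:
    """Signed change from `before` to `after`, rounded to a whole percent.

    Negative values mean the optimized prompt is shorter. Python's round()
    rounds exact .5 ties to the nearest even integer.
    """
    if before <= 0:
        raise ValueError("before must be a positive token count")
    return round(100 * (after - before) / before)


def format_hint(before: int, after: int) -> str:
    """Build an inline hint string in the style shown above."""
    change = percent_change(before, after)
    return f"🔪 {before} tokens → {after} ({change:+d}%)"


print(format_hint(58, 19))  # 🔪 58 tokens → 19 (-67%)
```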

Analyze File

Right-click → LLM Surgeon: Analyze File (or Command Palette: LLM Surgeon: Analyze File)

Prints a full token report to the Output panel. Sample output:

📊 Token Analysis Report
════════════════════════════════════════════════════════
  system_prompt        847 tokens  →  231 tokens  (-73%)  $0.31/1k calls
  user_template        312 tokens  →  198 tokens  (-37%)  $0.09/1k calls
  TOTAL: 54% reduction · $0.82 per 1,000 calls
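The per-1k-calls dollar figures depend on the price of the model selected in `llmSurgeon.model`. This README does not list those prices. The sketch below shows one way to derive such a figure. It assumes the column reports input-token savings over 1,000 calls at a price you supply. The $2.50 per million tokens is a placeholder, not a quoted rate, so the output will not match the sample report.

```python
from dataclasses import dataclass


@dataclass
class PromptRow:
    name: str
    before: int  # tokens in the original prompt
    after: int   # tokens after optimization


def cost_per_1k_calls(tokens: int, usd_per_million_tokens: float) -> float:
    """Input cost of sending `tokens` tokens on each of 1,000 calls."""
    return tokens * 1_000 * usd_per_million_tokens / 1_000_000


def savings_per_1k_calls(row: PromptRow, usd_per_million_tokens: float) -> float:
    """Dollars saved per 1,000 calls by sending the optimized prompt instead."""
    return cost_per_1k_calls(row.before - row.after, usd_per_million_tokens)


# Placeholder price (assumption); substitute your model's current input rate.
PRICE = 2.50

for row in [PromptRow("system_prompt", 847, 231), PromptRow("user_template", 312, 198)]:
    print(f"{row.name:<20} saves ${savings_per_1k_calls(row, PRICE):.2f}/1k calls")
```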

Preview Optimization

Command Palette: LLM Surgeon: Preview Optimization

Shows a before/after diff in the Output panel without touching your file.
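The sketch below shows what a before/after diff of a prompt looks like, using Python's standard `difflib` module. The shortened prompt is made up for illustration. The extension's own diff format may differ.

```python
import difflib


def preview(original: str, optimized: str, name: str = "prompt") -> str:
    """Return a unified diff of two prompt strings without modifying anything."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        optimized.splitlines(keepends=True),
        fromfile=f"{name} (before)",
        tofile=f"{name} (after)",
    )
    return "".join(diff)


before = (
    "You are a helpful assistant.\n"
    "Please make sure that you always answer the user's question clearly.\n"
)
after = (
    "You are a helpful assistant.\n"
    "Answer clearly.\n"
)
print(preview(before, after, "system_prompt"))
```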

Optimize File (Apply)

Right-click → LLM Surgeon: Optimize File (Apply)

Rewrites your prompt strings in place. A .bak backup is created automatically before any changes are written.
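Here is a sketch of the backup-then-write pattern using the standard library. It assumes the backup is named `<file>.bak` and sits next to the original. Check your workspace for the name the extension actually uses. `write_with_backup` is a hypothetical helper, not an extension API.

```python
import shutil
from pathlib import Path


def write_with_backup(path: Path, new_text: str) -> Path:
    """Copy `path` to a .bak file, then overwrite `path` with `new_text`.

    The backup is written first, so the original survives if the rewrite fails.
    """
    backup = path.with_name(path.name + ".bak")  # app.py -> app.py.bak (assumed naming)
    shutil.copy2(path, backup)  # copies file contents and metadata
    path.write_text(new_text, encoding="utf-8")
    return backup
```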

Savings Panel

Title bar icon → LLM Surgeon: Show Savings Panel

Opens a sidebar webview with a live savings dashboard for the current file. It shows token reduction %, projected monthly savings at your call volume (set by `llmSurgeon.callsPerDay`), and a per-variable breakdown.
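The monthly figure scales per-call savings by your `llmSurgeon.callsPerDay` setting. The sketch below projects it under two assumptions: a 30-day month and a flat per-token input price that you supply. This README does not document the extension's exact formula or price table.

```python
def monthly_savings_usd(
    tokens_saved_per_call: int,
    calls_per_day: int = 1000,             # mirrors the llmSurgeon.callsPerDay default
    usd_per_million_tokens: float = 2.50,  # placeholder price, not an official rate
    days_per_month: int = 30,              # assumption
) -> float:
    """Project monthly input-token savings in US dollars."""
    tokens_per_month = tokens_saved_per_call * calls_per_day * days_per_month
    return tokens_per_month * usd_per_million_tokens / 1_000_000


# Going from 58 to 19 tokens saves 39 tokens per call.
print(f"${monthly_savings_usd(58 - 19):.2f}/month")
```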

Settings

| Setting | Default | Description |
|---|---|---|
| `llmSurgeon.model` | `gpt-4o` | Target model for pricing (OpenAI, Anthropic, Google, Mistral, or Ollama; see Supported models) |
| `llmSurgeon.aggressiveness` | `balanced` | Optimization level: `conservative`, `balanced`, or `aggressive` |
| `llmSurgeon.callsPerDay` | `1000` | Expected API calls per day, used for the monthly savings projection |
| `llmSurgeon.showInlineHints` | `true` | Show or hide inline token-count decorations |
| `llmSurgeon.cliPath` | `llm-surgeon` | Path to the CLI, if it is not on your PATH |

Supported file types

Python (.py) only. JavaScript and TypeScript support is planned for v0.3.

Supported models

- OpenAI: gpt-4o, gpt-4.1, gpt-4-turbo, gpt-3.5-turbo
- Anthropic: claude-opus-4, claude-sonnet-4, claude-3-5-sonnet, claude-3-opus
- Google: gemini-2.0-flash, gemini-1.5-pro
- Mistral: mistral-large, mistral-7b
- Ollama: llama3, phi3

Troubleshooting

"No prompt strings found": the scanner only picks up variables whose names contain prompt, system, instruction, message, template, or context (for example `system_prompt` or `user_template`). Rename the variable to include one of these words, or run `llm-surgeon analyze` directly from the terminal.
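To see which assignments in a file would match this naming rule, you can approximate it with Python's `ast` module. The sketch treats the rule as a case-insensitive substring match on plain variable names assigned a string literal. This README does not document how the real scanner handles case, f-strings, annotated assignments, attributes, or concatenation, so its results may differ. `find_prompt_variables` is a hypothetical helper.

```python
import ast

# Keywords from the troubleshooting note; case-insensitive matching is an assumption.
KEYWORDS = ("prompt", "system", "instruction", "message", "template", "context")


def find_prompt_variables(source: str) -> list[str]:
    """Return names of plain variables assigned a string literal that match KEYWORDS."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue  # only plain `name = value` assignments
        if not (isinstance(node.value, ast.Constant) and isinstance(node.value.value, str)):
            continue  # right-hand side must be a string literal
        for target in node.targets:
            if isinstance(target, ast.Name) and any(k in target.id.lower() for k in KEYWORDS):
                found.append(target.id)
    return found


code = '''
system_prompt = "You are a helpful assistant."
greeting = "Hello"
USER_TEMPLATE = "Question: {q}"
'''
print(find_prompt_variables(code))  # ['system_prompt', 'USER_TEMPLATE']
```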

Inline hints not appearing: confirm that `llmSurgeon.showInlineHints` is `true`, then check `llmSurgeon.cliPath` in Settings. To find the correct path, run `which llm-surgeon` in your terminal (or `where llm-surgeon` on Windows).

Optimize File (Apply) not working: save the file before running the command. The CLI reads from disk, not from the unsaved editor buffer.

Links

MIT License. Built by Ashish Sharda (@ashishjsharda).