LLM Token Surgeon for VS Code
Cut your LLM API bill by 30-70% without changing a line of your app logic.
The extension scans your Python files for prompt strings, shows you exactly how many tokens each one uses, and lets you optimize them in one click.
Requirements
The extension shells out to the llm-surgeon CLI. Install it first:
pip install llm-token-surgeon
Make sure llm-surgeon is on your PATH:
llm-surgeon --help # should print usage
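If you prefer to check this from Python, for example in a project setup script, the standard library's shutil.which performs the same PATH lookup as your shell. This is a minimal sketch; the helper name resolve_cli is hypothetical and not part of llm-token-surgeon.

```python
import shutil
from typing import Optional


def resolve_cli(name: str = "llm-surgeon") -> Optional[str]:
    """Return the absolute path of the CLI if it is on PATH, else None."""
    return shutil.which(name)


path = resolve_cli()
if path is None:
    print("llm-surgeon not found on PATH; set llmSurgeon.cliPath instead")
else:
    print(f"llm-surgeon found at {path}")
```

If the lookup returns a path, that value is what you would put into llmSurgeon.cliPath.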
Features
Inline token counts
Open any Python file and token usage appears right next to your prompt variables, with no action needed.
system_prompt = "You are a helpful assistant..." # 🔪 58 tokens → 19 (-67%)
Analyze File
Right-click → LLM Surgeon: Analyze File (or Command Palette: LLM Surgeon: Analyze File)
Prints a full token report to the Output panel:
📊 Token Analysis Report
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
system_prompt    847 tokens → 231 tokens  (-73%)  $0.31/1k calls
user_template    312 tokens → 198 tokens  (-37%)  $0.09/1k calls
TOTAL: 54% reduction · $0.82 per 1,000 calls
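The per-variable percentages in a report like this are consistent with a plain before/after ratio rounded to the nearest whole percent. The sketch below reproduces them. reduction_pct is a hypothetical helper, not the extension's actual code, and how the TOTAL line is weighted is not specified here.

```python
def reduction_pct(before: int, after: int) -> int:
    """Share of tokens removed, rounded to the nearest whole percent."""
    if before <= 0:
        raise ValueError("before must be a positive token count")
    return round(100 * (before - after) / before)


# Token counts taken from the sample report above.
print(reduction_pct(847, 231))  # system_prompt -> 73
print(reduction_pct(312, 198))  # user_template -> 37
```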
Preview Optimization
Command Palette: LLM Surgeon: Preview Optimization
Shows a before/after diff in the Output panel without touching your file.
Optimize File (Apply)
Right-click → LLM Surgeon: Optimize File (Apply)
Rewrites your prompt strings in place. A .bak backup is created automatically before any changes are written.
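The backup-then-write pattern described here can be sketched in a few lines of Python. This is an illustration only: write_with_backup is a hypothetical name, and appending .bak to the full file name (prompts.py.bak) is an assumption about the naming, not confirmed behavior of the extension.

```python
import shutil
import tempfile
from pathlib import Path


def write_with_backup(path: Path, new_text: str) -> Path:
    """Copy `path` to a sibling .bak file, then overwrite `path`."""
    backup = path.with_name(path.name + ".bak")  # assumed naming scheme
    shutil.copy2(path, backup)  # copy first, so the original always survives
    path.write_text(new_text, encoding="utf-8")
    return backup


# Demonstrate on a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "prompts.py"
    target.write_text('system_prompt = "You are a helpful assistant..."\n', encoding="utf-8")
    backup = write_with_backup(target, 'system_prompt = "Be helpful."\n')
    print(backup.name)  # prompts.py.bak
```

To undo an optimization, copy the backup file over the rewritten file.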
Savings Panel
Title bar icon → LLM Surgeon: Show Savings Panel
Opens a sidebar webview with a live savings dashboard for the current file: token reduction %, monthly savings at your call volume, and a per-variable breakdown.
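A monthly projection of this kind is simple arithmetic on a per-1,000-calls dollar figure and the daily call volume (llmSurgeon.callsPerDay). The sketch below assumes a 30-day month and treats the report's per-1,000-calls amount as the saving. Both are assumptions for illustration, and monthly_savings is a hypothetical helper rather than the panel's real formula.

```python
def monthly_savings(
    savings_per_1k_calls: float,
    calls_per_day: int,
    days_per_month: int = 30,  # assumption: flat 30-day month
) -> float:
    """Project monthly dollar savings from a per-1,000-calls figure."""
    return savings_per_1k_calls * calls_per_day * days_per_month / 1000


# $0.82 per 1,000 calls at the default 1000 calls per day.
print(f"${monthly_savings(0.82, 1000):.2f}")  # $24.60
```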
Settings
| Setting | Default | Description |
|---|---|---|
| llmSurgeon.model | gpt-4o | Target model for pricing (supports OpenAI, Anthropic, Google, Mistral, Ollama) |
| llmSurgeon.aggressiveness | balanced | conservative / balanced / aggressive |
| llmSurgeon.callsPerDay | 1000 | Daily API calls for monthly savings projection |
| llmSurgeon.showInlineHints | true | Toggle inline token count decorations |
| llmSurgeon.cliPath | llm-surgeon | Path to the CLI if not on PATH |
Supported file types
Python (.py). JavaScript and TypeScript support is coming in v0.3.
Supported models
OpenAI: gpt-4o, gpt-4.1, gpt-4-turbo, gpt-3.5-turbo
Anthropic: claude-opus-4, claude-sonnet-4, claude-3-5-sonnet, claude-3-opus
Google: gemini-2.0-flash, gemini-1.5-pro
Mistral: mistral-large, mistral-7b
Ollama: llama3, phi3
Troubleshooting
"No prompt strings found" โ the scanner looks for variables named *prompt*, *system*, *instruction*, *message*, *template*, *context*. Rename your variable or run llm-surgeon analyze directly from the terminal.
Inline hints not appearing: check llmSurgeon.cliPath in Settings. Run which llm-surgeon in your terminal to find the correct path.
Optimize File (Apply) not working: make sure the file is saved before running. The CLI reads from disk, not the unsaved buffer.
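To see why a variable might be skipped by name-based scanning, here is a rough approximation of that matching using Python's ast module. It is not the llm-surgeon implementation: find_prompt_variables is a hypothetical helper, and the case-insensitive substring match and the restriction to plain string literals are assumptions.

```python
import ast

# Substrings taken from the patterns listed under "No prompt strings found".
KEYWORDS = ("prompt", "system", "instruction", "message", "template", "context")


def find_prompt_variables(source: str) -> list[str]:
    """Return names of variables assigned a plain string literal whose
    name contains one of KEYWORDS (f-strings are not matched here)."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        if not (isinstance(value, ast.Constant) and isinstance(value.value, str)):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and any(
                key in target.id.lower() for key in KEYWORDS
            ):
                names.append(target.id)
    return names


sample = (
    'system_prompt = "You are a helpful assistant..."\n'
    'greeting = "hi"\n'
    'user_template = "Question: {question}"\n'
)
print(find_prompt_variables(sample))  # ['system_prompt', 'user_template']
```

Here greeting is skipped because its name contains none of the keywords, which is the situation the rename advice addresses.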
Links
MIT License. Built by @ashishjsharda.