LLM Token Surgeon 🔪

Ashish Sharda

Free
Cut your LLM API bill by 30–70%. Inline token counts, one-click optimization, and savings projections.

LLM Token Surgeon for VS Code

Cut your LLM API bill by 30-70% without changing a line of your app logic.

The extension scans your Python files for prompt strings, shows you exactly how many tokens each one uses, and lets you optimize them in one click.


Requirements

The extension shells out to the llm-surgeon CLI. Install it first:

pip install llm-token-surgeon

Make sure llm-surgeon is on your PATH:

llm-surgeon --help   # should print usage
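
If you would rather run this check from Python (for example on a platform without a `which` command), a minimal standard-library sketch follows. The helper name `find_cli` is only an illustration and is not part of the package.

```python
import shutil
from typing import Optional


def find_cli(name: str = "llm-surgeon") -> Optional[str]:
    """Return the absolute path of the CLI if it is on PATH, else None."""
    return shutil.which(name)


path = find_cli()
if path is None:
    print("llm-surgeon not found on PATH; point llmSurgeon.cliPath at it instead")
else:
    print(f"llm-surgeon found at {path}")
```

The printed path is also what you would paste into the llmSurgeon.cliPath setting if the CLI lives outside your PATH.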

Features

Inline token counts

Open any Python file and token usage appears right next to your prompt variables, with no action needed.

system_prompt = "You are a helpful assistant..."   # 🔪 58 tokens → 19 (-67%)
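
The percentage in the hint is the relative change in token count. To sanity-check a hint by hand, a small sketch is enough; rounding to the nearest whole percent is an assumption here, not documented behavior, and `reduction_pct` is a hypothetical helper name.

```python
def reduction_pct(before: int, after: int) -> int:
    """Relative token change as a signed whole percent (negative = fewer tokens)."""
    if before <= 0:
        raise ValueError("before must be a positive token count")
    return round((after - before) / before * 100)


print(reduction_pct(58, 19))  # -67, the figure shown in the hint above
```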

Analyze File

Right-click → LLM Surgeon: Analyze File (or Command Palette: LLM Surgeon: Analyze File)

Prints a full token report to the Output panel:

📊 Token Analysis Report
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
  system_prompt        847 tokens  โ†’  231 tokens  (-73%)  $0.31/1k calls
  user_template        312 tokens  โ†’  198 tokens  (-37%)  $0.09/1k calls
  TOTAL: 54% reduction ยท $0.82 per 1,000 calls

Preview Optimization

Command Palette: LLM Surgeon: Preview Optimization

Shows a before/after diff in the Output panel without touching your file.

Optimize File (Apply)

Right-click → LLM Surgeon: Optimize File (Apply)

Rewrites your prompt strings in place. A .bak backup is created automatically before any changes are written.
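
The backup-then-rewrite pattern described here can be sketched in a few lines of Python. This illustrates the general technique and is not the extension's code; the backup naming (`<file>.bak` next to the original) and the function name `rewrite_with_backup` are assumptions.

```python
import shutil
from pathlib import Path


def rewrite_with_backup(path: Path, new_text: str) -> Path:
    """Copy `path` to `<name>.bak`, then overwrite `path` with `new_text`."""
    backup = path.with_name(path.name + ".bak")  # assumed naming scheme
    shutil.copy2(path, backup)                   # back up first, keeping metadata
    path.write_text(new_text, encoding="utf-8")
    return backup
```

Because the copy happens before the write, a failed or unwanted rewrite can be undone by restoring the backup over the original.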

Savings Panel

Title bar icon → LLM Surgeon: Show Savings Panel

Opens a sidebar webview with a live savings dashboard for the current file: token reduction %, monthly savings at your call volume, and a per-variable breakdown.
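
One plausible way to get from a token reduction to a monthly figure is sketched below. The formula, the 30-day month, and the example price are assumptions for illustration; the extension's own pricing data and method may differ.

```python
def monthly_savings(tokens_before: int, tokens_after: int,
                    usd_per_million_tokens: float,
                    calls_per_day: int = 1000, days: int = 30) -> float:
    """Estimated USD saved per month from sending fewer input tokens per call."""
    saved_per_call = tokens_before - tokens_after
    usd_per_token = usd_per_million_tokens / 1_000_000
    return saved_per_call * usd_per_token * calls_per_day * days


# Hypothetical price of $2.50 per million input tokens, 1,000 calls per day.
print(f"${monthly_savings(847, 231, 2.50):.2f} per month")
```

The `calls_per_day` default mirrors the default of the llmSurgeon.callsPerDay setting; raise it to match your own traffic.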


Settings

| Setting | Default | Description |
|---|---|---|
| llmSurgeon.model | gpt-4o | Target model for pricing (supports OpenAI, Anthropic, Google, Mistral, Ollama) |
| llmSurgeon.aggressiveness | balanced | One of conservative / balanced / aggressive |
| llmSurgeon.callsPerDay | 1000 | Daily API calls for monthly savings projection |
| llmSurgeon.showInlineHints | true | Toggle inline token count decorations |
| llmSurgeon.cliPath | llm-surgeon | Path to the CLI if not on PATH |

Supported file types

Python (.py). JavaScript and TypeScript support is coming in v0.3.


Supported models

  • OpenAI: gpt-4o, gpt-4.1, gpt-4-turbo, gpt-3.5-turbo
  • Anthropic: claude-opus-4, claude-sonnet-4, claude-3-5-sonnet, claude-3-opus
  • Google: gemini-2.0-flash, gemini-1.5-pro
  • Mistral: mistral-large, mistral-7b
  • Ollama: llama3, phi3


Troubleshooting

"No prompt strings found": the scanner looks for variables named *prompt*, *system*, *instruction*, *message*, *template*, *context*. Rename your variable to match one of these patterns, or run llm-surgeon analyze directly from the terminal.

Inline hints not appearing: check llmSurgeon.cliPath in Settings. Run which llm-surgeon in your terminal to find the correct path.

Optimize File (Apply) not working: make sure the file is saved before running. The CLI reads from disk, not the unsaved buffer.
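
For the "No prompt strings found" case, you can check ahead of time whether a variable name would be picked up by trying the listed patterns with Python's `fnmatch`. This is a sketch of the matching rule as described, not the scanner's actual code; treating the match as case-insensitive and looking only at simple `name = "string"` assignments are both assumptions.

```python
import ast
from fnmatch import fnmatchcase

PATTERNS = ["*prompt*", "*system*", "*instruction*",
            "*message*", "*template*", "*context*"]


def looks_like_prompt(name: str) -> bool:
    """True if the variable name matches any of the documented patterns."""
    lowered = name.lower()  # assumption: matching ignores case
    return any(fnmatchcase(lowered, pat) for pat in PATTERNS)


def find_prompt_vars(source: str) -> list:
    """Names of string-literal assignments whose target matches a pattern."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            for target in node.targets:
                if isinstance(target, ast.Name) and looks_like_prompt(target.id):
                    found.append(target.id)
    return found


code = 'system_prompt = "You are a helpful assistant."\ngreeting = "hi"\n'
print(find_prompt_vars(code))  # ['system_prompt']
```

Here `greeting` is skipped because its name matches none of the patterns, which is the situation that produces the "No prompt strings found" message.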


Links

  • PyPI package
  • GitHub
  • Report an issue

MIT License. Built by @ashishjsharda
