AI Token Optimizer

Save up to 40% on AI token costs automatically. Works silently across Cursor, Continue, Cline, Claude Code, and any OpenAI-compatible tool.

No config changes to your AI tools. No prompts modified visibly. Just lower bills.


How It Works

1. Install the extension. A local proxy starts on http://localhost:8765.

2. Point your tool at the proxy:

  • Cursor: Settings > AI > Base URL > http://localhost:8765
  • Continue: config.json > "apiBase": "http://localhost:8765"
  • Claude Code: Enable forward proxy (Command Palette > "Toggle Forward Proxy") > launch from VS Code terminal
  • Any tool: OPENAI_BASE_URL=http://localhost:8765

3. Code normally. The optimizer compresses every prompt before it reaches the API and forwards your API key transparently.

Check the status bar for live savings, e.g. ⚡ 1,240 tokens saved · $0.004
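The dollar figure in the status bar is straightforward arithmetic over saved input tokens. A minimal sketch, assuming a flat per-million-token input price (the rate below is an illustrative assumption, not the extension's actual pricing table):

```python
# Rough sketch of how the status-bar savings figure can be derived.
# The $/token rate is an assumption; the extension's real rates may differ.
PRICE_PER_MILLION_INPUT_TOKENS = 2.50  # assumed USD rate, not from the docs

def estimated_savings(tokens_saved: int) -> float:
    """Convert saved input tokens into an estimated dollar amount."""
    return tokens_saved * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000

print(f"{estimated_savings(1240):.4f}")  # prints 0.0031
```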


Optimization Strategies

Six strategies run as a cascade, stopping early once the token budget is met:

| Strategy | Savings | What It Does |
|---|---|---|
| Whitespace Normalize | 3-8% | Lossless: collapses redundant spaces and blank lines |
| Deduplicate | 5-20% | Removes repeated sentences across messages |
| Intent Distill | 10-30% | Strips filler words from user queries |
| Reference Substitute | 10-25% | Aliases repeated long identifiers |
| History Summarize | 30-50% | Compresses old conversation turns into bullet points |
| Context Prune | 20-40% | Drops low-relevance messages when over budget |

Strategies run cheapest-first. Your prompts are never stored -- the proxy optimizes in-flight and forgets.
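The cheapest-first cascade with early stopping can be sketched as follows. This is illustrative only: the token estimate is a crude word count, and the two strategy implementations are simplified stand-ins for the real ones named in the table above.

```python
# Minimal sketch of a cheapest-first strategy cascade with early stopping.
# Strategy names match the table above; implementations are illustrative.
import re

def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~1 token per whitespace-separated word.
    return len(text.split())

def whitespace_normalize(text: str) -> str:
    # Collapse runs of spaces/tabs and excess blank lines (lossless-ish).
    text = re.sub(r"[ \t]+", " ", text)
    return re.sub(r"\n{3,}", "\n\n", text).strip()

def deduplicate(text: str) -> str:
    # Drop exact repeated lines, keeping the first occurrence.
    seen, kept = set(), []
    for line in text.splitlines():
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)

# Cheapest (and safest) strategies first; stop once under budget.
CASCADE = [whitespace_normalize, deduplicate]

def optimize(text: str, token_budget: int) -> str:
    for strategy in CASCADE:
        if estimate_tokens(text) <= token_budget:
            break  # early stop: budget already met
        text = strategy(text)
    return text
```

A prompt already under budget passes through untouched; otherwise each strategy is applied in turn until the estimate fits.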


Supported Tools

| Tool | How to Connect |
|---|---|
| Cursor | Settings > AI > Base URL > `http://localhost:8765` |
| Continue | `config.json` > `"apiBase": "http://localhost:8765"` |
| Cline | Settings > API Base URL > `http://localhost:8765` |
| Claude Code | Enable forward proxy > launch from VS Code terminal |
| Open Interpreter | `OPENAI_BASE_URL=http://localhost:8765` |
| Any OpenAI-compatible | Set `OPENAI_BASE_URL=http://localhost:8765` |

Supports OpenAI (/v1/chat/completions) and Anthropic (/v1/messages) API formats. API keys are forwarded transparently -- the proxy never stores them.
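A proxy supporting both formats mostly needs to route by path and pass auth headers through untouched. A sketch under stated assumptions: the two endpoint paths come from the docs above, while the upstream hosts and the exact header-filtering rules are assumptions, not the extension's actual code.

```python
# Sketch of routing the two supported API formats and forwarding auth
# headers without storing them. Upstream hosts are assumptions.
from typing import Optional

UPSTREAMS = {
    "/v1/chat/completions": "https://api.openai.com",  # OpenAI-style
    "/v1/messages": "https://api.anthropic.com",       # Anthropic-style
}

AUTH_HEADERS = ("authorization", "x-api-key")  # forwarded verbatim, never logged

def route(path: str) -> Optional[str]:
    """Return the upstream base URL for a supported API path, else None."""
    return UPSTREAMS.get(path)

def forwardable_headers(headers: dict) -> dict:
    # Pass auth and content type through; drop hop-by-hop headers like Host.
    keep = AUTH_HEADERS + ("content-type",)
    return {k: v for k, v in headers.items() if k.lower() in keep}
```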


Dashboard

Click the status bar item to open the savings dashboard:

  • Lifetime token savings and estimated cost saved
  • Per-request history with strategy breakdown
  • Compression ratio trends
  • Proxy connection URL for quick reference

Commands

| Command | Description |
|---|---|
| AI Token Optimizer: Show Dashboard | Open the savings dashboard |
| AI Token Optimizer: Toggle On/Off | Enable or disable optimization |
| AI Token Optimizer: Toggle Forward Proxy | Enable env var injection for terminal tools |
| AI Token Optimizer: Optimize Current Context | Manually optimize selected text |
| AI Token Optimizer: Reset Session | Clear session stats |

Settings

| Setting | Default | Description |
|---|---|---|
| `tokOptimizer.enabled` | `true` | Enable/disable optimization |
| `tokOptimizer.model` | `gpt-4o` | Your primary AI model (for token counting) |
| `tokOptimizer.tokenBudget` | `8000` | Max input tokens per request |
| `tokOptimizer.aggressiveness` | `0.5` | 0 = conservative, 1 = maximum compression |
| `tokOptimizer.strategies` | all 6 | Which strategies to apply |
| `tokOptimizer.pythonServiceUrl` | `http://localhost:8766` | Optional: LLMLingua compression service |
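For example, a `settings.json` fragment overriding a few of the defaults above (the values shown are illustrative, not recommendations):

```json
{
  "tokOptimizer.enabled": true,
  "tokOptimizer.model": "gpt-4o",
  "tokOptimizer.tokenBudget": 6000,
  "tokOptimizer.aggressiveness": 0.7
}
```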

Pro Features (optional)

For teams and power users who want deeper compression:

| | Free | Pro ($15/mo) | Enterprise ($30/seat) |
|---|---|---|---|
| Strategies | 3 basic | All 6 | All 6 + shared pools |
| Daily savings cap | 2,000 tokens | Unlimited | Unlimited |
| LLMLingua compression | -- | Included | Included |
| Analytics | Basic | Full dashboard | Org-level + audit logs |
| SSO (Okta/Azure AD) | -- | -- | Included |
| On-premise deployment | -- | -- | Docker + Helm |

The free tier works great for individual use. No account required -- just install and go.


FAQ

Does it store my prompts? No. The proxy optimizes messages in-flight and forgets them immediately. Nothing is logged, cached, or sent to any third party. Your API key is forwarded directly to OpenAI/Anthropic.

Will it break my AI tool's output? No. The optimizer only modifies the input tokens (your prompts and conversation history). The AI model's response comes back unmodified. Tested across 59 accuracy benchmarks with zero quality degradation.

Does it work offline? Yes. The proxy runs entirely on your machine. The optional Python service (LLMLingua) also runs locally via Docker.

What about streaming? Fully supported. Streaming responses (stream: true) are proxied through unchanged.
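Streaming is easy to support because only the request side is modified: once the optimized prompt is forwarded, response chunks can be relayed to the client exactly as received. A minimal sketch; the SSE chunk format shown mimics OpenAI-style streaming and is illustrative only:

```python
# Sketch of streaming passthrough: response chunks are relayed unchanged,
# with no buffering and no modification.
from typing import Iterable, Iterator

def relay_stream(upstream_chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield response chunks to the client exactly as received."""
    for chunk in upstream_chunks:
        yield chunk  # no buffering, no modification

fake = [b'data: {"delta": "Hel"}\n\n', b'data: {"delta": "lo"}\n\n', b"data: [DONE]\n\n"]
assert list(relay_stream(fake)) == fake
```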

How do I check that it's working?

  • Status bar shows live savings
  • curl http://localhost:8765/health returns service status
  • curl http://localhost:8765/stats returns lifetime stats

License

Business Source License 1.1 — free to use, but you may not offer a competing token optimization service. Converts to Apache 2.0 on 2030-04-11.
