# AI Token Optimizer

Save up to 40% on AI token costs automatically. Works silently across Cursor, Continue, Cline, Claude Code, and any OpenAI-compatible tool.

One base-URL change in your tool, no other configuration. Prompts are never visibly modified. Just lower bills.
## How It Works

1. Install the extension. A local proxy starts on `http://localhost:8765`.
2. Point your tool at the proxy:
   - **Cursor:** Settings > AI > Base URL > `http://localhost:8765`
   - **Continue:** `config.json` > `"apiBase": "http://localhost:8765"`
   - **Claude Code:** enable the forward proxy (Command Palette > "Toggle Forward Proxy"), then launch from the VS Code terminal
   - **Any tool:** set `OPENAI_BASE_URL=http://localhost:8765`
3. Code normally. The optimizer compresses every prompt before it reaches the API and forwards your API key transparently.

Check the status bar for live savings: ⚡ 1,240 tkns saved $0.004
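The dollar figure in the status bar is simple arithmetic over saved tokens. A minimal sketch, assuming a rate of $3.00 per million input tokens (a Claude Sonnet-class price; actual pricing depends on your model, and the function name is illustrative, not the extension's code):

```python
def estimate_savings_usd(tokens_saved: int, usd_per_million_tokens: float) -> float:
    """Estimate dollars saved from a count of avoided input tokens."""
    return round(tokens_saved * usd_per_million_tokens / 1_000_000, 3)

# 1,240 tokens saved at an assumed $3.00/M input rate
print(estimate_savings_usd(1240, 3.00))  # → 0.004
```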
## Optimization Strategies

Six strategies run in cascade, stopping early once the token budget is met:

| Strategy | Savings | What It Does |
| --- | --- | --- |
| Whitespace Normalize | 3-8% | Lossless: collapses redundant spaces and blank lines |
| Deduplicate | 5-20% | Removes repeated sentences across messages |
| Intent Distill | 10-30% | Strips filler words from user queries |
| Reference Substitute | 10-25% | Aliases repeated long identifiers |
| History Summarize | 30-50% | Compresses old conversation turns into bullet points |
| Context Prune | 20-40% | Drops low-relevance messages when over budget |

Strategies run cheapest-first. Your prompts are never stored -- the proxy optimizes in-flight and forgets.
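The cheapest-first cascade is easy to sketch. Everything below is illustrative, not the extension's actual code -- a real implementation would use a proper tokenizer (e.g. tiktoken) rather than whitespace splitting, and far more careful strategies:

```python
import re

def count_tokens(text: str) -> int:
    # Toy estimator: one token per whitespace-separated word.
    return len(text.split())

def normalize_whitespace(text: str) -> str:
    # Lossless: collapse runs of spaces/tabs and excess blank lines.
    return re.sub(r"\n{3,}", "\n\n", re.sub(r"[ \t]+", " ", text)).strip()

def deduplicate(text: str) -> str:
    # Drop exact-duplicate lines, keeping the first occurrence.
    seen, out = set(), []
    for line in text.splitlines():
        key = line.strip()
        if key and key in seen:
            continue
        seen.add(key)
        out.append(line)
    return "\n".join(out)

def cascade(text: str, budget: int, strategies) -> str:
    # Apply strategies cheapest-first, stopping once under budget.
    for strategy in strategies:
        if count_tokens(text) <= budget:
            break
        text = strategy(text)
    return text
```

For example, `cascade("hello   world\nhello world\nhello world", 2, [normalize_whitespace, deduplicate])` collapses the whitespace, still exceeds the two-token budget, then deduplicates down to a single `"hello world"`.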
## Supported Tools

| Tool | How to Connect |
| --- | --- |
| Cursor | Settings > AI > Base URL > `http://localhost:8765` |
| Continue | `config.json` > `"apiBase": "http://localhost:8765"` |
| Cline | Settings > API Base URL > `http://localhost:8765` |
| Claude Code | Enable forward proxy, then launch from the VS Code terminal |
| Open Interpreter | `OPENAI_BASE_URL=http://localhost:8765` |
| Any OpenAI-compatible | Set `OPENAI_BASE_URL=http://localhost:8765` |

Supports the OpenAI (`/v1/chat/completions`) and Anthropic (`/v1/messages`) API formats. API keys are forwarded transparently -- the proxy never stores them.
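Telling the two formats apart comes down to the endpoint path the client calls. A hypothetical sketch of the dispatch (function name and return values are illustrative):

```python
def detect_api_format(path: str) -> str:
    # Route a proxied request by the endpoint path the client called.
    if path.endswith("/v1/chat/completions"):
        return "openai"       # OpenAI chat-completions format
    if path.endswith("/v1/messages"):
        return "anthropic"    # Anthropic messages format
    raise ValueError(f"unsupported endpoint: {path}")

print(detect_api_format("/v1/chat/completions"))  # → openai
```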
## Dashboard
Click the status bar item to open the savings dashboard:
- Lifetime token savings and estimated cost saved
- Per-request history with strategy breakdown
- Compression ratio trends
- Proxy connection URL for quick reference
## Commands

| Command | Description |
| --- | --- |
| AI Token Optimizer: Show Dashboard | Open the savings dashboard |
| AI Token Optimizer: Toggle On/Off | Enable or disable optimization |
| AI Token Optimizer: Toggle Forward Proxy | Enable env var injection for terminal tools |
| AI Token Optimizer: Optimize Current Context | Manually optimize selected text |
| AI Token Optimizer: Reset Session | Clear session stats |
## Settings

| Setting | Default | Description |
| --- | --- | --- |
| `tokOptimizer.enabled` | `true` | Enable/disable optimization |
| `tokOptimizer.model` | `gpt-4o` | Your primary AI model (for token counting) |
| `tokOptimizer.tokenBudget` | `8000` | Max input tokens per request |
| `tokOptimizer.aggressiveness` | `0.5` | 0 = conservative, 1 = maximum compression |
| `tokOptimizer.strategies` | all 6 | Which strategies to apply |
| `tokOptimizer.pythonServiceUrl` | `http://localhost:8766` | Optional: LLMLingua compression service |
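The defaults can be overridden in VS Code's `settings.json`. A fragment using the default values from the table above (shown for illustration; none of this is required for normal use):

```json
{
  "tokOptimizer.enabled": true,
  "tokOptimizer.model": "gpt-4o",
  "tokOptimizer.tokenBudget": 8000,
  "tokOptimizer.aggressiveness": 0.5,
  "tokOptimizer.pythonServiceUrl": "http://localhost:8766"
}
```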
## Pro Features (optional)

For teams and power users who want deeper compression:

| | Free | Pro ($15/mo) | Enterprise ($30/seat) |
| --- | --- | --- | --- |
| Strategies | 3 basic | All 6 | All 6 + shared pools |
| Daily savings cap | 2,000 tokens | Unlimited | Unlimited |
| LLMLingua compression | -- | Included | Included |
| Analytics | Basic | Full dashboard | Org-level + audit logs |
| SSO (Okta/Azure AD) | -- | -- | Included |
| On-premise deployment | -- | -- | Docker + Helm |

The free tier works great for individual use. No account required -- just install and go.
## FAQ

**Does it store my prompts?**
No. The proxy optimizes messages in-flight and forgets them immediately. Nothing is logged, cached, or sent to any third party. Your API key is forwarded directly to OpenAI/Anthropic.
**Will it break my AI tool's output?**
No. The optimizer only modifies the input tokens (your prompts and conversation history). The AI model's response comes back unmodified. Tested across 59 accuracy benchmarks with zero quality degradation.
**Does it work offline?**
Yes. The proxy runs entirely on your machine. The optional Python service (LLMLingua) also runs locally via Docker.
**What about streaming?**

Fully supported. Streaming responses (`stream: true`) are proxied through unchanged.
**How do I check it's working?**

- Status bar shows live savings
- `curl http://localhost:8765/health` returns service status
- `curl http://localhost:8765/stats` returns lifetime stats
## License

Business Source License 1.1 -- free to use, but you may not offer a competing token optimization service. Converts to Apache 2.0 on 2030-04-11.