# AI Token Optimizer

Save up to 40% on AI token costs automatically. Works silently across Cursor, Continue, Cline, Claude Code, and any OpenAI-compatible tool.

One base-URL change in your tool, no other configuration. Prompts are never visibly modified. Just lower bills.
## How It Works

1. Install the extension. A local proxy starts on `http://localhost:8765`.
2. Point your tool at the proxy:
   - **Cursor:** Settings > AI > Base URL > `http://localhost:8765`
   - **Continue:** `config.json` > `"apiBase": "http://localhost:8765"`
   - **Claude Code:** enable the forward proxy (Command Palette > "Toggle Forward Proxy"), then launch from the VS Code terminal
   - **Any tool:** set `OPENAI_BASE_URL=http://localhost:8765`
3. Code normally. The optimizer compresses every prompt before it reaches the API and forwards your API key transparently.

Check the status bar for live savings: ⚡ 1,240 tkns saved $0.004
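The dollar figure in the status bar is simple arithmetic over saved tokens. A minimal sketch, assuming a rate of $3.00 per million input tokens (a Claude Sonnet-class price; actual pricing depends on your model, and the function name is illustrative, not the extension's code):

```python
def estimate_savings_usd(tokens_saved: int, usd_per_million_tokens: float) -> float:
    """Estimate dollars saved from a count of avoided input tokens."""
    return round(tokens_saved * usd_per_million_tokens / 1_000_000, 3)

# 1,240 tokens saved at an assumed $3.00/M input rate
print(estimate_savings_usd(1240, 3.00))  # → 0.004
```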
## Optimization Strategies

Six strategies run in cascade, stopping early once the token budget is met:

| Strategy | Savings | What It Does |
| --- | --- | --- |
| Whitespace Normalize | 3-8% | Lossless: collapses redundant spaces and blank lines |
| Deduplicate | 5-20% | Removes repeated sentences across messages |
| Intent Distill | 10-30% | Strips filler words from user queries |
| Reference Substitute | 10-25% | Aliases repeated long identifiers |
| History Summarize | 30-50% | Compresses old conversation turns into bullet points |
| Context Prune | 20-40% | Drops low-relevance messages when over budget |

Strategies run cheapest-first. Your prompts are never stored -- the proxy optimizes in-flight and forgets.
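The cheapest-first cascade is easy to sketch. Everything below is illustrative, not the extension's actual code -- a real implementation would use a proper tokenizer (e.g. tiktoken) rather than whitespace splitting, and far more careful strategies:

```python
import re

def count_tokens(text: str) -> int:
    # Toy estimator: one token per whitespace-separated word.
    return len(text.split())

def normalize_whitespace(text: str) -> str:
    # Lossless: collapse runs of spaces/tabs and excess blank lines.
    return re.sub(r"\n{3,}", "\n\n", re.sub(r"[ \t]+", " ", text)).strip()

def deduplicate(text: str) -> str:
    # Drop exact-duplicate lines, keeping the first occurrence.
    seen, out = set(), []
    for line in text.splitlines():
        key = line.strip()
        if key and key in seen:
            continue
        seen.add(key)
        out.append(line)
    return "\n".join(out)

def cascade(text: str, budget: int, strategies) -> str:
    # Apply strategies cheapest-first, stopping once under budget.
    for strategy in strategies:
        if count_tokens(text) <= budget:
            break
        text = strategy(text)
    return text
```

For example, `cascade("hello   world\nhello world\nhello world", 2, [normalize_whitespace, deduplicate])` collapses the whitespace, still exceeds the two-token budget, then deduplicates down to a single `"hello world"`.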
## Supported Tools

| Tool | How to Connect |
| --- | --- |
| Cursor | Settings > AI > Base URL > `http://localhost:8765` |
| Continue | `config.json` > `"apiBase": "http://localhost:8765"` |
| Cline | Settings > API Base URL > `http://localhost:8765` |
| Claude Code | Enable forward proxy, then launch from the VS Code terminal |
| Open Interpreter | `OPENAI_BASE_URL=http://localhost:8765` |
| Any OpenAI-compatible | Set `OPENAI_BASE_URL=http://localhost:8765` |

Supports the OpenAI (`/v1/chat/completions`) and Anthropic (`/v1/messages`) API formats. API keys are forwarded transparently -- the proxy never stores them.
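Telling the two formats apart comes down to the endpoint path the client calls. A hypothetical sketch of the dispatch (function name and return values are illustrative):

```python
def detect_api_format(path: str) -> str:
    # Route a proxied request by the endpoint path the client called.
    if path.endswith("/v1/chat/completions"):
        return "openai"       # OpenAI chat-completions format
    if path.endswith("/v1/messages"):
        return "anthropic"    # Anthropic messages format
    raise ValueError(f"unsupported endpoint: {path}")

print(detect_api_format("/v1/chat/completions"))  # → openai
```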
## Dashboard
Click the status bar item to open the savings dashboard:
- Lifetime token savings and estimated cost saved
- Per-request history with strategy breakdown
- Compression ratio trends
- Proxy connection URL for quick reference
## Commands

| Command | Description |
| --- | --- |
| AI Token Optimizer: Show Dashboard | Open the savings dashboard |
| AI Token Optimizer: Toggle On/Off | Enable or disable optimization |
| AI Token Optimizer: Toggle Forward Proxy | Enable env var injection for terminal tools |
| AI Token Optimizer: Optimize Current Context | Manually optimize selected text |
| AI Token Optimizer: Reset Session | Clear session stats |
## Settings

| Setting | Default | Description |
| --- | --- | --- |
| `tokOptimizer.enabled` | `true` | Enable/disable optimization |
| `tokOptimizer.model` | `gpt-4o` | Your primary AI model (for token counting) |
| `tokOptimizer.tokenBudget` | `8000` | Max input tokens per request |
| `tokOptimizer.aggressiveness` | `0.5` | 0 = conservative, 1 = maximum compression |
| `tokOptimizer.strategies` | all 6 | Which strategies to apply |
| `tokOptimizer.pythonServiceUrl` | `http://localhost:8766` | Optional: LLMLingua compression service |
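The defaults can be overridden in VS Code's `settings.json`. A fragment using the default values from the table above (shown for illustration; none of this is required for normal use):

```json
{
  "tokOptimizer.enabled": true,
  "tokOptimizer.model": "gpt-4o",
  "tokOptimizer.tokenBudget": 8000,
  "tokOptimizer.aggressiveness": 0.5,
  "tokOptimizer.pythonServiceUrl": "http://localhost:8766"
}
```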
## Pro Features (optional)

For teams and power users who want deeper compression:

| | Free | Pro ($15/mo) | Enterprise ($30/seat) |
| --- | --- | --- | --- |
| Strategies | 3 basic | All 6 | All 6 + shared pools |
| Daily savings cap | 2,000 tokens | Unlimited | Unlimited |
| LLMLingua compression | -- | Included | Included |
| Analytics | Basic | Full dashboard | Org-level + audit logs |
| SSO (Okta/Azure AD) | -- | -- | Included |
| On-premise deployment | -- | -- | Docker + Helm |

The free tier works great for individual use. No account required -- just install and go.
## FAQ

**Does it store my prompts?**
No. The proxy optimizes messages in-flight and forgets them immediately. Nothing is logged, cached, or sent to any third party. Your API key is forwarded directly to OpenAI/Anthropic.
**Will it break my AI tool's output?**
No. The optimizer only modifies the input tokens (your prompts and conversation history). The AI model's response comes back unmodified. Tested across 59 accuracy benchmarks with zero quality degradation.
**Does it work offline?**
Yes. The proxy runs entirely on your machine. The optional Python service (LLMLingua) also runs locally via Docker.
**What about streaming?**

Fully supported. Streaming responses (`stream: true`) are proxied through unchanged.
**How do I check it's working?**

- Status bar shows live savings
- `curl http://localhost:8765/health` returns service status
- `curl http://localhost:8765/stats` returns lifetime stats
## License

Business Source License 1.1 -- free to use, but you may not offer a competing token optimization service. Converts to Apache 2.0 on 2030-04-11.