Trimli — AI Token Optimizer

trimliai
Cut your AI bill by 40% — works with Claude Code, Cursor, Continue & Cline
Installation
Launch VS Code Quick Open (Ctrl+P), paste the following command, and press enter.

AI Token Optimizer

Cut your AI coding costs by an average of 40% — up to 60% on long sessions and agentic workflows. Works silently across Cursor, Continue, Cline, Claude Code, and any OpenAI-compatible tool.

One base-URL change in your AI tool. Your prompts are never visibly modified. Just lower bills.


The problem this solves

AI coding tools are expensive at scale. A typical developer sending 100 requests a day to GPT-4o spends $8–15/month just on input tokens — most of it wasted on repeated context, verbose history, and filler the model doesn't need.

tok-optimizer sits between your tool and the API. It strips the waste, keeps the signal, and forwards a leaner prompt. The model never knows. Your bill does.

Before:  2,840 tokens  →  $0.0071 per request
After:   1,690 tokens  →  $0.0042 per request
Saving:  1,150 tokens  →  $0.0029 saved  (40% reduction)

Across 100 requests a day that's about $8.70/month back in your pocket at the conservative estimate. On longer agentic sessions the number is higher.
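
A quick sanity check on the arithmetic above, assuming GPT-4o input pricing of $2.50 per million tokens (`monthly_saving` is an illustrative helper, not part of the extension):

```python
# GPT-4o input pricing assumption: $2.50 per million tokens.
PRICE_PER_TOKEN = 2.50 / 1_000_000

def monthly_saving(tokens_saved_per_request: int,
                   requests_per_day: int = 100,
                   days: int = 30) -> float:
    """Dollars saved per month from trimming input tokens."""
    return tokens_saved_per_request * PRICE_PER_TOKEN * requests_per_day * days

# 1,150 tokens saved per request at 100 requests/day comes to $8.625/month --
# the "about $8.70" figure when using the rounded $0.0029 per request.
savings = monthly_saving(1150)
```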


Setup in 60 seconds

1. Install the extension. A local proxy starts automatically on http://localhost:8765.

2. Point your AI tool at the proxy:

| Tool | Setting |
| --- | --- |
| Cursor | Settings → AI → Base URL → `http://localhost:8765` |
| Continue | `config.json` → `"apiBase": "http://localhost:8765"` |
| Cline | Settings → API Base URL → `http://localhost:8765` |
| Claude Code | Open terminal in VS Code → run `claude` — works automatically |
| Any OpenAI-compatible tool | `OPENAI_BASE_URL=http://localhost:8765` |
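
For tools that only read environment variables, the same redirection can be done from a shell profile or, as a minimal Python sketch, before launching a child process (the proxy must already be running for requests to succeed):

```python
import os

# Route any OpenAI- or Anthropic-compatible SDK/CLI launched from this
# process through the local proxy (assumes it is listening on :8765).
os.environ["OPENAI_BASE_URL"] = "http://localhost:8765"
os.environ["ANTHROPIC_BASE_URL"] = "http://localhost:8765"
```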

3. Code normally. The optimizer runs silently on every request.

Watch the status bar update in real time: ⚡ 1,240 tkns saved $0.004


How much will you actually save?

Savings depend on how you work. Here's what to expect across common workflows (response accuracy was verified separately by our 59-test benchmark suite):

| Workflow | Typical session | Long session / agentic |
| --- | --- | --- |
| Short single-turn queries | 5–15% | — |
| Multi-turn chat sessions | 25–45% | 45–55% |
| Code review with long context | 30–50% | 50–60% |
| Agentic sessions (tool calls) | 35–55% | 55–65% |
| Long debugging sessions | 40–55% | 55–65% |
| Average across all workflows | ~40% | ~60% |

The more context your session accumulates — conversation history, tool results, repeated code blocks — the more the optimizer saves. Short queries get modest savings. Long agentic sessions routinely hit 55–65%.

Accuracy guarantee: 59 tests across factual queries, code generation, multi-turn reasoning, and tool call round-trips — zero quality degradation detected at any compression level.


Real savings by model

At 100 requests/day using the conservative 40% estimate:

| Model | Without optimizer | With optimizer (40%) | With optimizer (60%) |
| --- | --- | --- | --- |
| GPT-4o ($2.50/M) | ~$15/mo | ~$9/mo ($6 saved) | ~$6/mo ($9 saved) |
| GPT-4.1 ($2.00/M) | ~$12/mo | ~$7.20/mo ($4.80 saved) | ~$4.80/mo ($7.20 saved) |
| Claude Sonnet ($3.00/M) | ~$18/mo | ~$10.80/mo ($7.20 saved) | ~$7.20/mo ($10.80 saved) |
| Claude Opus ($15.00/M) | ~$90/mo | ~$54/mo ($36 saved) | ~$36/mo ($54 saved) |

For a 10-person team on Claude Sonnet that's $72–$108/month saved; on Claude Opus, where each seat saves $36–$54/month, the optimizer pays for Pro many times over.


The six optimization strategies

Strategies run in cascade, cheapest first. Each one only runs if the previous ones haven't hit the token budget yet — so you always get the minimum compression needed, never more than necessary.
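
The cascade can be pictured as a simple loop. This is a hypothetical sketch, not the extension's actual code; `count_tokens` and the strategy callables are stand-ins:

```python
from typing import Callable

def cascade(prompt: str,
            budget_tokens: int,
            strategies: list[Callable[[str], str]],
            count_tokens: Callable[[str], int]) -> str:
    """Apply strategies cheapest-first, stopping once under budget."""
    for strategy in strategies:
        if count_tokens(prompt) <= budget_tokens:
            break  # budget hit -> remaining (more aggressive) strategies skipped
        prompt = strategy(prompt)
    return prompt
```

Ordering strategies from lossless to lossy means a prompt that is already under budget is never touched by the aggressive ones.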

1. Whitespace Normalize — 3–8% savings

Lossless. Always runs first.

Collapses redundant whitespace, repeated blank lines, trailing spaces, and excessive punctuation. No meaning is lost — just noise removed. On a typical 2,000-token prompt this recovers 60–160 tokens at zero risk.

Before: "Hello   world\n\n\n\nPlease   help!!!"
After:  "Hello world\n\nPlease help!"
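
The rules above can be approximated with a few regular expressions. A sketch of the idea, not the extension's exact rule set:

```python
import re

def normalize_whitespace(text: str) -> str:
    """Lossless cleanup sketch: collapse space runs, cap blank lines,
    strip trailing spaces, squash repeated punctuation."""
    text = re.sub(r"[ \t]+", " ", text)           # runs of spaces/tabs -> one space
    text = re.sub(r" +$", "", text, flags=re.M)   # trailing spaces on each line
    text = re.sub(r"\n{3,}", "\n\n", text)        # 3+ newlines -> one blank line
    text = re.sub(r"([!?.])\1{1,}", r"\1", text)  # "!!!" -> "!", "..." -> "."
    return text

# normalize_whitespace("Hello   world\n\n\n\nPlease   help!!!")
# -> "Hello world\n\nPlease help!"
```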

2. Deduplicate — 5–20% savings

Removes repeated sentences across the conversation.

In multi-turn sessions, the same context often appears verbatim in multiple messages — requirements restated, errors repeated, instructions echoed back. This strategy fingerprints every sentence and removes duplicates after the first occurrence. System messages are never touched.

Before: [turn 3 repeats the same requirement from turn 1 verbatim]
After:  [duplicate removed — the model already has it from turn 1]
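
A minimal sketch of the fingerprinting idea (illustrative only; the extension's real sentence splitting and hashing rules may differ):

```python
import hashlib
import re

def dedupe_sentences(messages: list[dict]) -> list[dict]:
    """Drop sentences already seen earlier in the conversation.
    System messages pass through untouched."""
    seen: set[str] = set()
    out = []
    for msg in messages:
        if msg["role"] == "system":
            out.append(msg)
            continue
        kept = []
        for sentence in re.split(r"(?<=[.!?])\s+", msg["content"]):
            # Normalised hash so trivial case/spacing changes still match.
            fingerprint = hashlib.sha1(sentence.strip().lower().encode()).hexdigest()
            if fingerprint not in seen:
                seen.add(fingerprint)
                kept.append(sentence)
        out.append({**msg, "content": " ".join(kept)})
    return out
```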

3. Intent Distill — 10–30% savings

Strips filler from user queries.

Developers often write conversationally: "I was wondering if you could please help me understand..." The model doesn't need the preamble — it needs the intent. This strategy removes opener hedges, politeness wrappers, sign-offs, and meta-commentary while preserving the actual request.

Before: "Hi! I was wondering if you could please help me fix this
         bug. I've been struggling with it for a while. Thanks!"
After:  "Fix this bug."
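
The filler-stripping idea can be sketched with a small pattern list. These patterns are illustrative; the real strategy's lists are certainly more extensive:

```python
import re

# Toy filler patterns: greetings, hedged openers, politeness, sign-offs.
FILLER_PATTERNS = [
    r"^(hi|hey|hello)[,!. ]+",
    r"\bi was wondering if you could\s+",
    r"\bplease\s+",
    r"\bthanks[!. ]*$",
]

def distill(query: str) -> str:
    for pattern in FILLER_PATTERNS:
        query = re.sub(pattern, "", query, flags=re.IGNORECASE)
    return query.strip()

# distill("Hi! I was wondering if you could please help me fix this bug. Thanks!")
# -> "help me fix this bug."
```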

4. Reference Substitute — 10–25% savings

Aliases repeated long strings.

Long file paths, URLs, repeated code snippets, and verbose type names that appear multiple times across a session are replaced with short aliases (§REF1, §REF2). An alias legend is prepended to the system message so the model retains full context.

Before: "/workspace/src/components/auth/AuthenticationProvider.tsx"
        appearing 8 times across the conversation
After:  "§REF1" × 8 + legend prepended once
Net:    7 × 52 chars = 364 chars ≈ 91 tokens saved
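
A toy version of the aliasing pass (the `substitute_refs` helper and its thresholds are assumptions for illustration):

```python
import re
from collections import Counter

def substitute_refs(text: str, min_len: int = 20, min_count: int = 3):
    """Alias long, repeated whitespace-delimited tokens (paths, URLs) with
    §REFn; return (rewritten_text, legend) so the legend can be prepended
    to the system message."""
    candidates = [t for t in re.findall(r"\S+", text) if len(t) >= min_len]
    legend: dict[str, str] = {}
    for token, count in Counter(candidates).items():
        if count >= min_count:
            alias = f"§REF{len(legend) + 1}"
            legend[alias] = token
            text = text.replace(token, alias)
    return text, legend
```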

5. History Summarize — 30–50% savings

Compresses old conversation turns into a bullet-point summary.

After ~10 turns, early messages are low-signal relative to the current context. This strategy replaces old turns with a compact summary — extracting decisions made, requirements stated, and tech choices confirmed — while keeping the last 6 messages verbatim. No LLM call required: extraction runs locally using pattern matching.

Before: 14 full conversation turns = 4,200 tokens
After:  Summary block + last 6 turns verbatim = 1,800 tokens
Saving: 2,400 tokens (57%)
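
The keep-the-tail-verbatim shape can be sketched as follows; the decision/requirement patterns here are toy stand-ins for the real local extraction rules:

```python
import re

def summarize_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Replace old turns with a bullet summary; keep the tail verbatim."""
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    # Toy signal: sentences that look like decisions or requirements.
    signal = re.compile(r"\b(must|should|use|decided|require)\b", re.I)
    bullets = []
    for msg in old:
        for sentence in re.split(r"(?<=[.!?])\s+", msg["content"]):
            if signal.search(sentence):
                bullets.append(f"- {sentence.strip()}")
    summary = {"role": "system",
               "content": "Earlier context:\n" + "\n".join(bullets)}
    return [summary] + recent
```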

6. Context Prune — 20–40% savings

Drops low-relevance messages when still over budget.

When the conversation is still over the token budget after all other strategies, every message is scored against the current query using keyword overlap and recency weighting. Low-scoring messages are dropped. System messages and the last 4 messages are always kept regardless of score.

Token budget: 8,000
Before pruning: 11,200 tokens
After pruning:  7,800 tokens
Dropped: 6 low-relevance messages from earlier in the session
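
A sketch of scoring by keyword overlap plus recency weighting (the weights and helper names are illustrative assumptions, not the extension's actual values):

```python
def prune_context(messages, query, budget, count_tokens, keep_last=4):
    """Drop the lowest-scoring droppable messages until under budget.
    System messages and the last `keep_last` messages are always kept."""
    protected = {i for i, m in enumerate(messages)
                 if m["role"] == "system" or i >= len(messages) - keep_last}
    qwords = set(query.lower().split())

    def score(i):
        words = set(messages[i]["content"].lower().split())
        overlap = len(qwords & words) / max(len(qwords), 1)
        recency = i / max(len(messages) - 1, 1)   # later messages score higher
        return overlap + 0.5 * recency

    droppable = sorted((i for i in range(len(messages)) if i not in protected),
                       key=score)                 # lowest relevance first
    kept = set(range(len(messages)))
    for i in droppable:
        if sum(count_tokens(messages[j]["content"]) for j in kept) <= budget:
            break
        kept.remove(i)
    return [messages[i] for i in sorted(kept)]
```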

Bonus: LLMLingua (Pro) — additional 20–40% on top of heuristics

Pro tier unlocks Microsoft's LLMLingua-2 semantic compression model. Unlike the heuristic strategies above, LLMLingua understands meaning — it compresses at the word and phrase level, keeping semantically important tokens and dropping redundant ones. This layer runs after all six heuristics.

Heuristics alone:             average 40%,  up to 55%
Heuristics + LLMLingua (Pro): average 55%,  up to 65%

Safety guards — what the optimizer will never do

  • System messages are never compressed — they contain critical instructions
  • Messages under 200 characters are skipped — too short to compress safely
  • Over-compression protection — any result removing more than 70% of characters is rejected and the original is used instead
  • API keys are forwarded transparently and never stored
  • Responses are never modified — only the input prompt is compressed
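
The over-compression guard in particular is easy to picture — a hedged sketch assuming the 70% threshold described above:

```python
def guarded(original: str, compressed: str, max_removal: float = 0.70) -> str:
    """Reject over-aggressive results: if compression removed more than
    70% of the characters, fall back to the original."""
    if not original:
        return original
    removed = 1 - len(compressed) / len(original)
    return original if removed > max_removal else compressed
```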

Dashboard

Click the ⚡ status bar item to open the savings dashboard:

  • Lifetime stats — total tokens saved, estimated cost saved, average compression ratio
  • Per-request history — every request with raw vs optimized tokens, strategies applied, cost delta, and LLMLingua (Python service) savings
  • Python service status — green dot when LLMLingua is active (Pro)

Web dashboard: log in at tok-optimizer-enterprise.vercel.app with your licence key to see full analytics, 30-day charts, and strategy breakdown.


Commands

| Command | What it does |
| --- | --- |
| AI Token Optimizer: Show Dashboard | Open the savings dashboard panel |
| AI Token Optimizer: Toggle On/Off | Pause or resume optimization |
| AI Token Optimizer: Toggle Forward Proxy | Auto-inject proxy into VS Code terminal sessions (Claude Code) |
| AI Token Optimizer: Optimize Current Context | Manually optimize selected text |
| AI Token Optimizer: Reset Session | Clear session stats and start fresh |

Settings

| Setting | Default | Description |
| --- | --- | --- |
| `tokOptimizer.enabled` | `true` | Enable/disable optimization globally |
| `tokOptimizer.model` | `gpt-4o` | Primary model (for token counting and cost estimates) |
| `tokOptimizer.tokenBudget` | `8000` | Max input tokens before context pruning activates |
| `tokOptimizer.aggressiveness` | `0.5` | Compression aggressiveness: 0 = conservative, 1 = maximum |
| `tokOptimizer.strategies` | all 6 | Which strategies to apply (remove any to disable) |
| `tokOptimizer.pythonServiceUrl` | `http://localhost:8766` | LLMLingua service URL (Pro — local Docker or hosted) |
| `tokOptimizer.licenceKey` | (empty) | Your licence key (auto-generated on first install) |

Tiers

| | Free | Pro ($15/mo) | Enterprise ($30/seat/mo) |
| --- | --- | --- | --- |
| Strategies | All 6 | All 6 | All 6 + org shared pools |
| LLMLingua compression | ✓ Included | ✓ Included | ✓ Hosted + self-hostable |
| Average savings | ~40% | ~40% | ~40–60% |
| Daily savings cap | 200K tokens | Unlimited | Unlimited |
| Analytics | VS Code only | Full web portal | Org-level + audit logs + CFO report |
| SSO (Okta / Azure AD) | — | — | ✓ |
| On-premise deployment | — | — | ✓ Docker + Helm |
| Support | Community | Email | Priority + SLA |

No account required on the free tier. A licence key is created automatically when you install. Upgrade at tok-optimizer-enterprise.vercel.app.


FAQ

Does it store my prompts? No. The proxy optimizes in-flight and immediately discards the messages. Nothing is logged, cached, or sent anywhere except directly to the upstream AI API. Your API key is forwarded transparently and never stored.

Will it change the quality of AI responses? No. Tested across 59 accuracy benchmarks covering factual queries, code generation, multi-turn reasoning, and tool calls — zero quality degradation detected. Three safety guards prevent over-compression: system messages are never touched, short messages under 200 characters are skipped, and any result removing more than 70% of characters is rejected and the original used instead.

Does it work with streaming? Yes. Streaming responses (stream: true) pass through unchanged. Only the input prompt is compressed.

Does it work offline? Yes. The proxy and all six heuristic strategies run entirely on your machine. The optional LLMLingua service (Pro) also runs locally via docker compose up in the tok-optimizer-python/ directory.

Why does it save more on longer sessions? The most powerful strategies — History Summarize and Context Prune — only activate when sessions accumulate enough history. A 3-turn conversation gets whitespace and deduplication only (~10–15%). A 20-turn agentic session triggers all six strategies plus LLMLingua, routinely hitting 55–65%.

What if I use multiple AI tools? Point all of them at http://localhost:8765. The proxy handles OpenAI and Anthropic API formats simultaneously and routes each request to the correct upstream endpoint.

Does it work with Claude Code? Yes — enable the forward proxy (Command Palette → AI Token Optimizer: Toggle Forward Proxy), then launch Claude Code from the VS Code terminal. It picks up ANTHROPIC_BASE_URL automatically.

How do I verify it's working?

curl http://localhost:8765/health   # → {"status":"ok","version":"0.1.1"}
curl http://localhost:8765/stats    # → lifetime savings totals

Or just watch the ⚡ status bar — it updates live after every request.


License

Business Source License 1.1 — free to use for individuals and teams. You may not offer a competing token optimization SaaS. Converts to Apache 2.0 on 2030-04-11.
