Skip to content
| Marketplace
Sign in
Visual Studio Code>AI>Trimli AI — Token OptimizerNew to Visual Studio Code? Get it now.
Trimli AI — Token Optimizer

Trimli AI — Token Optimizer

trimliai

|
38 installs
| (0) | Free
Cut your AI bill by 40% — works with Claude Code, Continue, Cline & any OpenAI-compatible tool
Installation
Launch VS Code Quick Open (Ctrl+P), paste the following command, and press enter.
Copied to clipboard
More Info

Trimli AI — Token Optimizer

Cut your AI coding costs by an average of 40% — up to 60% on long sessions and agentic workflows. Works silently with Claude Code, Continue, Cline, and any OpenAI-compatible tool.

No config changes to your AI tools. No prompts modified visibly. Just lower bills.


The problem this solves

AI coding tools are expensive at scale. A typical developer sending 100 requests a day to GPT-4o spends $80–150/month on input tokens (agentic workflows send ~20K tokens per request including system prompts, file context, and conversation history) — most of it wasted on repeated context, verbose history, and filler the model doesn't need.

Trimli AI sits between your tool and the API. It strips the waste, keeps the signal, and forwards a leaner prompt. The model never knows. Your bill does.

Before:  28,400 tokens  →  $0.071 per request
After:   16,900 tokens  →  $0.042 per request
Saving:  11,500 tokens  →  $0.029 saved  (40% reduction)

Across 100 requests a day that's $87/month back in your pocket at the conservative estimate. On longer agentic sessions the number is higher.


Setup in 60 seconds

1. Install the extension. A local proxy starts automatically on http://localhost:8765.

2. Point your AI tool at the proxy:

Tool Setting
Claude Code Open terminal in VS Code → run claude — works automatically
Continue config.json → "apiBase": "http://localhost:8765"
Cline Settings → API Base URL → http://localhost:8765
Cursor Own API key mode only → set base URL to http://localhost:8765
Any OpenAI-compatible tool OPENAI_BASE_URL=http://localhost:8765

3. Code normally. The optimizer runs silently on every request.

Watch the status bar update in real time: ⚡ 1,240 tkns saved


How much will you actually save?

Savings depend on how you work. Here's what to expect across common workflows:

Workflow Typical session Long session / agentic
Short single-turn queries 5–15% —
Multi-turn chat sessions 25–45% 45–55%
Code review with long context 30–50% 50–60%
Agentic sessions (tool calls) 35–55% 55–65%
Long debugging sessions 40–55% 55–65%
Average across all workflows ~40% ~60%

The more context your session accumulates, the more the optimizer saves. Short queries get modest savings. Long agentic sessions routinely hit 55–65%.


Real savings by model

Based on 100 requests/day at ~20,000 tokens/request (typical for agentic workflows — system prompts, file context, and conversation history):

Model Monthly spend 40% savings 60% savings
GPT-4o ($2.50/M) ~$110/mo $44 saved $66 saved
GPT-4.1 ($2.00/M) ~$88/mo $35 saved $53 saved
Claude Sonnet ($3.00/M) ~$132/mo $53 saved $79 saved
Claude Opus ($15.00/M) ~$660/mo $264 saved $396 saved

Pro ($10/mo) pays for itself in under a day on any model. For a 5-person team: $175–$396/month saved.


Dashboard

Click the ⚡ status bar item to open the savings dashboard:

  • Lifetime stats — total tokens saved, estimated cost saved, average compression ratio
  • Per-request history — every request with raw vs optimized tokens, cost delta
  • Account — sign in, manage billing, upgrade

Web dashboard: sign in at app.trimliai.com to see full analytics and 30-day charts.


Commands

Command What it does
Trimli AI: Show Dashboard Open the savings dashboard panel
Trimli AI: Toggle On/Off Pause or resume optimization
Trimli AI: Toggle Forward Proxy Auto-inject proxy into VS Code terminal sessions (Claude Code)
Trimli AI: Sign In Sign in to your Trimli AI account

Settings

Setting Default Description
tokOptimizer.enabled true Enable/disable optimization globally
tokOptimizer.tokenBudget 8000 Max input tokens before context pruning activates
tokOptimizer.aggressiveness 0.5 Compression aggressiveness: 0 = conservative, 1 = maximum

Tiers

Free Pro ($10/mo) Enterprise ($30/seat/mo)
Optimization Full Full Full + org shared pools
Average savings ~40% ~40% ~40–60%
Daily savings cap 200K tokens Unlimited Unlimited
Analytics VS Code dashboard Full web portal Org-level + audit logs
SSO (Okta / Azure AD) — — ✓
On-premise deployment — — ✓ Docker + Helm
Support Community Email Priority + SLA

No account required on the free tier. A licence key is created automatically when you install. Upgrade at app.trimliai.com.


FAQ

Does it store my prompts? No. The proxy optimizes in-flight and immediately discards the messages. Nothing is logged, cached, or sent anywhere except directly to the upstream AI API.

Will it change the quality of AI responses? No. Tested across 59 accuracy benchmarks — zero quality degradation detected.

Does it work with streaming? Yes. Streaming responses pass through unchanged. Only the input prompt is compressed.

Does it work offline? Yes. The proxy runs entirely on your machine.

Does it work with Claude Code? Yes — enable the forward proxy (Command Palette → Trimli AI: Toggle Forward Proxy), then launch Claude Code from the VS Code terminal.

What if I use multiple AI tools? Point all of them at http://localhost:8765. The proxy handles OpenAI and Anthropic formats simultaneously.


License

Business Source License 1.1 — free to use for individuals and teams. You may not offer a competing token optimization SaaS. Converts to Apache 2.0 on 2030-04-11.

  • Contact us
  • Jobs
  • Privacy
  • Manage cookies
  • Terms of use
  • Trademarks
© 2026 Microsoft