TokenTrim

Trim tokens. Cut costs. Code smarter.

Stop overpaying for LLM APIs. TokenTrim optimizes your prompts locally right in VS Code with dual optimization modes: instant rule-based compression or advanced ML-powered optimization.

✨ Features

Core Features

📊 Token Counting: Real-time token count using tiktoken (OpenAI), approximations for Claude/Gemini
💰 Cost Estimation: Accurate cost calculation across 9+ LLM models
✂️ Dual Optimization Modes:
- Quick Compress (Rule-Based): 30-50% reduction, instant
- ML Optimize (LLMLingua): 50-80% reduction, 2-3 seconds
🌐 Multi-Provider Support: OpenAI, Anthropic Claude, Google Gemini
🔒 Privacy-First: 100% local processing, no data sent to external servers

Advanced Features (v2.0)

🤖 LLMLingua Integration: State-of-the-art ML compression
📊 Multi-Model Comparison: Compare costs across all supported models
🔄 Intelligent Fallbacks: Automatic fallback to rule-based if ML unavailable
⏱️ Progress Indicators: Real-time feedback during ML compression

🚀 Quick Start

Select text in any editor
Open the TokenTrim sidebar (or check the status bar for token count)
Choose your optimization mode:
- Click "Quick Compress" for instant results
- Click "ML Optimize" for maximum compression
Review the before/after comparison
Click "Replace Selection" to save tokens and money

⌨️ Keyboard Shortcuts

Ctrl+Shift+O (Windows/Linux) or Cmd+Shift+O (Mac): Optimize with rules
Ctrl+Shift+M (Windows/Linux) or Cmd+Shift+M (Mac): Optimize with ML

📋 Commands

TokenTrim: Optimize Selected Text - Rule-based optimization
TokenTrim: Optimize with ML (LLMLingua) - ML-powered compression
TokenTrim: Show Sidebar
TokenTrim: Select Model

Supported Models

18 models across 3 providers:

OpenAI (5 models):

GPT-4 Turbo
GPT-4
GPT-4 32K
GPT-3.5 Turbo
GPT-3.5 Turbo 16K

Anthropic Claude (7 models):

Claude Opus 4.5
Claude Sonnet 4.5
Claude 3.5 Sonnet
Claude 3.5 Haiku
Claude 3 Opus
Claude 3 Sonnet
Claude 3 Haiku

Google Gemini (6 models):

Gemini 3 Pro
Gemini 3 Flash
Gemini 2.0 Flash
Gemini 1.5 Pro
Gemini 1.5 Flash
Gemini 1.5 Flash-8B

Note: Claude and Gemini token counts are approximations (±5% accuracy).

🤖 LLMLingua Setup (Optional)

For advanced ML compression, install LLMLingua:

Prerequisites

Python 3.8 or higher
pip (Python package manager)

Installation

The extension will prompt you to install LLMLingua on first use
Click "Install" to automatically install dependencies
Wait 2-3 minutes for model download (~500MB)

Manual Installation

pip install llmlingua torch transformers

Troubleshooting

Python not found: Install from python.org
Installation fails: Try pip install --upgrade pip first
Model download slow: Use a VPN if in restricted regions

Note: LLMLingua is optional. Rule-based optimization works without it.

🛠️ How It Works

Rule-Based Optimization (Quick Compress)

✂️ Whitespace normalization - Remove excessive spacing
📝 Markdown stripping - Clean unnecessary formatting
🎯 Phrase condensing - Replace verbose phrases with concise alternatives
🧹 Filler removal - Strip words that don't add value
⚡ Punctuation optimization - Simplify without losing meaning

ML Optimization (LLMLingua)

🧠 Semantic compression - Preserves meaning while removing redundancy
📉 Token-level optimization - Intelligent token selection
🎯 Context-aware - Understands prompt structure
⚡ 50-80% reduction - Superior compression ratios

All transformations preserve semantic meaning.

📦 Requirements

VS Code 1.85.0 or higher
Python 3.8+ (optional, for ML optimization)

📄 License

MIT - Free forever, open source

Made with ❤️ for developers who care about LLM costs

TokenTrim

iRubeel

TokenTrim

✨ Features

Core Features

Advanced Features (v2.0)

🚀 Quick Start

⌨️ Keyboard Shortcuts

📋 Commands

Supported Models

🤖 LLMLingua Setup (Optional)

Prerequisites

Installation

Manual Installation

Troubleshooting

🛠️ How It Works

Rule-Based Optimization (Quick Compress)

ML Optimization (LLMLingua)

📦 Requirements

📄 License