LLM Token Counter Summarizer DevBoy.pro

A powerful VS Code extension by DevBoy.pro that counts LLM tokens in your project files and summarizes them per folder. Token counts appear as small badges next to file names in the Explorer, helping developers understand their code's token footprint for AI models.

Features

🚀 Core Features

Multiple Encoding Support: Supports four OpenAI encodings (cl100k_base, o200k_base, p50k_base, r50k_base)
Model-Specific Tokenization: Choose the encoding that matches your target model (GPT-4, GPT-4o, GPT-3.5, etc.)
Universal File Processing: Processes every file under 2MB, regardless of file extension
Smart Filtering: Respects .gitignore patterns when scanning files
Performance Optimized: Caches token counts keyed by a SHA-256 hash of each file's content, so unchanged files are not re-tokenized
Real-time Updates: Automatically recounts when files change
Progress Tracking: Shows counting progress in the status bar
Folder Summaries: Displays total tokens for folders with real-time updates
Persistent Cache: Saves cache periodically to .vscode/token-cache.txt
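The content-hash caching and the 2MB limit described above could look roughly like the TypeScript sketch below. It is not the extension's source: the class name `TokenCache` and its members are hypothetical, it assumes Node's built-in `node:crypto` module for SHA-256, it interprets "2MB" as 2 * 1024 * 1024 bytes, and the tokenizer is injected as a plain function.

```typescript
import { createHash } from "node:crypto";

// Assumption: "2MB" in the README means 2 * 1024 * 1024 bytes.
const MAX_FILE_BYTES = 2 * 1024 * 1024;

// Hypothetical content-addressed cache: SHA-256(content) -> token count.
// Identical content is tokenized only once, whatever its file path.
class TokenCache {
  private readonly counts = new Map<string, number>();
  private readonly encoder = new TextEncoder();
  private readonly countTokens: (text: string) => number;

  // countTokens is injected so any tokenizer can be plugged in.
  constructor(countTokens: (text: string) => number) {
    this.countTokens = countTokens;
  }

  // Returns the token count, or undefined when the content exceeds the
  // size limit (the case the README marks with the "∞" badge).
  count(content: string): number | undefined {
    if (this.encoder.encode(content).length > MAX_FILE_BYTES) {
      return undefined; // oversized: never hashed or tokenized
    }
    const key = createHash("sha256").update(content, "utf8").digest("hex");
    const cached = this.counts.get(key);
    if (cached !== undefined) {
      return cached; // cache hit: skip tokenization
    }
    const tokens = this.countTokens(content);
    this.counts.set(key, tokens);
    return tokens;
  }

  // Number of distinct file contents cached so far.
  get size(): number {
    return this.counts.size;
  }
}
```

With js-tiktoken, the counter could be built once, e.g. `const enc = getEncoding("cl100k_base");`, and passed as `(text) => enc.encode(text).length`. Keying on content rather than on path means a renamed or copied file reuses its cached count. This sketch never evicts entries and does not persist them.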

🌍 Internationalization

Full support for English and Russian languages
Automatically uses VS Code's language setting

🔧 Technology

Pure JavaScript Implementation: Uses js-tiktoken for fast, reliable tokenization
No Native Dependencies: Fully bundled extension without platform-specific binaries
WebAssembly-Free: Compatible with all VS Code environments including web-based versions

Badge Notation

Due to VS Code's 2-character limit for file decoration badges, token counts are displayed using a compact notation:

Token Range	Badge	Example
0	`0`	0 tokens
1-999	`.0` to `.9`	`.1` = ~100 tokens, `.5` = ~500 tokens
1,000-99,999	`1` to `99`	`2` = ~2,000 tokens, `15` = ~15,000 tokens
100,000-999,999	`^1` to `^9`	`^2` = ~200,000 tokens, `^5` = ~500,000 tokens
1,000,000-9,999,999	`1` to `9`	`1` = ~1 million tokens, `3` = ~3 million tokens
10,000,000-99,999,999	`1∞` to `9∞`	`1∞` = ~10 million tokens, `5∞` = ~50 million tokens
100,000,000+	`∞∞`	More than 100 million tokens
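The table reads as a piecewise function of the token count. The TypeScript sketch below (the name `formatBadge` is hypothetical) assumes counts are truncated within each range, so 1,999 tokens shows `1`; the "~" in the Example column leaves the exact rounding open. Every result fits the 2-character limit because `∞` (U+221E) is a single UTF-16 code unit. As the table is written, the thousands and millions ranges both use the plain digits `1` to `9`, so the sketch returns `"1"` for 1,000 and for 1,000,000 tokens alike.

```typescript
// Hypothetical helper: converts a token count into a badge string
// following the ranges in the table above (values truncated per range).
function formatBadge(tokens: number): string {
  if (!Number.isFinite(tokens) || tokens < 0) {
    throw new RangeError(`invalid token count: ${tokens}`);
  }
  const n = Math.floor(tokens);
  if (n === 0) return "0";
  if (n < 1_000) return `.${Math.floor(n / 100)}`; // ".0" to ".9"
  if (n < 100_000) return String(Math.floor(n / 1_000)); // "1" to "99"
  if (n < 1_000_000) return `^${Math.floor(n / 100_000)}`; // "^1" to "^9"
  if (n < 10_000_000) return String(Math.floor(n / 1_000_000)); // "1" to "9", as in the table
  if (n < 100_000_000) return `${Math.floor(n / 10_000_000)}∞`; // "1∞" to "9∞"
  return "∞∞"; // 100 million or more
}
```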

Special badges:

• - File is being processed
⚠ - Error occurred during token counting
∞ - File is too large to process (>2MB)
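One way to combine these special badges with the numeric notation is a small status type whose special states short-circuit before any count is formatted. The TypeScript sketch below is illustrative only: `FileStatus`, its `kind` values and `badgeForStatus` are hypothetical names, and the numeric formatter is passed in as a parameter.

```typescript
// Hypothetical per-file status; these names are illustrative only.
type FileStatus =
  | { kind: "processing" }
  | { kind: "error" }
  | { kind: "tooLarge" }
  | { kind: "counted"; tokens: number };

// Special states map straight to their badges; only a finished count is
// handed to the numeric formatter (e.g. one implementing the table above).
function badgeForStatus(
  status: FileStatus,
  formatCount: (tokens: number) => string,
): string {
  switch (status.kind) {
    case "processing":
      return "•";
    case "error":
      return "⚠";
    case "tooLarge":
      return "∞";
    case "counted":
      return formatCount(status.tokens);
  }
}
```

Using a discriminated union lets the compiler check that every status has a badge.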

Extension Settings

tokenCounter.encoding: Choose the encoding algorithm for token counting:
- cl100k_base (default) - Used by GPT-4, GPT-3.5-turbo, text-embedding-ada-002
- o200k_base - Used by GPT-4o models
- p50k_base - Used by text-davinci-003, text-davinci-002
- r50k_base - Used by text-davinci-001 and the GPT-3 davinci, curie, babbage, ada models

Note: Different models use different encoding algorithms. Choose the encoding that matches your target model for accurate token counts.
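When the target model is known by name, the list above can be turned into a small lookup that suggests a value for `tokenCounter.encoding`. The TypeScript helper below is a hypothetical sketch, not part of the extension: it covers only models named in the list, matches dated or sized variants (such as `gpt-4o-mini` or `gpt-4-32k`) by prefix, and falls back to the default `cl100k_base` for anything else.

```typescript
// Values accepted by the tokenCounter.encoding setting, per the list above.
type Encoding = "cl100k_base" | "o200k_base" | "p50k_base" | "r50k_base";

// Exact model names taken from the list above (not exhaustive).
const EXACT_MODELS = new Map<string, Encoding>([
  ["gpt-4o", "o200k_base"],
  ["gpt-4", "cl100k_base"],
  ["gpt-3.5-turbo", "cl100k_base"],
  ["text-embedding-ada-002", "cl100k_base"],
  ["text-davinci-003", "p50k_base"],
  ["text-davinci-002", "p50k_base"],
  ["davinci", "r50k_base"],
  ["curie", "r50k_base"],
  ["babbage", "r50k_base"],
  ["ada", "r50k_base"],
]);

// Variants such as "gpt-4o-mini" or "gpt-4-32k".
const MODEL_PREFIXES: ReadonlyArray<readonly [string, Encoding]> = [
  ["gpt-4o-", "o200k_base"],
  ["gpt-4-", "cl100k_base"],
  ["gpt-3.5-turbo-", "cl100k_base"],
];

// Hypothetical helper: suggests an encoding for a model name,
// falling back to the setting's default for unknown names.
function suggestEncoding(model: string): Encoding {
  const name = model.trim().toLowerCase();
  const exact = EXACT_MODELS.get(name);
  if (exact !== undefined) {
    return exact;
  }
  for (const [prefix, encoding] of MODEL_PREFIXES) {
    if (name.startsWith(prefix)) {
      return encoding;
    }
  }
  return "cl100k_base";
}
```

The suggested value then goes into your settings, for example `"tokenCounter.encoding": "o200k_base"` when targeting GPT-4o.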

Andrei Mazniak