PeakInfer for VS Code
Achieve peak inference performance—directly in your editor.
PeakInfer analyzes every LLM inference point in your code to find what's holding back your latency, throughput, reliability, and cost.
The Problem
Your code says `streaming: true`. Runtime shows 0% actual streams. That's drift, and you can't see it until production.
Peak inference performance means improving latency, throughput, reliability, and cost without changing evaluated behavior.
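A minimal sketch of that kind of drift, assuming the OpenAI Node SDK (`render` here is a stand-in for a real UI update):

```typescript
import OpenAI from "openai";

const client = new OpenAI();
const render = (text: string) => console.log(text); // stand-in for a UI update

// The request declares streaming...
const stream = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize this ticket" }],
  stream: true,
});

// ...but the consumer buffers every chunk and renders once at the end,
// so users never see incremental output. The code says "streaming";
// the runtime behaves like a blocking call. That gap is drift.
let full = "";
for await (const chunk of stream) {
  full += chunk.choices[0]?.delta?.content ?? "";
}
render(full);
```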
Features
- Inline Diagnostics: See performance issues highlighted in your code
- Drift Detection: Find mismatches between code declarations and runtime behavior
- Results Panel: Comprehensive analysis view with actionable recommendations
- Benchmark Comparison: Compare to InferenceMAX benchmarks (15+ models)
- Multiple Languages: TypeScript, JavaScript, Python, Go, Rust
Installation
- Install from VS Code Marketplace: Search "PeakInfer"
- Or install from a VSIX file:

```sh
code --install-extension peakinfer-1.0.0.vsix
```
Setup
Get your PeakInfer token at peakinfer.com/dashboard: sign in with GitHub and generate a token.
50 free credits are included. No credit card required.
Option 1: VS Code Settings (Recommended)
- Open Settings (`Cmd+,` / `Ctrl+,`)
- Search "PeakInfer"
- Enter your token in "PeakInfer Token"
Option 2: Environment Variable
Set `PEAKINFER_TOKEN` in your shell:

```sh
export PEAKINFER_TOKEN=pk_your-token-here
```
Usage
Analyze Current File
- Command Palette (`Cmd+Shift+P` on macOS, `Ctrl+Shift+P` on Windows/Linux): run `PeakInfer: Analyze Current File`
- Or right-click in the editor and choose "PeakInfer: Analyze Current File"
Analyze Workspace
- Command Palette: run `PeakInfer: Analyze Workspace`
- Or right-click a folder in the Explorer and choose "PeakInfer: Analyze Workspace"
View Results
- Command Palette: run `PeakInfer: Show Results Panel`
The Four Dimensions
PeakInfer analyzes every inference point across four dimensions:
| Dimension | What We Find |
|---|---|
| Latency | Missing streaming, blocking calls, p95 vs benchmark gaps |
| Throughput | Sequential bottlenecks, batch opportunities |
| Reliability | Missing retries, timeouts, fallbacks |
| Cost | Right-sized model selection, token optimization |
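As a concrete illustration (hypothetical code, not output from the tool), the call below would likely be flagged on several dimensions at once:

```typescript
import OpenAI from "openai";

// Reliability: no retry or timeout policy configured on the client.
const client = new OpenAI();

const res = await client.chat.completions.create({
  // Cost: a premium model for a trivial yes/no classification.
  model: "gpt-4o",
  // Latency: no `stream: true`, so the caller blocks on the full completion.
  messages: [{ role: "user", content: "Is this email spam? Reply yes or no." }],
});
console.log(res.choices[0].message.content);
```

For the reliability findings, the OpenAI Node SDK accepts `maxRetries` and `timeout` options on the client constructor, e.g. `new OpenAI({ maxRetries: 3, timeout: 30_000 })`.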
Settings
| Setting | Default | Description |
|---|---|---|
| `peakinfer.token` | `""` | PeakInfer token (or use env var) |
| `peakinfer.analyzeOnSave` | `false` | Auto-analyze on file save |
| `peakinfer.showInlineHints` | `true` | Show inline hints for issues |
| `peakinfer.severityThreshold` | `warning` | Minimum severity to show |
| `peakinfer.includeBenchmarks` | `true` | Include benchmark comparisons |
| `peakinfer.excludePatterns` | `["**/node_modules/**", ...]` | Patterns to exclude |
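For reference, a minimal `settings.json` snippet (the token and the second exclude pattern are placeholder values):

```jsonc
{
  // The token can also come from the PEAKINFER_TOKEN environment variable.
  "peakinfer.token": "pk_your-token-here",
  "peakinfer.analyzeOnSave": true,
  "peakinfer.severityThreshold": "warning",
  // "**/dist/**" is an illustrative addition to the default excludes.
  "peakinfer.excludePatterns": ["**/node_modules/**", "**/dist/**"]
}
```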
Commands
| Command | Description |
|---|---|
| PeakInfer: Analyze Current File | Analyze the active file |
| PeakInfer: Analyze Workspace | Analyze entire workspace |
| PeakInfer: Show Results Panel | Open results panel |
| PeakInfer: Clear Diagnostics | Clear all diagnostics |
| PeakInfer: Set Token | Configure PeakInfer token |
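You can bind any of these to a shortcut in `keybindings.json`. The command ID below is a hypothetical guess based on the command title; check the extension's Feature Contributions tab in the Marketplace for the real IDs:

```jsonc
[
  {
    "key": "ctrl+alt+p",
    // "peakinfer.analyzeCurrentFile" is a hypothetical ID, not confirmed.
    "command": "peakinfer.analyzeCurrentFile",
    "when": "editorTextFocus"
  }
]
```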
Supported Providers
- OpenAI (GPT-4o, GPT-4, GPT-3.5, etc.)
- Anthropic (Claude)
- Azure OpenAI
- AWS Bedrock
- Google Vertex AI
- vLLM, TensorRT-LLM (HTTP detection)
- LangChain, LlamaIndex (framework detection)
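For SDK-based providers, detection presumably keys on well-known client call sites. A sketch of the kind of call that would register as an inference point, using Anthropic's Node SDK:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// A typical call site: the client class and method identify the provider,
// and options such as `model` and `max_tokens` feed the analysis.
const msg = await anthropic.messages.create({
  model: "claude-3-5-sonnet-latest",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Ping" }],
});
console.log(msg.content);
```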
Pricing
Pricing is the same as the PeakInfer GitHub Action:
- Free: 50 credits one-time (6-month expiry)
- Starter: $19 for 200 credits
- Growth: $49 for 600 credits
- Scale: $149 for 2,000 credits
Credits are shared across VS Code and GitHub Action.
View pricing →
Troubleshooting
No diagnostics appearing
- Check token is configured (Settings or env var)
- Ensure file contains LLM API calls
- Check Output panel for errors (View > Output > PeakInfer)
Analysis taking too long
Large files may take 10-30 seconds. Check Output panel for progress.
Insufficient credits
Check balance at peakinfer.com/dashboard
License
Apache-2.0