idoapico - OpenAI Compatible Models for VS Code
A VS Code extension that integrates OpenAI-compatible models (including Kimi K2, Claude, Qwen3 Coder, Qwen3-Next, MiniMax M2, and custom LLM endpoints) into VS Code's Language Model API. Use any OpenAI-compatible endpoint directly within VS Code's chat interface.
Features
- Multi-Provider Support: Seamlessly switch between OpenAI, Azure OpenAI, OpenRouter, Kimi/Moonshot, and any OpenAI-compatible endpoint
- Custom Model Configuration: Add unlimited models with customizable settings (temperature, top_p, max tokens, etc.)
- Tool Calling: Full support for function calling and tool integration
- Vision Support: Process images with vision-capable models
- Advanced Reasoning: Configure reasoning budgets and thinking modes for models that support it
- Secure API Key Management: Store API keys securely in VS Code's Secret Storage
- Circuit Breaker Protection: Automatic endpoint failure detection with 10-second cooldown
- Proxy Support: Configure HTTP/HTTPS proxies with authentication
- Retry Mechanism: Configurable automatic retry with exponential backoff
- Token Counting: Smart token calculation supporting Chinese characters and images
- Token Usage Display: Shows per-request token usage in the status bar (Tokens: total (p:prompt c:completion)) and supports optional session accumulation (configurable via settings). Use idoapico.resetTokenSession to reset the current session counter.
- Health Checks: Monitor endpoint availability with built-in health check command
Requirements
- VS Code 1.106.1 or higher
- An OpenAI-compatible API endpoint or API key (OpenAI, Azure OpenAI, etc.)
Extension Settings
This extension contributes the following configuration options:
Core Settings
idoapico.models (array)
Configure OpenAI-compatible models. Each model requires:
id - Unique model identifier (e.g., "gpt-4o", "claude-3-5-sonnet")
owned_by - Provider name used for API key lookup (e.g., "openai", "anthropic")
baseUrl - API endpoint URL (e.g., "https://api.openai.com/v1")
Optional fields:
configId - Unique suffix for same model with different settings
displayName - Display name in model picker
context_length - Max context in tokens (default: 128000)
max_completion_tokens - Max output tokens (default: 4096)
max_tokens - Alternative name for max_completion_tokens (legacy)
vision - Boolean, whether model supports image input (default: false)
headers - Custom HTTP headers as key-value pairs
extra - Additional fields to send with API requests
parser - Override parser: "openai", "kimi", "anthropic", "generic" (auto-detected by default)
request_delay - Delay in milliseconds before each request to this specific model (overrides global delay setting)
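For example, a model entry that uses the optional fields might look like the following sketch (the endpoint URL, header, and extra field are illustrative placeholders, not required values):
{
  "idoapico.models": [
    {
      "id": "my-local-model",
      "configId": "creative",
      "owned_by": "custom",
      "displayName": "My Local Model (Creative)",
      "baseUrl": "https://llm.internal.example.com/v1",
      "context_length": 32000,
      "max_completion_tokens": 2048,
      "headers": { "X-Api-Version": "2024-01-01" },
      "extra": { "stream_options": { "include_usage": true } },
      "parser": "generic"
    }
  ]
}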
idoapico.retry
Configure automatic retry behavior:
enabled - Enable retry on failures (default: true)
max_attempts - Number of retry attempts (default: 3)
interval_ms - Delay between retries in milliseconds (default: 1000)
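With the documented defaults written out explicitly, the retry configuration looks like this:
{
  "idoapico.retry": {
    "enabled": true,
    "max_attempts": 3,
    "interval_ms": 1000
  }
}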
idoapico.timeout
Request timeout in milliseconds (default: 30000, minimum: 1000)
idoapico.delay
Global artificial delay between requests in milliseconds (default: 0, minimum: 0)
Priority Cascade: You can configure delays at two levels:
- Global Level: idoapico.delay applies to all models
- Model Level: request_delay in an individual model config overrides the global delay
Priority: Model-specific delay takes precedence over global delay. If no delays are set, no delay is applied.
idoapico.proxy
Configure HTTP proxy:
url - Proxy URL (e.g., "http://proxy.example.com:8080")
username - Proxy username (optional)
password - Proxy password (optional)
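For example (host and credentials are placeholders):
{
  "idoapico.proxy": {
    "url": "http://proxy.example.com:8080",
    "username": "proxyuser",
    "password": "secret"
  }
}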
idoapico.debug
Enable debug logging to Output > idoapico (default: false)
Sampling Parameters
Available for models that support them:
temperature - Sampling temperature (0-2)
top_p - Top-p sampling (0-1)
top_k - Top-k sampling
min_p - Minimum probability threshold
frequency_penalty - Penalize frequent tokens (-2 to 2)
presence_penalty - Penalize tokens that have already appeared (-2 to 2)
repetition_penalty - Alternative repetition penalty used by some providers (values above 1 discourage repetition)
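Assuming these parameters are set as fields on an individual model entry, like the other per-model options above, a configuration might look like this:
{
  "idoapico.models": [
    {
      "id": "gpt-4o",
      "owned_by": "openai",
      "baseUrl": "https://api.openai.com/v1",
      "temperature": 0.7,
      "top_p": 0.9,
      "frequency_penalty": 0.2
    }
  ]
}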
Advanced Features
Reasoning Configuration
"reasoning": {
"effort": "high|medium|low|minimal|auto",
"exclude": false,
"max_tokens": 10000,
"enabled": true
}
Thinking/Internal Monologue
"thinking": {
"type": "enabled|disabled"
},
"enable_thinking": true,
"thinking_budget": 10000
Commands
idoapico: Set Generic API Key - Set API key for all providers
idoapico: Set Provider API Key - Set API key for specific provider
idoapico: Check Endpoint Health - Verify endpoint connectivity
idoapico: Refresh Models - Reload model configuration
Quick Start
1. Configure Models
Open VS Code settings and add your models to idoapico.models:
{
"idoapico.models": [
{
"id": "gpt-4o",
"owned_by": "openai",
"displayName": "GPT-4o",
"baseUrl": "https://api.openai.com/v1",
"context_length": 128000,
"max_completion_tokens": 4096,
"vision": true
},
{
"id": "claude-3-5-sonnet",
"owned_by": "anthropic",
"displayName": "Claude 3.5 Sonnet",
"baseUrl": "https://api.anthropic.com/v1",
"parser": "anthropic",
"context_length": 200000
},
{
"id": "moonshot-v1-8k",
"owned_by": "kimi",
"displayName": "Kimi (Moonshot)",
"baseUrl": "https://api.moonshot.cn/v1",
"parser": "kimi",
"context_length": 8000,
"request_delay": 2000
}
]
}
Delay Configuration Examples
Global delay for all models:
{
"idoapico.delay": 1000
}
Model-specific delay (overrides global):
{
"idoapico.models": [
{
"id": "expensive-api",
"owned_by": "custom",
"displayName": "Expensive API Model",
"baseUrl": "https://expensive-api.example.com/v1",
"request_delay": 5000
}
]
}
Combined global + model-specific:
{
"idoapico.delay": 1000,
"idoapico.models": [
{
"id": "rate-limited",
"owned_by": "limited",
"baseUrl": "https://limited-api.example.com/v1",
"request_delay": 3000
}
]
}
2. Set API Keys
Use the command palette to set API keys:
idoapico: Set Generic API Key - Sets a fallback API key for all providers
idoapico: Set Provider API Key - Sets a provider-specific API key (recommended for multiple providers)
Keys are stored securely in VS Code's Secret Storage.
3. Start Using Models
Open VS Code's Chat interface and select an idoapico model from the model picker dropdown.
Architecture
Parser System
The extension automatically selects the appropriate parser based on model configuration:
- OpenAI Parser - For OpenAI, Azure OpenAI, OpenRouter, and standard OpenAI-compatible endpoints
- Kimi Parser - Specialized support for Kimi/Moonshot models with custom token handling
- Anthropic Parser - For Anthropic (Claude) endpoints
- Generic Parser - Fallback for unknown providers
Custom parsers handle:
- Stream parsing (Server-Sent Events)
- Tool call accumulation
- Reasoning content extraction
- Token buffering for split streaming responses
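Auto-detection can be overridden per model with the parser field described under Core Settings; for example, to force the generic fallback for an unrecognized self-hosted endpoint (URL is a placeholder):
{
  "idoapico.models": [
    {
      "id": "self-hosted-llm",
      "owned_by": "custom",
      "baseUrl": "https://llm.example.com/v1",
      "parser": "generic"
    }
  ]
}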
Request Lifecycle
- Model configuration validation
- Circuit breaker check (prevents requests to failing endpoints)
- API key retrieval (provider-specific → model-specific → generic)
- Parser selection based on model family
- Request body preparation with sampling parameters
- Streaming response handling with progress reporting
- Tool call accumulation and normalization
- Error recovery with retry logic
Token Counting
Smart token estimation for multi-language support:
- Chinese characters: ~1.5 tokens each
- Other characters: ~0.25 tokens each
- Images: Fixed 170 tokens per image
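For example, a prompt containing 100 Chinese characters, 200 other characters, and one image would be estimated at 100 × 1.5 + 200 × 0.25 + 170 = 370 tokens.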
Known Issues & Limitations
- Tool calls are limited to 100 per request to prevent VS Code performance issues
- Circuit breaker temporarily blocks failing endpoints for 10 seconds
- Custom thinking/reasoning output formatting may vary by provider
- Some providers may not support all configuration parameters
Troubleshooting
Models not appearing in VS Code Chat
- Ensure VS Code version is 1.106.1 or higher
- Check that models have a valid id, owned_by, and baseUrl in settings
- Verify API keys are set for the provider
- Check Output > idoapico for error messages
"Connection Error" messages
- Verify baseUrl is correct and includes /v1 for OpenAI endpoints
- Check API key is valid and has required permissions
- Verify network connectivity (check proxy settings if applicable)
- Use the idoapico: Check Endpoint Health command
Timeouts
- Increase the idoapico.timeout setting (default: 30000ms); see the example after this list
- Check network latency to the endpoint
- Verify endpoint is responsive with health check command
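For slow endpoints, raising the timeout in settings can help, for example:
{
  "idoapico.timeout": 60000
}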
Tool calls not working
- Ensure the model supports function calling
- Check tool schema is valid JSON
- Verify tool names don't contain invalid characters (automatically sanitized)
- Review VS Code output for parsing errors
Development
Build
npm run compile # Full build (type check + lint + bundle)
npm run watch # Watch mode for development
npm run package # Production minified build
Testing
npm test # Run tests
npm run watch-tests # Watch mode for tests
Project Structure
src/extension.ts - Entry point and activation
src/provider.ts - Core LanguageModelChatProvider implementation
src/parsers/ - Parser implementations (OpenAI, Kimi, etc.)
src/managers/ - Configuration, secrets, and health management
src/utils/ - Utilities (token counting, circuit breaker, retry logic)
Contributing
Contributions are welcome! Please ensure:
- Code passes TypeScript strict mode
- ESLint checks pass (npm run lint)
- Tests pass (npm test)
- Changes are documented
License
See LICENSE file for details.
Support
- Issues: Report bugs and feature requests on GitHub
- Debug Output: Enable idoapico.debug and check the Output > idoapico channel
- API Compatibility: Ensure your endpoint is OpenAI-compatible