LiteLLM Provider for GitHub Copilot Chat

Use 100+ LLMs in VS Code with GitHub Copilot Chat powered by LiteLLM.

Features

Access 100+ LLMs (OpenAI, Anthropic, Google, AWS, Azure, and more) through a unified API
Automatic provider selection with cheapest and fastest modes
Support for streaming, function calling, and vision models
Self-hosted or cloud-based deployment options

Requirements

VS Code 1.108.0 or higher
LiteLLM proxy running (self-hosted or cloud)
LiteLLM API key (if required by your setup)

Quick Start

Install the extension from the VS Code Marketplace
Open VS Code's chat interface
Click the model picker → "Manage Models..." → "LiteLLM"
Enter your LiteLLM base URL (e.g., http://localhost:4000)
Enter your API key (if required)
Select models to add

Configuration

Connection Settings

To update your base URL or API key:

Command Palette: Ctrl+Shift+P / Cmd+Shift+P → "Manage LiteLLM Provider"
Model Picker: Chat interface → Model picker → "Manage Models..." → "LiteLLM"

Credentials are stored securely in VS Code's secret storage.

Token Limits (Automatic)

The extension automatically reads token limits from your LiteLLM server's model info. You can configure fallback defaults in VS Code settings:

To access: Ctrl+, / Cmd+, → Search "litellm-vscode-chat"

Setting	Default	Description
`litellm-vscode-chat.defaultMaxOutputTokens`	`16000`	Max tokens per response (fallback)
`litellm-vscode-chat.defaultContextLength`	`128000`	Total context window (fallback)
`litellm-vscode-chat.defaultMaxInputTokens`	`null`	Max input tokens (auto-calculated if null)

Priority: LiteLLM model info → Workspace settings → Defaults

Custom Model Parameters (Optional)

Override default request parameters for specific models using the modelParameters setting. This is useful for models with specific requirements (like gpt-5 requiring temperature: 1) or to customize behavior per model.

To configure: Add to your settings.json:

{
  "litellm-vscode-chat.modelParameters": {
    "gpt-5": {
      "temperature": 1
    },
    "gpt-4": {
      "max_tokens": 8000,
      "temperature": 0.8,
      "top_p": 0.9
    },
    "claude-opus": {
      "max_tokens": 16000,
      "temperature": 0.5
    }
  }
}

Supported parameters:

max_tokens - Maximum tokens in response
temperature - Randomness (0.0-2.0)
top_p - Nucleus sampling (0.0-1.0)
frequency_penalty - Reduce repetition (-2.0 to 2.0)
presence_penalty - Encourage new topics (-2.0 to 2.0)
stop - Stop sequences (string or array)
And any parameter supported by your litellm and model provier back end

Prefix matching: Configuration keys use longest prefix matching. For example, "gpt-4" will match "gpt-4-turbo:openai", "gpt-4:azure", etc. More specific keys take precedence.

Parameter precedence: Runtime options > User config > Defaults

Prompt Caching (Anthropic Claude)

The extension supports prompt caching for models that advertise this capability (currently Anthropic Claude models). Prompt caching reduces costs and improves response times by caching the system prompt across requests.

To configure: Add to your settings.json:

{
  "litellm-vscode-chat.promptCaching.enabled": true
}

How it works:

Automatically detects prompt caching support from LiteLLM's /v1/model/info endpoint
Only affects models that explicitly support prompt caching (primarily Claude models)
Adds cache_control blocks to system messages when enabled
Disabled by default for models without support

Benefits:

Reduced API costs (cached tokens are cheaper)
Faster response times (cached content doesn't need reprocessing)
Transparent to the user (works automatically when supported)

Troubleshooting

Mock LiteLLM Server (Local)

For quick manual testing, you can run a tiny mock LiteLLM server that serves a static model list and canned chat replies.

node scripts/mock-litellm-server.js

Optional port override:

PORT=4001 node scripts/mock-litellm-server.js

Then set your base URL to http://localhost:4000 (or the port you chose).

Status Bar Indicator

The LiteLLM status bar indicator (bottom right corner) shows your connection status:

Icon	Status	Description
`⚠️ LiteLLM`	Not Configured	Click to set up your connection
`⟳ LiteLLM`	Loading	Fetching models from server
`✓ LiteLLM (N)`	Connected	Successfully connected with N models available
`✗ LiteLLM`	Error	Connection failed - click for diagnostics

Click the status bar indicator at any time to view detailed diagnostics.

Test Your Connection

After configuring the extension, verify your setup:

Command Palette: Ctrl+Shift+P / Cmd+Shift+P → "LiteLLM: Test Connection"
Or click "Test Connection" after saving configuration

This will:

Attempt to connect to your LiteLLM server
Show the number of models found
Display detailed error messages if connection fails
Update the status bar with results

Diagnostic Tools

View Diagnostics

Command Palette: Ctrl+Shift+P / Cmd+Shift+P → "LiteLLM: Show Diagnostics"
Or click the status bar indicator

Shows:

Current configuration (base URL, API key status)
Connection state and model count
Last check timestamp
Quick access to output channel

Output Channel

View detailed logs for debugging:

Open Output panel: Ctrl+Shift+U / Cmd+Shift+U
Select "LiteLLM" from the dropdown

The output channel logs:

Configuration changes
Model fetch attempts and results
Error messages with full details
Server response information

Common Issues

"No models appear in the model picker"

Check the status bar - it will show the actual state
Click "Test Connection" to verify your setup
Check the "LiteLLM" output channel for error details
Verify your LiteLLM server is running and accessible

"Server returned 0 models"

Your LiteLLM proxy is running but has no models configured
Check your LiteLLM proxy configuration (litellm_config.yaml)
Run litellm --config your_config.yaml to start the proxy with models

"Authentication failed"

Your server requires an API key
Run "Manage LiteLLM Provider" and enter your API key
Verify the key is correct in your LiteLLM proxy configuration

"Connection Error: Unable to connect"

Verify the base URL is correct (e.g., http://localhost:4000)
Ensure your LiteLLM proxy is running
Check firewall/network settings

Development

git clone https://github.com/Vivswan/litellm-vscode-chat
cd litellm-vscode-chat
npm install
npm run compile

Press F5 to launch the Extension Development Host.

Command	Description
`npm run compile`	Build
`npm run watch`	Watch mode
`npm run lint`	Lint
`npm run format`	Format
`npm test`	Run tests

LiteLLM Provider for GitHub Copilot Chat

Vivswan

LiteLLM Provider for GitHub Copilot Chat

Features

Requirements

Quick Start

Configuration

Connection Settings

Token Limits (Automatic)

Custom Model Parameters (Optional)

Prompt Caching (Anthropic Claude)

Troubleshooting

Mock LiteLLM Server (Local)

Status Bar Indicator

Test Your Connection

Diagnostic Tools

Common Issues

Development

Resources