# Copilot Proxy
## About

Copilot Proxy is a VS Code extension that exposes GitHub Copilot's language models through a local OpenAI-compatible API server. This lets you use your existing Copilot subscription to power external applications, scripts, and tools with no additional API costs. It's a good fit for developers who want to use Copilot's models in custom workflows, automation scripts, or with tools that expect an OpenAI-compatible endpoint.
## Features
## Prerequisites
## Installation

### Manual Install - Preferred
### From Source - debugging/launching may not work
## Usage

### Starting the Server

The server starts automatically by default. You can also start and stop it manually using the commands listed in the Commands section below.
### Status Bar

The status bar shows the current server state. Click the status bar item to open the interactive status panel.

### Status Panel

The status panel provides additional information and controls for the running server.
### Output Logging

View real-time logs in VS Code's Output panel (select "Copilot Proxy" from the dropdown).
## Using with External Tools

### Example Scripts

Two Python examples are included. The Simple Example can be configured with the following environment variables:
| Environment Variable | Default | Description |
|---|---|---|
| `VSCODE_LLM_ENDPOINT` | `http://127.0.0.1:8080/v1/chat/completions` | Proxy endpoint URL |
| `VSCODE_LLM_FALLBACK` | `true` | Enable/disable Anthropic fallback |
| `ANTHROPIC_API_KEY` | (none) | Required for fallback support |
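For orientation, here is a rough sketch of how a client script might honor these variables, using a direct call to Anthropic's Messages REST API as the fallback path. It is illustrative only and not one of the bundled examples; the `ask` helper and the fallback model name are assumptions.

```python
# Sketch of a client that honors the environment variables above.
import os
import requests

ENDPOINT = os.environ.get("VSCODE_LLM_ENDPOINT",
                          "http://127.0.0.1:8080/v1/chat/completions")
FALLBACK = os.environ.get("VSCODE_LLM_FALLBACK", "true").lower() == "true"
ANTHROPIC_KEY = os.environ.get("ANTHROPIC_API_KEY")

def ask(prompt: str) -> str:
    """Try the local proxy first; optionally fall back to Anthropic's API."""
    try:
        resp = requests.post(ENDPOINT, json={
            "model": "claude-3.5-sonnet",
            "messages": [{"role": "user", "content": prompt}],
        }, timeout=30)
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
    except requests.RequestException:
        if not (FALLBACK and ANTHROPIC_KEY):
            raise
        # Fallback: call Anthropic's Messages API directly
        # (model name here is illustrative).
        resp = requests.post(
            "https://api.anthropic.com/v1/messages",
            headers={"x-api-key": ANTHROPIC_KEY,
                     "anthropic-version": "2023-06-01"},
            json={"model": "claude-3-5-sonnet-latest", "max_tokens": 1024,
                  "messages": [{"role": "user", "content": prompt}]},
            timeout=30)
        resp.raise_for_status()
        return resp.json()["content"][0]["text"]

print(ask("Hello!"))
```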
### With Python (OpenAI client)

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8080/v1",
    api_key="not-needed"  # Any value works
)

response = client.chat.completions.create(
    model="claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
### With Python (streaming)

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8080/v1",
    api_key="not-needed"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
### With curl (streaming)

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3.5-sonnet",
    "messages": [{"role": "user", "content": "Write a haiku"}],
    "stream": true
  }'
### With Node.js

const response = await fetch('http://127.0.0.1:8080/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'claude-3.5-sonnet',
    messages: [{ role: 'user', content: 'Hello!' }]
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);
### With LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://127.0.0.1:8080/v1",
    api_key="not-needed",
    model="claude-3.5-sonnet"
)

response = llm.invoke("What is the capital of France?")
print(response.content)
## API Endpoints
Once running, the following endpoints are available:
### POST /v1/chat/completions
OpenAI-compatible chat completions endpoint.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-3.5-sonnet",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'
Request Body:
- `model` (optional): Model ID or partial name to match. If omitted, uses the default model setting or the first available model (see the sketch below).
- `messages`: Array of chat messages with `role` (`system`, `user`, `assistant`) and `content`
- `stream` (optional): Set to `true` for streaming responses (SSE format)
- `temperature` (optional): Accepted but not forwarded to the VS Code API
- `max_tokens` (optional): Accepted but not forwarded to the VS Code API
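As an illustration of the default-model behavior, the request below omits `model` entirely and lets the proxy pick one. It uses plain `requests` and is not taken from the project's examples:

```python
# Minimal request with "model" omitted; the proxy falls back to the
# copilotProxy.defaultModel setting or, if that is empty, the first
# available model.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={"messages": [{"role": "user", "content": "Hello!"}]},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["model"])  # which model the proxy actually selected
print(resp.json()["choices"][0]["message"]["content"])
```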
Response (non-streaming):
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "copilot-claude-3.5-sonnet",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}
Response (streaming):
Server-Sent Events (SSE) format compatible with OpenAI's streaming API.
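If you are not using an OpenAI client library, the stream can also be consumed directly as SSE. A rough sketch with `requests`; the chunk layout and the `[DONE]` sentinel are assumed from OpenAI's streaming schema rather than taken from the extension's source:

```python
# Consume the SSE stream without an OpenAI client library.
import json
import requests

with requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "model": "claude-3.5-sonnet",
        "messages": [{"role": "user", "content": "Write a haiku"}],
        "stream": True,
    },
    stream=True,
    timeout=30,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # OpenAI-style end-of-stream sentinel
        choices = json.loads(payload).get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)
print()
```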
### GET /v1/models
List available models.
curl http://127.0.0.1:8080/v1/models
Response:
{
  "object": "list",
  "data": [
    {
      "id": "copilot-claude-3.5-sonnet",
      "object": "model",
      "created": 1234567890,
      "owned_by": "copilot",
      "name": "Claude 3.5 Sonnet",
      "family": "claude-3.5-sonnet",
      "version": "1.0",
      "maxInputTokens": 16384
    }
  ]
}
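Since model matching accepts partial names, a client can list the models and pick one by substring. The snippet below is an illustrative helper, not part of the extension:

```python
# List the proxy's models and pick one by partial name, mirroring the
# proxy's flexible matching (e.g. "claude" or "sonnet" both work).
import requests

models = requests.get("http://127.0.0.1:8080/v1/models", timeout=10).json()["data"]
for m in models:
    print(m["id"], "-", m.get("name", ""))

wanted = "sonnet"  # partial name; adjust to taste
match = next((m["id"] for m in models if wanted in m["id"]), models[0]["id"])
print("Using model:", match)
```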
### GET /health
Health check endpoint.
curl http://127.0.0.1:8080/health
Response:
{
  "status": "ok",
  "models_available": 5
}
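Scripts that start alongside VS Code may want to wait for the proxy before sending requests. A small polling sketch; the retry count and delay here are arbitrary choices, not extension defaults:

```python
# Poll /health until the proxy is up and at least one model is available.
import time
import requests

def wait_for_proxy(url="http://127.0.0.1:8080/health", attempts=30, delay=1.0):
    for _ in range(attempts):
        try:
            health = requests.get(url, timeout=2).json()
            if health.get("status") == "ok" and health.get("models_available", 0) > 0:
                return True
        except requests.RequestException:
            pass  # server not up yet
        time.sleep(delay)
    return False

if not wait_for_proxy():
    raise SystemExit("Copilot Proxy did not become ready")
```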
## Configuration
Settings available in VS Code Settings (search for "Copilot Proxy"):
| Setting | Default | Description |
|---|---|---|
| `copilotProxy.port` | `8080` | Port number for the proxy server |
| `copilotProxy.autoStart` | `true` | Automatically start when VS Code opens |
| `copilotProxy.defaultModel` | `""` | Default model when not specified in a request (leave empty for first available) |
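These settings can also be set directly in your `settings.json`; the values below are only illustrative:

```json
{
  "copilotProxy.port": 8080,
  "copilotProxy.autoStart": true,
  "copilotProxy.defaultModel": "claude-3.5-sonnet"
}
```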
## Commands

- `Copilot Proxy: Start Server` - Start the proxy server
- `Copilot Proxy: Stop Server` - Stop the proxy server
- `Copilot Proxy: Show Status` - Open the interactive status panel
## Limitations
- System Messages: VS Code LM API doesn't have a system role - system messages are converted to user messages
- Token Counts: Token counts in responses are always 0 (VS Code API doesn't expose this)
- Temperature/Max Tokens: These parameters are accepted but not forwarded to the underlying API
- Request Size: Maximum request body size is 10 MB (larger requests receive a 413 error)
- Request Timeout: Requests time out after 30 seconds (and receive a 408 error); see the sketch below for client-side handling
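For the size and timeout limits, a client can check the documented status codes explicitly. A brief sketch (the printed messages are our own, not the proxy's error strings):

```python
# Handle the documented 413 (body too large) and 408 (timeout) responses.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={"model": "claude-3.5-sonnet",
          "messages": [{"role": "user", "content": "Hello!"}]},
    timeout=35,  # a little above the proxy's 30-second limit
)
if resp.status_code == 413:
    print("Request body exceeded the 10 MB limit; trim the prompt or history.")
elif resp.status_code == 408:
    print("The proxy timed out after 30 seconds; try a shorter prompt.")
else:
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
```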
## Security
Copilot Proxy is designed for local development use. The following security considerations apply:
### Localhost-Only Binding
The server binds to 127.0.0.1 (localhost) by default. This means:
- Only applications on your local machine can access the proxy
- The server is not accessible from other devices on your network
- This is intentional to prevent unauthorized access
### No Authentication
The API does not require authentication because:
- It's designed for trusted local applications only
- Your Copilot subscription credentials are managed securely by VS Code
- Adding authentication would add friction without meaningful security benefit in a localhost context
### CORS Configuration
The server allows all origins (Access-Control-Allow-Origin: *) because:
- Browser-based local development tools need CORS headers
- Localhost binding already limits access to local applications
- Restrictive CORS would break integration with local web tools
### Request Limits
The following limits protect against resource exhaustion:
| Limit | Value | Purpose |
|---|---|---|
| Request body size | 10 MB | Prevents memory exhaustion |
| Request timeout | 30 seconds | Prevents connection exhaustion |
| Keep-alive timeout | 5 seconds | Manages idle connections |
### Best Practices

- Do not expose the proxy to the network (don't modify the binding to `0.0.0.0`)
- Do not run in production environments
- The proxy is for development and testing only
## Troubleshooting

### "No language models available"

- Ensure the GitHub Copilot extension is installed
- Ensure you're signed into GitHub with Copilot access
- Try running `GitHub Copilot: Sign In` from the Command Palette
- Check the Output panel for error details

### "Port already in use"

- Change the port in settings (`copilotProxy.port`)
- Or stop whatever is using that port

### Model not found

- Use `GET /v1/models` to see available models
- Model matching is flexible: `claude`, `sonnet`, or `claude-3.5-sonnet` all work
- Check the Output panel to see which model was selected
### Check the Logs
Open VS Code's Output panel and select "Copilot Proxy" from the dropdown to see detailed logs including:
- All errors with timestamps
- Request/response details
- Model selection information
## License
MIT