Matrix OAI Gateway for Copilot

Matrix OAI Gateway turns VS Code into a small OpenAI-compatible AI gateway.

It works in two directions:

OAI Provider: expose OpenAI-compatible upstream models inside VS Code Chat and GitHub Copilot Chat.
Local Proxy: expose VS Code language models through local OpenAI-compatible and Anthropic-compatible HTTP APIs.

In short: Matrix OAI Gateway brings OpenAI-compatible models into VS Code / Copilot Chat, and in the other direction exposes the language models inside VS Code as local OpenAI- and Anthropic-compatible endpoints.

Highlights

Provider-based model config: define one provider, attach multiple models.
DeepSeek thinking-mode replay: preserves reasoning_content internally for tool-calling conversations.
Local proxy endpoints for OpenAI Chat Completions, Anthropic Messages, and model listing.
Status bar telemetry with proxy port, latest context usage, requests, and errors.
Configuration webview with providers, models, endpoints, usage, latency, and context details.
Secret handling: API keys can be stored in VS Code Secret Storage; header display is redacted.

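Feature summary:

Separate provider and model configuration.
Internal replay of the reasoning_content that DeepSeek thinking mode requires for tool calls.
Local proxy endpoints at /v1/chat/completions, /v1/messages, and /v1/models.
The status bar shows the port, latest context usage, request count, and error count.
The configuration page shows models, endpoints, usage, latency, and context information.
API keys can be stored in VS Code Secret Storage, and the configuration page hides sensitive headers.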

Providers And Models

Providers are reusable endpoints. Models reference providers by providerId, so one provider can host many models.

"matrixOaiCopilot.providers": [
  {
    "id": "ollama",
    "name": "Ollama Local",
    "baseUrl": "http://localhost:11434/v1",
    "apiMode": "ollama",
    "headers": {}
  },
  {
    "id": "deepseek",
    "name": "DeepSeek",
    "baseUrl": "https://api.deepseek.com/v1",
    "apiMode": "openai",
    "headers": {}
  }
],
"matrixOaiCopilot.models": [
  {
    "id": "qwen3.5:9b",
    "name": "Qwen3.5 9B",
    "providerId": "ollama",
    "family": "qwen",
    "maxInputTokens": 131072,
    "max_tokens": 32768,
    "supportsTools": true,
    "supportsImages": true,
    "temperature": 0.2,
    "top_p": 0.95,
    "top_k": 20,
    "enable_thinking": true,
    "thinking_budget": 8192
  },
  {
    "id": "deepseek-chat",
    "name": "DeepSeek Chat",
    "providerId": "deepseek",
    "family": "deepseek",
    "maxInputTokens": 64000,
    "supportsTools": true,
    "supportsImages": false
  }
]

Legacy model-level baseUrl still works, but new configs should use providers.
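The providerId lookup described above can be sketched as follows. This is a minimal illustration, not the extension's actual code: the interface shapes mirror the settings fields shown above, while resolveEndpoint, ResolvedEndpoint, the "old-model" entry, and the precedence (provider first, legacy baseUrl as fallback) are assumptions.

```typescript
// Shapes mirror the settings fields shown above; everything else is hypothetical.
interface ProviderConfig {
  id: string;
  name: string;
  baseUrl: string;
  apiMode: string;
  headers: Record<string, string>;
}

interface ModelConfig {
  id: string;
  providerId?: string;
  baseUrl?: string; // legacy model-level endpoint
}

interface ResolvedEndpoint {
  baseUrl: string;
  headers: Record<string, string>;
}

// Assumed precedence: a matching provider wins, the legacy baseUrl is the fallback.
function resolveEndpoint(model: ModelConfig, providers: ProviderConfig[]): ResolvedEndpoint {
  const provider = providers.find((p) => p.id === model.providerId);
  if (provider) {
    return { baseUrl: provider.baseUrl, headers: { ...provider.headers } };
  }
  if (model.baseUrl) {
    return { baseUrl: model.baseUrl, headers: {} };
  }
  throw new Error(`Model "${model.id}" has no provider and no legacy baseUrl`);
}

const providers: ProviderConfig[] = [
  { id: "ollama", name: "Ollama Local", baseUrl: "http://localhost:11434/v1", apiMode: "ollama", headers: {} },
  { id: "deepseek", name: "DeepSeek", baseUrl: "https://api.deepseek.com/v1", apiMode: "openai", headers: {} },
];

// One model resolved through its provider, one through a legacy baseUrl.
const viaProvider = resolveEndpoint({ id: "deepseek-chat", providerId: "deepseek" }, providers);
const viaLegacy = resolveEndpoint({ id: "old-model", baseUrl: "http://localhost:9000/v1" }, providers);
```

Because several models can point at the same provider id, changing a provider's baseUrl or headers in one place would apply to every model attached to it.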

DeepSeek Thinking Mode

Some DeepSeek-compatible thinking models require the assistant reasoning_content to be passed back in later tool-calling turns. VS Code does not expose that hidden field as visible text, so this extension stores it in memory and reattaches it to matching assistant messages before calling the upstream API.

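With DeepSeek thinking mode, multi-turn tool-calling conversations may require the previous assistant turn's reasoning_content to be sent back. The extension keeps this hidden field in memory and restores it automatically on later requests, which avoids invalid_request_error responses.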

Model options:

thinkingFormat: auto, deepseek, always, or none.
reasoningContentFallback: send empty reasoning_content for assistant tool-call turns when the exact hidden reasoning text cannot be recovered.
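The replay idea can be sketched roughly like this. It is an illustration under stated assumptions, not the extension's implementation: ReasoningStore, its methods, and the choice to key stored reasoning by the first tool-call id of an assistant turn are all hypothetical. The fallbackToEmpty flag plays the role of reasoningContentFallback.

```typescript
interface ToolCall {
  id: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tool_calls?: ToolCall[];
  reasoning_content?: string;
}

// Hypothetical in-memory store, keyed by the first tool-call id of an assistant turn.
class ReasoningStore {
  private readonly byToolCallId = new Map<string, string>();

  remember(toolCallId: string, reasoning: string): void {
    this.byToolCallId.set(toolCallId, reasoning);
  }

  // Returns a copy of the conversation with reasoning_content restored where possible.
  reattach(messages: ChatMessage[], fallbackToEmpty: boolean): ChatMessage[] {
    return messages.map((msg) => {
      const firstCall = msg.tool_calls?.[0];
      if (msg.role !== "assistant" || !firstCall) return msg;
      const stored = this.byToolCallId.get(firstCall.id);
      if (stored !== undefined) return { ...msg, reasoning_content: stored };
      // Nothing recoverable: optionally send an empty reasoning_content instead.
      return fallbackToEmpty ? { ...msg, reasoning_content: "" } : msg;
    });
  }
}

const store = new ReasoningStore();
store.remember("call_1", "Need to read the file first.");

const conversation: ChatMessage[] = [
  { role: "user", content: "Summarize main.ts" },
  { role: "assistant", content: "", tool_calls: [{ id: "call_1" }] },
  { role: "tool", content: "file contents" },
  { role: "assistant", content: "", tool_calls: [{ id: "call_2" }] },
];

// call_1 gets its stored reasoning back; call_2 falls back to an empty string.
const replayed = store.reattach(conversation, true);
```

Since the store lives only in memory in this sketch, reasoning text would be lost after a window reload, which is the situation the empty-string fallback is meant to cover.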

Copilot Explore Subagents

GitHub Copilot Agent can start an internal Explore subagent for code search and file review. VS Code exposes a setting for that model, and this extension contributes defaults so Explore keeps using the OAI model instead of falling back to a Copilot built-in model:

"chat.exploreAgent.defaultModel": "DeepSeek V4 Pro (matrix-oai-compatible)",
"chat.customAgentInSubagent.enabled": true

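When Copilot Agent searches and reads code, it starts the built-in Explore subagent. By default, this extension sets the Explore subagent model to DeepSeek V4 Pro so that subtasks do not silently switch back to GPT-4.1.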

Local Proxy

When the proxy is running:

OpenAI Chat Completions: http://127.0.0.1:8080/v1/chat/completions
Anthropic Messages: http://127.0.0.1:8080/v1/messages
Models: http://127.0.0.1:8080/v1/models
Health: http://127.0.0.1:8080/health
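A client request against the Chat Completions endpoint listed above might be built like this. The sketch assumes the proxy accepts a standard OpenAI-style body; buildChatCompletionRequest and ProxyRequest are hypothetical names, the model id is only an example, and whether the proxy expects an Authorization header is not stated here, so none is sent.

```typescript
interface ProxyRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Port 8080 matches the endpoints listed above; pass another port if yours differs.
function buildChatCompletionRequest(model: string, prompt: string, port = 8080): ProxyRequest {
  return {
    url: `http://127.0.0.1:${port}/v1/chat/completions`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      stream: false, // ask for a single JSON response instead of SSE
    }),
  };
}

const chatRequest = buildChatCompletionRequest("deepseek-chat", "Say hello");
```

With the proxy running, the descriptor can be handed to any HTTP client, for example fetch(chatRequest.url, { method: chatRequest.method, headers: chatRequest.headers, body: chatRequest.body }).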

If the requested model matches a configured OAI model, the proxy routes to that upstream provider. Otherwise it tries to route to an available VS Code language model, such as Copilot models.

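In other words, if the model in the request matches a configured OAI model, the request is forwarded to the corresponding upstream; otherwise the proxy tries to match a language model available in VS Code.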
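The routing rule above amounts to a two-step lookup, sketched below. The exact-id matching, the Route type, the "not-found" branch, and the "gpt-4.1" id are assumptions made for illustration; the extension may match models differently.

```typescript
type Route =
  | { kind: "upstream"; modelId: string; providerId: string }
  | { kind: "vscode-lm"; modelId: string }
  | { kind: "not-found"; modelId: string };

interface ConfiguredModel {
  id: string;
  providerId: string;
}

// Configured OAI models are checked first, then the VS Code language models.
function routeRequest(requested: string, configured: ConfiguredModel[], vscodeModelIds: string[]): Route {
  const match = configured.find((m) => m.id === requested);
  if (match) {
    return { kind: "upstream", modelId: match.id, providerId: match.providerId };
  }
  if (vscodeModelIds.includes(requested)) {
    return { kind: "vscode-lm", modelId: requested };
  }
  return { kind: "not-found", modelId: requested };
}

const configuredModels: ConfiguredModel[] = [
  { id: "qwen3.5:9b", providerId: "ollama" },
  { id: "deepseek-chat", providerId: "deepseek" },
];
const vscodeModels = ["gpt-4.1"]; // example id of a model exposed through vscode.lm

const toUpstream = routeRequest("deepseek-chat", configuredModels, vscodeModels);
const toVsCode = routeRequest("gpt-4.1", configuredModels, vscodeModels);
const unknownModel = routeRequest("no-such-model", configuredModels, vscodeModels);
```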

Commands

Matrix OAI Gateway: Configuration
Matrix OAI Gateway: Add Provider
Matrix OAI Gateway: Add Preset Model
Matrix OAI Gateway: Add Model
Matrix OAI Gateway: Set API Key
Matrix OAI Gateway: Clear API Key
Matrix OAI Gateway: Refresh Models
Matrix OAI Gateway: Start Proxy
Matrix OAI Gateway: Stop Proxy
Matrix OAI Gateway: Restart Proxy
Matrix OAI Gateway: Show Output
Matrix OAI Gateway: Open Settings
Matrix OAI Gateway: Reset Usage

Usage And Logs

The status bar shows:

proxy state and port
latest request context usage
total request count
total error count

The configuration webview shows request count, errors, reported or estimated input/output tokens, context usage, and average latency per model.

Logs go to the Matrix OAI Gateway Output channel. Set matrixOaiCopilot.logLevel to off, error, info, or debug.

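In summary: the status bar shows the proxy port, latest context usage, request count, and error count, while the configuration page shows per-model usage, errors, context, and latency details.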
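Per-model figures such as request count, errors, token totals, and average latency could be accumulated along these lines. UsageTracker and its field names are hypothetical and only illustrate the kind of bookkeeping described above; reset() corresponds loosely to the Reset Usage command.

```typescript
interface ModelUsage {
  requests: number;
  errors: number;
  inputTokens: number;
  outputTokens: number;
  totalLatencyMs: number;
}

interface RequestSample {
  ok: boolean;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
}

class UsageTracker {
  private readonly perModel = new Map<string, ModelUsage>();

  // Adds one finished request to the running totals of its model.
  record(modelId: string, sample: RequestSample): void {
    const usage = this.perModel.get(modelId) ??
      { requests: 0, errors: 0, inputTokens: 0, outputTokens: 0, totalLatencyMs: 0 };
    usage.requests += 1;
    if (!sample.ok) usage.errors += 1;
    usage.inputTokens += sample.inputTokens;
    usage.outputTokens += sample.outputTokens;
    usage.totalLatencyMs += sample.latencyMs;
    this.perModel.set(modelId, usage);
  }

  snapshot(modelId: string): ModelUsage | undefined {
    return this.perModel.get(modelId);
  }

  averageLatencyMs(modelId: string): number {
    const usage = this.perModel.get(modelId);
    return usage && usage.requests > 0 ? usage.totalLatencyMs / usage.requests : 0;
  }

  reset(): void {
    this.perModel.clear();
  }
}

const tracker = new UsageTracker();
tracker.record("deepseek-chat", { ok: true, inputTokens: 1200, outputTokens: 300, latencyMs: 100 });
tracker.record("deepseek-chat", { ok: false, inputTokens: 800, outputTokens: 0, latencyMs: 300 });
```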

Timeouts And Streaming

Slow thinking models can take longer than normal chat models. The global upstream timeout is controlled by matrixOaiCopilot.requestTimeoutSeconds; an individual model can override it with requestTimeoutSeconds.

Set model-level stream: false when a provider is more stable with plain JSON responses than with SSE streaming.

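To recap: slow thinking models may exceed the wait time of ordinary chat models. Use matrixOaiCopilot.requestTimeoutSeconds for the global timeout, override it per model with requestTimeoutSeconds, and set stream: false on a model whose provider is more stable without streaming.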
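The override order described above can be expressed as a small helper. This is a sketch only: effectiveRequestOptions is a hypothetical name, the timeout values are illustrative, and treating streaming as the default when stream is unset is an assumption.

```typescript
interface GlobalSettings {
  requestTimeoutSeconds: number; // matrixOaiCopilot.requestTimeoutSeconds
}

interface ModelOptions {
  requestTimeoutSeconds?: number; // optional per-model override
  stream?: boolean;
}

interface EffectiveOptions {
  timeoutMs: number;
  stream: boolean;
}

function effectiveRequestOptions(globalSettings: GlobalSettings, model: ModelOptions): EffectiveOptions {
  // The model-level value wins when present; otherwise the global value applies.
  const seconds = model.requestTimeoutSeconds ?? globalSettings.requestTimeoutSeconds;
  return {
    timeoutMs: seconds * 1000,
    stream: model.stream ?? true, // assumption: streaming unless explicitly disabled
  };
}

const defaults = effectiveRequestOptions({ requestTimeoutSeconds: 120 }, {});
const slowThinker = effectiveRequestOptions(
  { requestTimeoutSeconds: 120 },
  { requestTimeoutSeconds: 600, stream: false },
);
```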

Compatibility

This extension supports:

upstream models reachable through OpenAI-compatible /chat/completions
VS Code language models available through vscode.lm
OpenAI-compatible proxy clients
basic Anthropic Messages clients

Tool calling, image input, reasoning options, and token usage depend on the upstream model and gateway.

Reported token usage in particular reflects real figures only when the upstream model and gateway return them.
