CLLMs for Copilot Chat
Thanks
CLLMs began as a Qwen-focused adaptation of Vizards/deepseek-v4-for-copilot, which pioneered plugging a BYOK model into the Copilot Chat picker via the native LanguageModelChatProvider API. It has since grown into a multi-provider extension for Qwen, z.ai (GLM), MiniMax, Xiaomi MiMo, Moonshot Kimi, and Tencent Hunyuan. Huge thanks to Vizards: the provider pipeline, vision proxy, thinking-mode handling, and diagnostics here are built on the foundation that Vizards created and shared with the community.
Getting Started
Prerequisites
- VS Code 1.116 or later. This extension relies on non-public Copilot Chat APIs that may break on newer VS Code versions — report an issue if you hit one.
- GitHub Copilot subscription (Free / Pro / Enterprise — the free tier works)
- An API key for at least one provider: Qwen (DashScope), z.ai (Zhipu GLM), MiniMax, Xiaomi MiMo, Moonshot (Kimi), or Tencent Hunyuan
Installation
Install from the registry used by your editor:
- Microsoft VS Code — install from VS Code Marketplace.
Usage
- Run `CLLMs: Set API Key` from the Command Palette (Cmd/Ctrl+Shift+P) and pick a provider
- Paste that provider's API key or compatible token (Qwen DashScope keys usually start with `sk-`)
- Open Copilot Chat, click the model picker, and pick a model
- That's it. Chat away
Models
Six providers ship out of the box. Each provider has its own API key and endpoint, so you can use Qwen, z.ai (GLM), MiniMax, Xiaomi MiMo, Moonshot Kimi, Tencent Hunyuan, or any combination of them side by side in the Copilot model picker.
Qwen (DashScope)
| Model | Best For |
| --- | --- |
| Qwen3 Coder Plus | Agentic coding, tool calls, large refactors |
| Qwen Plus | Balanced everyday use with hybrid thinking |
| Qwen3 Max | Flagship model for hard tasks |
| Qwen3-VL Plus | Native vision (image input) |
z.ai (Zhipu GLM)
| Model | Best For |
| --- | --- |
| GLM-4.6 | Flagship coding & agents, 200K context |
| GLM-4.5-Air | Lightweight, faster, lower cost |
| GLM-4.5V | Native vision (image input) |
MiniMax
| Model | Best For |
| --- | --- |
| MiniMax-M3 | Flagship agentic & coding, native vision, up to 1M context |
| MiniMax-M2.7 | Fast coding & agents, lower cost |
Xiaomi MiMo
| Model | Best For |
| --- | --- |
| MiMo V2.5 Pro | Flagship hybrid reasoning & coding, up to 1M context |
| MiMo V2.5 (Omni) | Native vision (image input) plus thinking |
| MiMo V2 Flash | Fast, low-cost everyday tasks |
Moonshot (Kimi)
| Model | Best For |
| --- | --- |
| Kimi K2.6 | Flagship native-multimodal agents & coding, 256K context |
| Kimi K2.5 | Multimodal default with toggleable thinking |
Tencent Hunyuan (混元)
| Model | Best For |
| --- | --- |
| Tencent HY 2.0 Think | Flagship deep-thinking & coding, 128K context |
| Hunyuan TurboS | Fast & balanced everyday |
| Hunyuan T1 | Deep thinking, affordable |
| Hunyuan A13B | Lightweight, fastest & lowest cost |
Model IDs default to the official provider names. For third-party or self-hosted endpoints they are fully configurable via `cllms.modelIdOverrides` (Qwen) and the per-provider settings `cllms.zai.modelIdOverrides`, `cllms.minimax.modelIdOverrides`, `cllms.xiaomi.modelIdOverrides`, `cllms.moonshot.modelIdOverrides`, and `cllms.hunyuan.modelIdOverrides`.
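As a rough illustration of how such an override map might be applied, the sketch below picks the settings key for a provider and resolves the ID to send. The `Provider` labels and both function names are hypothetical and not taken from the extension's source; only the setting names come from this README.

```typescript
// Hypothetical provider labels for this sketch; the setting names are from the README.
type Provider = "qwen" | "zai" | "minimax" | "xiaomi" | "moonshot" | "hunyuan";

// Qwen uses the un-prefixed setting; every other provider nests under its own key.
function overrideSettingKey(provider: Provider): string {
  return provider === "qwen"
    ? "cllms.modelIdOverrides"
    : `cllms.${provider}.modelIdOverrides`;
}

// Return the API model ID to send: the override if one is set, else the built-in ID.
function resolveModelId(builtInId: string, overrides: Record<string, string>): string {
  const override: unknown = overrides[builtInId];
  if (typeof override === "string" && override.trim() !== "") {
    return override.trim();
  }
  return builtInId;
}
```

With an empty map every model keeps its built-in ID, so an override only matters for the entries you actually change.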
Adding a new model
Want to add your own model? See Adding a new model for a step-by-step guide.
Testing Status
| Provider | Status | Notes |
| --- | --- | --- |
| Qwen (DashScope, mainland China) | ✅ Tested | Qwen3 Coder Plus, Qwen Plus, Qwen3 Max, Qwen3-VL Plus: all verified. |
| Qwen (DashScope International) | ⚠️ Untested | API compatibility should match the domestic endpoint. Test tokens or test reports welcome! |
| z.ai (Zhipu GLM) | ✅ Tested | GLM-4.6, GLM-4.5-Air, GLM-4.5V: all verified. |
| MiniMax (mainland China) | ✅ Tested | MiniMax-M3, MiniMax-M2.7: all verified. |
| MiniMax (International) | ⚠️ Untested | API compatibility should match the domestic endpoint. Test tokens or test reports welcome! |
| Xiaomi MiMo | ✅ Tested | MiMo V2.5 Pro, MiMo V2.5 (Omni), MiMo V2 Flash: all verified. |
| Moonshot (Kimi, mainland China) | ✅ Tested | Kimi K2.6, Kimi K2.5: all verified. |
| Moonshot (Kimi International) | ⚠️ Untested | API compatibility should match the domestic endpoint. Test tokens or test reports welcome! |
| Tencent Hunyuan (混元) | ✅ Tested | Standard OpenAI-compatible API: all verified. |
💡 Help wanted! International endpoints share the same API surface as their domestic counterparts, so they should work out of the box — but they haven't been verified yet. If you have an international API key, please give it a try and report your results. If you'd like to contribute test tokens, reach out via GitHub Issues. Every bit of testing helps make these providers more reliable for everyone.
Settings
| Setting | Default | Description |
| --- | --- | --- |
| `cllms.baseUrl` | `https://dashscope-intl.aliyuncs.com/compatible-mode/v1` | Qwen OpenAI-compatible endpoint. Use `https://dashscope.aliyuncs.com/compatible-mode/v1` (Beijing), `https://dashscope-us.aliyuncs.com/compatible-mode/v1` (US), or any compatible third-party / self-hosted endpoint |
| `cllms.zai.baseUrl` | `https://api.z.ai/api/paas/v4` | z.ai (GLM) OpenAI-compatible endpoint. Use `https://api.z.ai/api/coding/paas/v4` for a GLM Coding Plan subscription |
| `cllms.minimax.baseUrl` | `https://api.minimax.io/v1` | MiniMax OpenAI-compatible endpoint. Use `https://api.minimaxi.com/v1` for mainland China |
| `cllms.xiaomi.baseUrl` | `https://api.xiaomimimo.com/v1` | Xiaomi MiMo OpenAI-compatible endpoint (official open platform) |
| `cllms.moonshot.baseUrl` | `https://api.moonshot.ai/v1` | Moonshot (Kimi) OpenAI-compatible endpoint. Use `https://api.moonshot.cn/v1` for mainland China (keys are region-specific) |
| `cllms.hunyuan.baseUrl` | `https://api.hunyuan.cloud.tencent.com/v1` | Tencent Hunyuan OpenAI-compatible endpoint |
| `cllms.maxTokens` | `0` | Max output tokens (0 = no limit). Useful for cost control |
| `cllms.maxRetries` | `2` | Automatic retries for transient failures (HTTP 429, 5xx, network blips) before any output streams. Honors Retry-After and uses exponential backoff with jitter; retries stop once output starts, so a response is never duplicated. 0 disables |
| `cllms.modelIdOverrides` | prefilled official ID map | API model IDs to send for each Qwen model. Change only for compatible third-party APIs with different model names |
| `cllms.zai.modelIdOverrides` | prefilled official ID map | API model IDs to send for each z.ai (GLM) model |
| `cllms.minimax.modelIdOverrides` | prefilled official ID map | API model IDs to send for each MiniMax model |
| `cllms.xiaomi.modelIdOverrides` | prefilled official ID map | API model IDs to send for each Xiaomi MiMo model |
| `cllms.moonshot.modelIdOverrides` | prefilled official ID map | API model IDs to send for each Moonshot (Kimi) model |
| `cllms.hunyuan.modelIdOverrides` | prefilled official ID map | API model IDs to send for each Tencent Hunyuan model |
| `cllms.debugMode` | `minimal` | Diagnostic mode: `minimal` for token usage only, `metadata` for privacy-preserving logs, or `verbose` for full request dumps and pipeline snapshots under extension global storage. Full dumps may include sensitive prompt text, tool schemas, file snippets, and image descriptions. Use CLLMs: Open Request Dumps Folder to open the dump location |
| `cllms.visionModel` | (auto) | Which Copilot model to proxy images through when the selected model is text-only |
| `cllms.visionPrompt` | (built-in) | Prompt used to describe image attachments via the vision proxy |
| `cllms.experimental.stabilizeToolList` | `false` | Experimental. Tries to pre-activate VS Code/Copilot virtual tools so the tools parameter is more complete and stable across turns. May improve context-cache hit rate when enabled tools change between turns. Can increase input tokens because more function definitions may be included; cache-hit input tokens are cheaper but still count toward usage. Usually leave it off with 64 or fewer enabled tools unless the tool list still changes across turns; do not enable it with more than 128 enabled tools |
Thinking Effort is configured from Copilot Chat's model picker for each thinking-capable model.
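The retry behavior described for `cllms.maxRetries` can be pictured roughly as follows. This is a minimal sketch of exponential backoff with jitter that honors Retry-After, not the extension's actual code: the function names, the 500 ms base, and the 8 s cap are all assumptions made for illustration.

```typescript
// Parse a Retry-After value given in seconds; returns milliseconds or undefined.
// (HTTP also allows a date form, which this sketch ignores.)
function retryAfterMs(header: string | null): number | undefined {
  if (header === null || header.trim() === "") return undefined;
  const seconds = Number(header.trim());
  return isFinite(seconds) && seconds >= 0 ? seconds * 1000 : undefined;
}

// HTTP 429 and 5xx count as transient; `undefined` stands for a network error with no response.
function isTransient(status: number | undefined): boolean {
  return status === undefined || status === 429 || (status >= 500 && status <= 599);
}

// Exponential backoff with full jitter: a random delay in [0, min(cap, base * 2^attempt)).
function backoffMs(
  attempt: number,
  baseMs = 500, // assumed base delay
  capMs = 8000, // assumed upper bound
  random: () => number = Math.random
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Delay before retry number `attempt` (0-based), or undefined when no retry should happen.
function nextDelayMs(opts: {
  status: number | undefined;
  attempt: number;
  maxRetries: number;
  outputStarted: boolean;
  retryAfter: string | null;
  random?: () => number;
}): number | undefined {
  if (opts.outputStarted) return undefined; // never retry once output has streamed
  if (opts.attempt >= opts.maxRetries) return undefined; // maxRetries = 0 disables retries
  if (!isTransient(opts.status)) return undefined;
  const hinted = retryAfterMs(opts.retryAfter);
  return hinted !== undefined ? hinted : backoffMs(opts.attempt, 500, 8000, opts.random);
}
```

Checking `outputStarted` first is what keeps a partially streamed response from being replayed, which matches the "never duplicated" guarantee in the setting's description.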
Commands
Run these from the Command Palette (Cmd/Ctrl+Shift+P):
| Command | Description |
| --- | --- |
| `CLLMs: Set API Key` | Store a provider's API key in the OS keychain |
| `CLLMs: Get API Key` | Open a provider's API key page |
| `CLLMs: Clear API Key` | Remove a provider's stored key |
| `CLLMs: Configure Vision Proxy` | Pick the model used to describe images for text-only models |
| `CLLMs: Test Provider Connection` | Verify a provider's key + endpoint via `/v1/models` and flag stale `modelIdOverrides` |
| `CLLMs: Show Session Cost` | Show approximate spend per model for this session, with a reset action |
| `CLLMs: Open Settings` | Jump to the extension settings |
| `CLLMs: Show Logs` | Open the diagnostic output channel |
| `CLLMs: Open Request Dumps Folder` | Open the verbose request-dump folder (debug mode) |
Example settings.json override for compatible API proxies:
```json
{
  "cllms.modelIdOverrides": {
    "qwen3-coder-plus": "your-coder-model-id",
    "qwen-plus": "your-plus-model-id",
    "qwen3-max": "your-max-model-id",
    "qwen3-vl-plus": "your-vl-model-id"
  }
}
```
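An override map like this is what CLLMs: Test Provider Connection checks against the endpoint's `/v1/models` listing. One plausible reading of "stale" is an override whose target ID the endpoint no longer advertises. The sketch below works under that assumption, with hypothetical function names and the usual OpenAI-style `{ data: [{ id }] }` list shape.

```typescript
// Extract model IDs from an OpenAI-style model list body: { data: [{ id: "..." }, ...] }.
function modelIdsFromListResponse(body: unknown): string[] {
  if (typeof body !== "object" || body === null) return [];
  const data = (body as { data?: unknown }).data;
  if (!Array.isArray(data)) return [];
  const ids: string[] = [];
  for (const entry of data) {
    if (typeof entry === "object" && entry !== null) {
      const id = (entry as { id?: unknown }).id;
      if (typeof id === "string") ids.push(id);
    }
  }
  return ids;
}

// Return the override slots whose target ID is missing from the endpoint's list.
function findStaleOverrides(
  overrides: Record<string, string>,
  availableIds: string[]
): string[] {
  return Object.keys(overrides).filter((slot) => {
    const target: unknown = overrides[slot];
    return typeof target === "string" && availableIds.indexOf(target) === -1;
  });
}
```

A non-empty result would suggest that the listed slots point at model names the endpoint does not currently serve.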
Using z.ai (Zhipu GLM)
z.ai is a first-class provider — no proxy or model-ID hacking required:
- Run `CLLMs: Set API Key` and pick z.ai (Zhipu GLM). Get a key from the z.ai API keys page.
- Open the Copilot Chat model picker — the GLM models appear alongside the Qwen ones.
GLM thinking is sent in z.ai's native format (thinking: { type: "enabled" }), tool calling works, and GLM-4.5V is used as a native vision model (images sent directly). If you have a GLM Coding Plan subscription, set cllms.zai.baseUrl to https://api.z.ai/api/coding/paas/v4.
Using MiniMax
MiniMax is also a first-class provider:
- Run `CLLMs: Set API Key` and pick MiniMax. Get a key from the MiniMax platform.
- Open the Copilot Chat model picker — MiniMax-M3 and MiniMax-M2.7 appear alongside the others.
MiniMax thinking is sent in its native format (thinking: { type: "adaptive" }) and reasoning is requested via reasoning_split: true so it streams cleanly through reasoning_content. Tool calling works; MiniMax-M3 is a native vision model (images sent directly), while MiniMax-M2.7 is text-only and image attachments use the vision proxy fallback. The default endpoint is the international https://api.minimax.io/v1 — set cllms.minimax.baseUrl to https://api.minimaxi.com/v1 for mainland China.
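To make the two thinking formats concrete, here is a small sketch of how a Chat Completions request body might be decorated for z.ai and for MiniMax. The helper names and the `ChatBody` type are hypothetical; only the `thinking` and `reasoning_split` fields come from the descriptions above.

```typescript
// A loosely typed request body; real requests also carry messages, tools, and so on.
type ChatBody = Record<string, unknown>;

// z.ai (GLM) style: thinking is switched with { type: "enabled" } or { type: "disabled" }.
function withGlmThinking(body: ChatBody, enabled: boolean): ChatBody {
  return { ...body, thinking: { type: enabled ? "enabled" : "disabled" } };
}

// MiniMax style: adaptive thinking, plus reasoning_split so that reasoning
// is delivered separately through reasoning_content.
function withMiniMaxThinking(body: ChatBody): ChatBody {
  return { ...body, thinking: { type: "adaptive" }, reasoning_split: true };
}
```

For example, `withGlmThinking({ model: "glm-4.6" }, true)` yields a body whose `thinking` field is `{ type: "enabled" }`, while the MiniMax helper adds both `thinking` and `reasoning_split` without touching the other fields.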
Using Xiaomi MiMo
Xiaomi MiMo is a first-class provider too:
- Run `CLLMs: Set API Key` and pick Xiaomi MiMo. Create a pay-as-you-go (`sk-...`) key on the Xiaomi MiMo open platform console.
- Open the Copilot Chat model picker — MiMo V2.5 Pro, MiMo V2.5 (Omni), and MiMo V2 Flash appear alongside the others.
MiMo is a hybrid-reasoning family: thinking is on by default and sent in the same format as GLM (thinking: { type: "enabled" | "disabled" }; MiMo doesn't support a thinking budget), with reasoning streamed through reasoning_content. Tool calling works, and the omni model MiMo V2.5 (Omni) accepts native image input while the Pro/Flash models fall back to the vision proxy. The default endpoint is the official open platform https://api.xiaomimimo.com/v1.
Note: a MiMo Token Plan subscription (tp-... key) uses a different, subscription-specific base URL and is restricted to coding tools — point cllms.xiaomi.baseUrl at the URL shown on your subscription page if you use one. Pay-as-you-go (sk-...) keys work with the default endpoint.
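Since MiMo, like the other thinking-capable providers here, streams its reasoning through `reasoning_content`, a consumer has to keep reasoning text apart from the answer text. The sketch below assumes that `reasoning_content` sits next to `content` on each streamed `delta`, as is common for OpenAI-compatible reasoning APIs; the `StreamChunk` type and `accumulate` function are hypothetical.

```typescript
// Minimal shape of a streamed chunk; the position of reasoning_content is an assumption.
interface StreamChunk {
  choices?: Array<{
    delta?: { content?: string | null; reasoning_content?: string | null };
  }>;
}

// Concatenate reasoning and answer text separately across all chunks.
function accumulate(chunks: StreamChunk[]): { reasoning: string; answer: string } {
  let reasoning = "";
  let answer = "";
  for (const chunk of chunks) {
    const delta = chunk.choices?.[0]?.delta;
    if (!delta) continue;
    if (typeof delta.reasoning_content === "string") reasoning += delta.reasoning_content;
    if (typeof delta.content === "string") answer += delta.content;
  }
  return { reasoning, answer };
}
```

Keeping the two streams separate lets a UI show the reasoning in a collapsible section without mixing it into the final answer.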
Using Moonshot (Kimi)
Moonshot Kimi is a first-class provider too:
- Run `CLLMs: Set API Key` and pick Moonshot (Kimi). Create a key in the Moonshot console.
- Open the Copilot Chat model picker — Kimi K2.6 and Kimi K2.5 appear alongside the others.
Kimi K2.6 / K2.5 are native-multimodal hybrid-reasoning models (256K context): thinking is on by default and sent in the GLM-style thinking: { type: "enabled" | "disabled" }, with reasoning streamed through reasoning_content. Tool calling works, and both models accept native image input. The default endpoint is the international https://api.moonshot.ai/v1 — set cllms.moonshot.baseUrl to https://api.moonshot.cn/v1 for mainland China.
Note: Moonshot keys are region-specific — an international (platform.moonshot.ai) key only works against api.moonshot.ai, and a mainland-China (platform.moonshot.cn) key only works against api.moonshot.cn. The legacy kimi-k2-* series (incl. kimi-k2-thinking) was retired on 2026-05-25; use K2.6 / K2.5.
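Because a key from the wrong region fails against the other endpoint, it can help to derive the expected console from the configured base URL. The following is a hypothetical helper for that check, not part of the extension; the host names are the ones given in the note above.

```typescript
type MoonshotRegion = "international" | "mainland-china";

// Pull the host name out of an http(s) URL without relying on platform URL APIs.
function hostOf(baseUrl: string): string | undefined {
  const match = /^https?:\/\/([^\/:?#]+)/i.exec(baseUrl.trim());
  return match?.[1]?.toLowerCase();
}

// Map a Moonshot base URL to its region; undefined for third-party or self-hosted endpoints.
function moonshotRegion(baseUrl: string): MoonshotRegion | undefined {
  const host = hostOf(baseUrl);
  if (host === "api.moonshot.ai") return "international";
  if (host === "api.moonshot.cn") return "mainland-china";
  return undefined;
}

// Name the console whose keys work against the given base URL.
function keyConsoleFor(baseUrl: string): string | undefined {
  const region = moonshotRegion(baseUrl);
  if (region === "international") return "platform.moonshot.ai";
  if (region === "mainland-china") return "platform.moonshot.cn";
  return undefined;
}
```

If `keyConsoleFor` returns a console other than the one your key was created in, switching `cllms.moonshot.baseUrl` is likely the fix.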
Using Tencent Hunyuan (混元)
Tencent Hunyuan is a first-class provider too:
- Run `CLLMs: Set API Key` and pick Tencent Hunyuan (混元). Get an API key from the Tencent Cloud Hunyuan console.
- Open the Copilot Chat model picker — Tencent HY 2.0 Think, Hunyuan TurboS, Hunyuan T1, and Hunyuan A13B appear alongside the others.
Hunyuan uses the standard OpenAI-compatible Chat Completions API. HY 2.0 Think and T1 are deep-thinking models with thinking sent in GLM-style thinking: { type: "enabled" | "disabled" }; TurboS and A13B are fast instruct models. Tool calling works across all four models; there are no native vision models yet, so image attachments use the vision proxy fallback. The default endpoint is https://api.hunyuan.cloud.tencent.com/v1.
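The choice between native image input and the vision proxy fallback, described for each provider above, boils down to a capability lookup. The sketch below hard-codes the native-vision models named in this README; the list constant, the `ImageRoute` type, and the function are hypothetical and keyed by display name purely for illustration.

```typescript
// Models this README describes as accepting native image input.
const NATIVE_VISION_MODELS: readonly string[] = [
  "Qwen3-VL Plus",
  "GLM-4.5V",
  "MiniMax-M3",
  "MiMo V2.5 (Omni)",
  "Kimi K2.6",
  "Kimi K2.5",
];

type ImageRoute = "native" | "vision-proxy" | "none";

// Decide how image attachments reach the selected model.
function routeImages(modelName: string, imageCount: number): ImageRoute {
  if (imageCount <= 0) return "none"; // nothing to route
  return NATIVE_VISION_MODELS.indexOf(modelName) !== -1 ? "native" : "vision-proxy";
}
```

Under this table all four Hunyuan models route to the vision proxy, where the model chosen by `cllms.visionModel` describes the image for them.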
Other OpenAI-compatible providers
All requests go through a standard OpenAI-compatible Chat Completions endpoint, so to reach a service beyond the six built-in providers you can repoint any provider at it via its `baseUrl` and map IDs with `modelIdOverrides`. For example, to serve GLM through the Qwen slots instead:
```json
{
  "cllms.baseUrl": "https://api.z.ai/api/paas/v4",
  "cllms.modelIdOverrides": {
    "qwen3-coder-plus": "glm-4.6"
  }
}
```
First-class, named entries for more Chinese providers are on the roadmap.
License
MIT — see NOTICE for attribution to the upstream project.