Z.AI for GitHub Copilot Chat

Use Z.AI GLM models directly in GitHub Copilot Chat — no Copilot Pro/Enterprise subscription needed. Just bring your own API key (BYOK).

What Is This?

Z.AI for GitHub Copilot Chat is a VS Code extension that registers Z.AI GLM series models (including GLM-4.7, GLM-5, GLM-5.1, and the GLM-4.5 variants) with GitHub Copilot Chat via the official VS Code Language Model Chat Provider API.

This lets you pick and use Z.AI GLM models directly from the Copilot Chat model picker, just like selecting GPT-4 or Claude — no extra Copilot Pro/Enterprise subscription required. Simply enter your Z.AI API key.

Model	Description	Vision
GLM-4.7	Latest flagship GLM model with 200K context	❌
GLM-5	Next-generation GLM model	❌
GLM-5.1	Enhanced GLM-5 with improved reasoning	❌
GLM-4.5-Air	Lightweight GLM for faster responses	❌
GLM-4.5-Flash	Fastest GLM model for high-throughput use	❌
GLM-5V-Turbo	Multimodal vision + coding base model	✅
GLM-4.6V	Visual reasoning model	✅
GLM-4.6V-Flash	Free vision model with tool calling	✅

✨ Features

BYOK: configure your Z.AI API key once and all models become available
Live model list: fetches the available models from the Z.AI API on every startup
Bundled fallback: works offline or when the API is unreachable, using a curated model table with accurate token limits
Per-model token limits: precise context window and max output token values for each model instead of a single global cap
Tool-calling support: forwards tool schemas in the OpenAI-compatible chat completions format
Reasoning debug: opt-in logging of `reasoning_content` to the Z.AI output channel
Diagnostics command: one-click Markdown report showing exactly which models VS Code has registered
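
To illustrate the tool-calling feature, here is a minimal sketch of how tool schemas could be mapped to the OpenAI-compatible `tools` array. It is not the extension's actual code. The `ChatTool` interface is a local stand-in for the tool shape a VS Code chat provider receives, `toOpenAITools` is a hypothetical helper name, and the empty-schema fallback is an assumption.

```typescript
// Local stand-in for the tool shape a chat provider receives from VS Code
// (name, description, JSON Schema for the input). Declared here so the
// sketch runs without the `vscode` module.
interface ChatTool {
  name: string;
  description: string;
  inputSchema?: object;
}

// One entry of the `tools` array in an OpenAI-compatible chat completions request.
interface OpenAITool {
  type: "function";
  function: { name: string; description: string; parameters: object };
}

// Hypothetical helper: convert provider-side tools into request-side tools.
// Assumption: a tool without a schema is sent with an empty object schema.
function toOpenAITools(tools: ChatTool[]): OpenAITool[] {
  return tools.map((tool): OpenAITool => ({
    type: "function",
    function: {
      name: tool.name,
      description: tool.description,
      parameters: tool.inputSchema ?? { type: "object", properties: {} },
    },
  }));
}
```

The resulting array would be placed in the `tools` field of the chat completions request body.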

Requirements

VS Code 1.120.0 or higher with the Language Model Chat Provider API
GitHub Copilot Chat extension, installed from the marketplace (required, because this extension only adds models to Copilot Chat)
A signed-in GitHub Copilot Chat session (a personal GitHub account is sufficient; BYOK does not need Copilot Pro/Enterprise)
A Z.AI API key (get one at z.ai)

⚡ Quick Start

Install GitHub Copilot Chat from the marketplace if you haven't already.
Install this extension (or press F5 in the repo to launch an Extension Development Host).
Open GitHub Copilot Chat (click the Copilot icon in the sidebar or press Cmd+Shift+I / Ctrl+Shift+I).
Click the model picker (current model name) → Manage Models…
Select Z.AI.
Press Enter to accept the default Group Name.
Enter your Z.AI API Key when prompted — VS Code stores it securely as a secret.
Choose the models you want available.
Select any Z.AI model from the picker and start chatting.

💡 Tips:

Registered models are automatically available in the Copilot Chat model picker — no extra setup needed.

If a model appears in the Language Models view but not in the chat picker, hover its row and click the eye icon (👁) to enable visibility.

Commands

Once installed, Z.AI models appear directly in the GitHub Copilot Chat model picker — no special commands needed. The easiest way to manage your API key is via Settings → Language Models (gear icon ⚙).

For advanced usage, you can also run these commands via the Command Palette (Cmd+Shift+P / Ctrl+Shift+P):

Command	Description
`Z.AI: Manage Provider`	Manage API key, refresh models, or test connection
`Z.AI: Set API Key`	Store or update your Z.AI API key
`Z.AI: Diagnostics`	Show a markdown report of all registered Z.AI models

Note: The native BYOK flow via Language Models (gear icon ⚙) is recommended.

Settings

Setting	Type	Default	Description
`zai.temperature`	`number`	`0.2`	Sampling temperature for chat completions (`0`–`2`)
`zai.maxTokens`	`number`	`0`	Max output token override — `0` uses the per-model bundled maximum
`zai.maxInputTokens`	`number`	`0`	Context window override — `0` uses the per-model bundled context size
`zai.debugReasoning`	`boolean`	`false`	Write provider `reasoning_content` to Output → Z.AI for debugging
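
As a rough illustration of how these settings and their defaults fit together, the sketch below merges user values over the defaults from the table. The `normalizeSettings` helper is hypothetical, and clamping the temperature into the documented `0`–`2` range is an assumption about how out-of-range values might be handled, not confirmed extension behavior.

```typescript
// Settings from the table above, without the `zai.` prefix.
interface ZaiSettings {
  temperature: number;
  maxTokens: number;
  maxInputTokens: number;
  debugReasoning: boolean;
}

// Defaults as documented in the settings table.
const DEFAULT_SETTINGS: ZaiSettings = {
  temperature: 0.2,
  maxTokens: 0,
  maxInputTokens: 0,
  debugReasoning: false,
};

// Hypothetical helper: apply defaults, then sanitize numeric values.
// Assumption: temperature is clamped to [0, 2]; token overrides are
// floored and never negative.
function normalizeSettings(raw: Partial<ZaiSettings>): ZaiSettings {
  const merged = { ...DEFAULT_SETTINGS, ...raw };
  return {
    ...merged,
    temperature: Math.min(2, Math.max(0, merged.temperature)),
    maxTokens: Math.max(0, Math.floor(merged.maxTokens)),
    maxInputTokens: Math.max(0, Math.floor(merged.maxInputTokens)),
  };
}
```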

Models

The extension fetches the live model list from:

https://api.z.ai/api/coding/paas/v4/models

Because the Z.AI API returns model IDs only, a bundled metadata table provides context window and max output tokens per model. If the live fetch fails, the bundled list is used as a fallback.
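
The fallback described above can be sketched as follows. This is an illustration under stated assumptions rather than the extension's source: `resolveModelIds` is a hypothetical name, the live fetch is injected as a function so that no response format is assumed, and treating an empty live list as a failure is also an assumption.

```typescript
// Subset of the bundled model IDs listed in this README.
const BUNDLED_MODEL_IDS: string[] = ["glm-4.7", "glm-5", "glm-5.1"];

interface ResolvedModels {
  ids: string[];
  source: "live" | "bundled";
}

// Hypothetical helper: try the live list first, fall back to the bundled list.
// `fetchLiveIds` stands in for the call to the /models endpoint.
async function resolveModelIds(
  fetchLiveIds: () => Promise<string[]>,
  bundledIds: string[] = BUNDLED_MODEL_IDS,
): Promise<ResolvedModels> {
  try {
    const live = await fetchLiveIds();
    // Assumption: an empty live list is treated like a failed fetch.
    if (live.length > 0) {
      return { ids: live, source: "live" };
    }
  } catch {
    // Network or API error: fall through to the bundled list.
  }
  return { ids: bundledIds, source: "bundled" };
}
```

Injecting the fetch function keeps the fallback logic testable without network access.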

VS Code and Copilot read separate input and output token fields for UI display. Because GLM models can have very large output limits, the extension advertises only a small response reserve as the output limit to VS Code. This keeps the Language Models table, model picker tooltip, and chat context indicator consistent. Requests to the Z.AI API still carry each model's full bundled max output limit.
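
One plausible way to picture the split between advertised and requested limits is sketched below. The reserve size and the exact arithmetic are not stated in this README, so `RESPONSE_RESERVE`, `advertisedLimits`, and `requestMaxTokens` are hypothetical placeholders, not the extension's real values or function names.

```typescript
interface BundledLimits {
  contextWindow: number;
  maxOutputTokens: number;
}

// Placeholder reserve size (assumption; the real value is not documented here).
const RESPONSE_RESERVE = 8192;

// Limits advertised to VS Code for UI display. Assumption: the reserve is
// carved out of the context window so input + output equals the window.
function advertisedLimits(
  model: BundledLimits,
  reserve: number = RESPONSE_RESERVE,
): { maxInputTokens: number; maxOutputTokens: number } {
  const maxOutputTokens = Math.min(reserve, model.maxOutputTokens);
  return {
    maxInputTokens: model.contextWindow - maxOutputTokens,
    maxOutputTokens,
  };
}

// Limit sent to the Z.AI API: the model's full bundled maximum.
function requestMaxTokens(model: BundledLimits): number {
  return model.maxOutputTokens;
}
```

With the bundled `glm-4.7` values, this sketch would advertise 196,608 input tokens and 8,192 output tokens while still requesting up to 131,072 output tokens from the API.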

Bundled model limits

Model	Context window	Max output tokens	Vision
`glm-4.7`	200K (204,800)	128K (131,072)	❌
`glm-5`	200K (204,800)	128K (131,072)	❌
`glm-5.1`	200K (204,800)	128K (131,072)	❌
`glm-4.5-air`	128K (131,072)	96K (98,304)	❌
`glm-4.5-flash`	128K (131,072)	96K (98,304)	❌
`glm-5v-turbo`	200K (204,800)	128K (131,072)	✅
`glm-4.6v`	128K (131,072)	32K (32,768)	✅
`glm-4.6v-flash`	128K (131,072)	32K (32,768)	✅

Set `zai.maxInputTokens` or `zai.maxTokens` to a non-zero value to override the bundled defaults for all models.
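
The override rule can be expressed as a small lookup, shown here as a sketch. The table holds three rows copied from the bundled limits above; `effectiveLimits` is a hypothetical helper name, and returning `undefined` for an unknown model ID is an assumption.

```typescript
interface ModelLimits {
  contextWindow: number;
  maxOutputTokens: number;
}

// Three rows from the bundled limits table in this README.
const BUNDLED_LIMITS: Record<string, ModelLimits> = {
  "glm-4.7": { contextWindow: 204800, maxOutputTokens: 131072 },
  "glm-4.5-air": { contextWindow: 131072, maxOutputTokens: 98304 },
  "glm-4.6v": { contextWindow: 131072, maxOutputTokens: 32768 },
};

// Hypothetical helper: a value of 0 keeps the bundled default,
// any positive value replaces it for every model.
function effectiveLimits(
  modelId: string,
  overrides: { maxInputTokens: number; maxTokens: number },
): ModelLimits | undefined {
  const bundled = BUNDLED_LIMITS[modelId];
  if (!bundled) {
    return undefined; // Assumption: unknown IDs have no bundled limits.
  }
  return {
    contextWindow: overrides.maxInputTokens > 0 ? overrides.maxInputTokens : bundled.contextWindow,
    maxOutputTokens: overrides.maxTokens > 0 ? overrides.maxTokens : bundled.maxOutputTokens,
  };
}
```

For example, `effectiveLimits("glm-4.6v", { maxInputTokens: 0, maxTokens: 4096 })` keeps the 131,072-token context window and lowers the output cap to 4,096.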

All models use the OpenAI-compatible chat completions endpoint:

https://api.z.ai/api/coding/paas/v4/chat/completions
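
For orientation, a request to this endpoint might be assembled as in the sketch below. The body fields (`model`, `messages`, `temperature`, `max_tokens`, `stream`) follow the general OpenAI-compatible chat completions format. The `Bearer` authorization header, the use of streaming, and the `buildChatRequest` helper are assumptions for illustration, not details taken from the extension.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequestOptions {
  apiKey: string;
  model: string;
  messages: ChatMessage[];
  temperature: number;
  maxTokens: number;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

const CHAT_COMPLETIONS_URL = "https://api.z.ai/api/coding/paas/v4/chat/completions";

// Hypothetical helper: build (but do not send) a chat completions request.
function buildChatRequest(options: ChatRequestOptions): PreparedRequest {
  return {
    url: CHAT_COMPLETIONS_URL,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Assumption: the API key is sent as a bearer token.
        Authorization: `Bearer ${options.apiKey}`,
      },
      body: JSON.stringify({
        model: options.model,
        messages: options.messages,
        temperature: options.temperature,
        max_tokens: options.maxTokens,
        stream: true, // Assumption: responses are streamed to the chat view.
      }),
    },
  };
}
```

The prepared request could then be sent with `fetch(request.url, request.init)` in an environment where `fetch` is available.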

Development

# Install dependencies
npm install

# Compile TypeScript
npm run compile

# Watch mode
npm run watch

Press F5 in VS Code to launch an Extension Development Host with the extension loaded.

To package a .vsix for local install:

npm run package

Contributing

Issues and pull requests are welcome. Please open an issue first for significant changes so we can discuss the approach.

License

MIT — see LICENSE for details.

Laksmana Tri Moerdani