oMLX Copilot Chat
Use models served by oMLX from the Visual Studio Code Copilot Chat model picker.
Repository: https://github.com/mikedoise/oMLX-Copilot-Chat
This extension registers oMLX as a VS Code Language Model Chat Provider and
talks to the oMLX OpenAI-compatible API:
GET /v1/models
POST /v1/chat/completions
The default endpoint is http://127.0.0.1:8000/v1.
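For a quick smoke test outside VS Code, the model-discovery endpoint can be exercised directly. A minimal Python sketch (the token value is a placeholder; replace it with a token generated in the oMLX admin panel):

```python
import urllib.request

BASE_URL = "http://127.0.0.1:8000/v1"  # default oMLX endpoint
TOKEN = "<your-omlx-token>"            # placeholder: paste a real admin-panel token

# Build the model-discovery request the extension issues at startup.
req = urllib.request.Request(
    f"{BASE_URL}/models",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
print(req.full_url)  # http://127.0.0.1:8000/v1/models
# With oMLX running, urllib.request.urlopen(req) returns the model list as JSON.
```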
Requirements
- VS Code 1.104 or newer
- GitHub Copilot Chat
- A running oMLX server
- An oMLX API token generated in the oMLX admin panel
Setup
- Start oMLX.
- Generate an API token in the oMLX admin panel.
- Run oMLX: Set API Token from the Command Palette.
- Wait for the token validation message. If validation fails, paste the exact token from oMLX again.
- Run oMLX: Test Connection if you want to retest the connection later.
- Open Copilot Chat, manage language models, and enable the oMLX models you want to use.
The token is stored in VS Code Secret Storage, not in settings.json.
oMLX: Set API Token only changes the token VS Code sends to oMLX; it does not
create or rotate a token in oMLX. The value must exactly match a token generated
in the oMLX admin panel.
Copilot Agent Context
Copilot Agent mode has a large built-in prompt. If an oMLX model is advertised
with a 32k context window, Copilot may compact the conversation on nearly every
turn, even for short prompts.
For normal Agent mode use, configure oMLX to allow a larger context window and
set the advertised input-token override in VS Code:
"omlx.maxInputTokensOverride": 65536
Use 131072 if the selected oMLX model and runtime are configured for a 128k
context. Keep the override at 0 when you want the extension to use the context
reported by /v1/models.
The override only changes what the extension advertises to Copilot. The oMLX
server must also be configured to accept that context size, or oMLX will reject
long requests with a prompt-too-long error.
Settings
- omlx.baseUrl: oMLX OpenAI-compatible base URL.
- omlx.maxInputTokensOverride: optional input-token override. Leave as 0 to auto-detect from model metadata. For Copilot Agent mode, use 65536 or higher if oMLX is configured for that context size.
- omlx.maxOutputTokensOverride: optional output-token override. Leave as 0 to use model metadata when available.
- omlx.requestTimeoutMs: request timeout for model discovery and chat requests.
- omlx.enableImageInput: advertise OpenAI-compatible image input support.
- omlx.enableToolCalling: advertise and forward OpenAI-compatible tool calls. Defaults to true for Agent mode compatibility.
- omlx.maxToolCount: maximum number of tools advertised for Agent mode requests. Defaults to 16.
Image input defaults to off. Tool calling defaults to on because VS Code Agent
mode filters for tool-capable models. Disable tool calling if your selected
oMLX model or server rejects OpenAI-compatible tool schemas.
If oMLX rejects a request with a concrete runtime context-window error, the
extension remembers that limit for the model and refreshes the advertised model
metadata for future requests.
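The behavior described above can be sketched as a small resolution rule. The names below are illustrative, not the extension's actual internals:

```python
# Hypothetical sketch of how an advertised input-token limit could be resolved.
# remembered_limits holds context sizes learned from server prompt-too-long errors.
remembered_limits: dict[str, int] = {}

def advertised_input_tokens(model_id: str, metadata_tokens: int, override: int = 0) -> int:
    # override == 0 means "trust the metadata reported by /v1/models".
    tokens = override if override > 0 else metadata_tokens
    # A concrete limit reported by the runtime caps whatever was advertised.
    limit = remembered_limits.get(model_id)
    return min(tokens, limit) if limit is not None else tokens

remembered_limits["my-model"] = 32768  # learned from a rejected long request
print(advertised_input_tokens("my-model", 131072, override=65536))  # 32768
```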
Development
npm install
npm run compile
Press F5 in VS Code to launch an Extension Development Host.