A Bring-Your-Own-Model provider that registers your local
LM Studio server with VS Code's language model API
(vscode.lm). Once installed, every loaded LM Studio model shows up in the
GitHub Copilot Chat model picker — and is available to any extension that
calls vscode.lm.selectChatModels({ vendor: 'lmstudio' }).
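As a rough illustration of the consumer side, the sketch below shows how another extension might pick one of the registered models. The interfaces are local stand-ins for the relevant parts of vscode.lm and vscode.LanguageModelChat so the snippet compiles without the vscode module; pickModel and findLmStudioModel are hypothetical helper names, not part of this extension.

```typescript
// Structural subset of vscode.LanguageModelChat, declared locally so the
// sketch compiles outside an extension host.
interface ChatModelLike {
  id: string;
  name: string;
  vendor: string;
}

// Structural subset of the vscode.lm namespace.
interface LmLike {
  selectChatModels(selector: { vendor?: string }): PromiseLike<ChatModelLike[]>;
}

// Hypothetical helper: prefer a model whose id contains `hint`
// (case-insensitive); otherwise fall back to the first model, if any.
function pickModel(models: ChatModelLike[], hint?: string): ChatModelLike | undefined {
  if (hint) {
    const needle = hint.toLowerCase();
    const match = models.find((m) => m.id.toLowerCase().includes(needle));
    if (match) {
      return match;
    }
  }
  return models[0];
}

// Hypothetical wrapper: ask VS Code for every model registered under the
// lmstudio vendor, then choose one.
async function findLmStudioModel(lm: LmLike, hint?: string): Promise<ChatModelLike | undefined> {
  const models = await lm.selectChatModels({ vendor: "lmstudio" });
  return pickModel(models, hint);
}
```

Inside a real extension this would be called as findLmStudioModel(vscode.lm, "qwen"); an undefined result means no LM Studio model is currently loaded.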
Requirements
LM Studio running with the Local Server enabled (default http://localhost:1234)
At least one chat-capable model loaded in LM Studio
Features
Auto-discovery of loaded models via LM Studio's /v1/models endpoint
Streaming chat completions over Server-Sent Events
Tool / function calling for models that support it (OpenAI-style delta
tool calls, plus inline <tool_call> XML format used by Qwen / Llama)
Vision input for multimodal models that advertise the vision capability
Strips <think>...</think> reasoning traces from output
Optional API key for remote / authenticated LM Studio servers
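To make the last few features more concrete, here is a minimal sketch of the two text post-processing steps: removing <think> spans and extracting inline <tool_call> blocks. It assumes the inline format wraps a JSON object with name and arguments fields, as Qwen-style chat templates commonly emit, and it works on fully buffered text only; a streaming implementation would also need to handle tags split across chunks. The function and type names are illustrative and are not taken from the extension's source.

```typescript
// Shape of one parsed inline tool call (illustrative names).
interface InlineToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Remove every complete <think>...</think> span from buffered text.
function stripThink(text: string): string {
  return text.replace(/<think>[\s\S]*?<\/think>/g, "").trim();
}

// Extract <tool_call>{...}</tool_call> blocks and return the text that is left.
// Blocks whose body is not valid JSON with a string `name` stay in the text.
function extractInlineToolCalls(text: string): { text: string; calls: InlineToolCall[] } {
  const calls: InlineToolCall[] = [];
  const remaining = text.replace(
    /<tool_call>([\s\S]*?)<\/tool_call>/g,
    (whole: string, body: string): string => {
      try {
        const parsed = JSON.parse(body) as { name?: unknown; arguments?: unknown };
        if (typeof parsed.name === "string") {
          const args =
            typeof parsed.arguments === "object" && parsed.arguments !== null
              ? (parsed.arguments as Record<string, unknown>)
              : {};
          calls.push({ name: parsed.name, arguments: args });
          return ""; // drop the block from the visible text
        }
      } catch {
        // Malformed JSON: fall through and keep the original block.
      }
      return whole;
    }
  );
  return { text: remaining.trim(), calls };
}
```

Keeping malformed blocks in the visible text is a deliberate choice in this sketch: a user sees the raw output rather than silently losing part of the reply.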
Settings
| Setting | Default | Description |
| --- | --- | --- |
| lmstudio.serverUrl | http://localhost:1234 | Base URL of the LM Studio server |
| lmstudio.apiKey | "" | Optional bearer token |
| lmstudio.requestTimeout | 120000 | Timeout (ms) for non-streaming calls |
| lmstudio.maxInputTokens | 32768 | Fallback context window |
| lmstudio.maxOutputTokens | 8192 | Maximum tokens to generate |
| lmstudio.enableToolCalling | true | Advertise tool calling to VS Code |
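The sketch below shows one plausible way these settings could shape an outgoing request. It is an assumption, not the extension's actual code: in particular, mapping lmstudio.maxOutputTokens to the OpenAI-style max_tokens field and the names LmStudioSettings and buildChatRequest are illustrative.

```typescript
// The subset of settings this sketch uses (names mirror the settings table).
interface LmStudioSettings {
  serverUrl: string;
  apiKey: string;
  maxOutputTokens: number;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Defaults copied from the settings table.
const DEFAULT_SETTINGS: LmStudioSettings = {
  serverUrl: "http://localhost:1234",
  apiKey: "",
  maxOutputTokens: 8192,
};

// Build the URL, headers, and JSON body for a streaming chat completion.
function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  overrides: Partial<LmStudioSettings> = {}
): ChatRequest {
  const settings: LmStudioSettings = { ...DEFAULT_SETTINGS, ...overrides };
  // Tolerate a trailing slash in the configured base URL.
  const base = settings.serverUrl.replace(/\/+$/, "");
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Only send the bearer token when one is configured.
  if (settings.apiKey) {
    headers["Authorization"] = `Bearer ${settings.apiKey}`;
  }
  return {
    url: `${base}/v1/chat/completions`,
    headers,
    body: JSON.stringify({
      model,
      messages,
      stream: true,
      max_tokens: settings.maxOutputTokens, // assumed mapping
    }),
  };
}
```

With the defaults, no Authorization header is sent, which matches a local server that does not require authentication.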
Commands
LM Studio: Refresh Available Models — re-fetches the model list
LM Studio: Check Server Connection — pings /v1/models and reports status
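A connection check along these lines could be sketched as follows. It assumes /v1/models returns an OpenAI-style list of the form { data: [{ id: ... }] }; the status messages and the names parseModelIds, checkConnection, and FetchLike are made up for illustration. The fetch function is passed in as a parameter so the sketch has no dependency on a particular runtime.

```typescript
interface ModelEntry {
  id: string;
}

// Extract model ids from an OpenAI-style list response, ignoring bad entries.
function parseModelIds(payload: unknown): string[] {
  if (typeof payload !== "object" || payload === null) {
    return [];
  }
  const data = (payload as { data?: unknown }).data;
  if (!Array.isArray(data)) {
    return [];
  }
  return data
    .filter(
      (e): e is ModelEntry =>
        typeof e === "object" && e !== null && typeof (e as { id?: unknown }).id === "string"
    )
    .map((e) => e.id);
}

// Minimal structural type for fetch; the global fetch in Node 18+ fits it.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Ping /v1/models and turn the outcome into a human-readable status line.
async function checkConnection(fetchFn: FetchLike, serverUrl: string, apiKey = ""): Promise<string> {
  const headers: Record<string, string> = apiKey ? { Authorization: `Bearer ${apiKey}` } : {};
  try {
    const res = await fetchFn(`${serverUrl.replace(/\/+$/, "")}/v1/models`, { headers });
    if (!res.ok) {
      return `Server responded with HTTP ${res.status}`;
    }
    const ids = parseModelIds(await res.json());
    return ids.length > 0
      ? `Connected: ${ids.length} model(s) available`
      : "Connected, but no models are loaded";
  } catch (err) {
    return `Cannot reach server: ${err instanceof Error ? err.message : String(err)}`;
  }
}
```

The same parseModelIds step would also serve the refresh command, since both commands read the same endpoint.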
Development
npm install
npm run compile # one-shot bundle to dist/extension.js
npm run watch # incremental rebuild
npm run package:vsix # produce a .vsix to install locally
Press F5 in VS Code to launch an Extension Development Host with the
provider registered.
How it works
The extension registers a LanguageModelChatProvider under the vendor
identifier lmstudio (declared in package.json under
contributes.languageModelChatProviders). The provider implements three
methods:
provideLanguageModelChatInformation — returns the list of models that
VS Code should show in the picker
provideLanguageModelChatResponse — converts VS Code messages to OpenAI
format, opens a streaming POST to /v1/chat/completions, and forwards
text + tool calls back to VS Code as LanguageModelTextPart /
LanguageModelToolCallPart
provideTokenCount — character-based heuristic (LM Studio does not expose
a tokenizer endpoint)
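The streaming and token-counting steps above can be sketched roughly as follows. This assumes the server emits OpenAI-style Server-Sent Events (data: lines carrying choices[0].delta, ended by data: [DONE]) and that tool call arguments arrive as string fragments keyed by index. The 4-characters-per-token ratio is a common rule of thumb, not a value taken from the extension, and all names here are illustrative.

```typescript
interface ToolCallAccumulator {
  id: string;
  name: string;
  argumentsJson: string; // concatenated argument fragments
}

interface StreamState {
  buffer: string; // incomplete trailing line carried between chunks
  text: string; // accumulated assistant text
  toolCalls: ToolCallAccumulator[];
  done: boolean;
}

interface Delta {
  content?: string | null;
  tool_calls?: { index: number; id?: string; function?: { name?: string; arguments?: string } }[];
}

function newStreamState(): StreamState {
  return { buffer: "", text: "", toolCalls: [], done: false };
}

// Feed one raw network chunk into the state; chunks may split lines anywhere.
function feedChunk(state: StreamState, chunk: string): void {
  state.buffer += chunk;
  const lines = state.buffer.split(/\r?\n/);
  state.buffer = lines.pop() ?? ""; // keep the unfinished last line
  for (const line of lines) {
    if (!line.startsWith("data:")) {
      continue; // blank separators, comments, other SSE fields
    }
    const payload = line.slice(5).trim();
    if (payload === "[DONE]") {
      state.done = true;
      continue;
    }
    let delta: Delta | undefined;
    try {
      delta = (JSON.parse(payload) as { choices?: { delta?: Delta }[] }).choices?.[0]?.delta;
    } catch {
      continue; // skip events that are not valid JSON
    }
    if (!delta) {
      continue;
    }
    if (delta.content) {
      state.text += delta.content;
    }
    for (const tc of delta.tool_calls ?? []) {
      let slot = state.toolCalls[tc.index];
      if (!slot) {
        slot = { id: "", name: "", argumentsJson: "" };
        state.toolCalls[tc.index] = slot;
      }
      if (tc.id) slot.id = tc.id;
      if (tc.function?.name) slot.name = tc.function.name;
      if (tc.function?.arguments) slot.argumentsJson += tc.function.arguments;
    }
  }
}

// Character-based token estimate; the ratio is an assumed heuristic.
function estimateTokens(text: string, charsPerToken = 4): number {
  return Math.ceil(text.length / charsPerToken);
}
```

Once state.done is set, each accumulated argumentsJson string can be parsed with JSON.parse and reported to VS Code as a tool call, while state.text carries the plain-text reply.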