Internal Ollama

010101010101

Bundled private Ollama for VS Code — Copilot Chat (BYOK), local chat, and inline completions without a separate Ollama install.
Run a private, bundled Ollama engine inside VS Code — no separate Ollama desktop install required. Use local models in GitHub Copilot Chat (BYOK), the built-in Ollama Chat panel, and inline tab completions.

All inference stays on your machine. The extension sets OLLAMA_NO_CLOUD=true and does not phone home to ollama.com.

Features

  • Auto-bootstrap Ollama — downloads the official portable runtime on first run (~1.4 GB, one time); dev/F5 uses local bin/ollama.exe when present
  • Copilot Chat (BYOK) — auto-configures github.copilot.chat.byok.ollamaEndpoint to the bundled server
  • Large-model support — OLLAMA_CONTEXT_LENGTH=262144 (same default as the Ollama desktop app); Ollama fits layers/KV cache to your GPU
  • Ollama Chat panel — local chat with workspace context (.github/copilot-instructions.md, agents, prompts)
  • Inline completions — optional tab-complete powered by your default model
  • Model management — pull models, optional aliases, unload RAM, custom model storage path

Requirements

| Requirement | Notes |
| --- | --- |
| VS Code | ^1.120.0 (Copilot Chat + BYOK) |
| GitHub Copilot | For Copilot Chat integration |
| Windows | Primary target (amd64 or arm64 portable zip) |
| GPU | NVIDIA recommended for large models (e.g. Gemma 4 26B) |
| Disk | ~1.4 GB for Ollama runtime (cached in extension global storage) + models under %USERPROFILE%\OllamaModels (13+ GB per large model) |
| Network | First launch only; downloads Ollama from GitHub releases |

Quick start

  1. Install the extension from the Marketplace and reload VS Code.
  2. On first run, wait for the Internal Ollama notification to finish downloading and extracting the runtime (~1.4 GB).
  3. Open the Internal Ollama output channel and confirm that it shows OLLAMA_CONTEXT_LENGTH=262144 and the Copilot BYOK endpoint.
  4. Run Ollama: Install Local Model (e.g. gemma4:26b-a4b-it-qat).
  5. In Copilot Chat → Manage Language Models → enable your model under the Ollama provider (not “Internal Ollama”).
  6. Start a new Copilot chat and select gemma4:26b-a4b-it-qat.

First load of a large model can take a minute while Ollama fits weights to VRAM.
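
While a model is being pulled, progress can be tracked from Ollama's streaming pull output. The sketch below assumes the pull goes through Ollama's HTTP API (POST /api/pull), which streams one JSON object per line; whether this extension uses the HTTP API or the CLI for Ollama: Install Local Model is not stated here. The helper name pullPercent is hypothetical.

```typescript
// One progress line from a streaming POST /api/pull response.
// Only the fields used here are declared; any others are ignored.
interface PullProgress {
  status: string;
  total?: number;
  completed?: number;
}

// Parse a single NDJSON line and return a whole-number percentage,
// or undefined when the line carries no byte counts (for example "success").
function pullPercent(line: string): number | undefined {
  const msg = JSON.parse(line) as PullProgress;
  if (
    typeof msg.total !== "number" ||
    typeof msg.completed !== "number" ||
    msg.total <= 0
  ) {
    return undefined;
  }
  return Math.floor((msg.completed / msg.total) * 100);
}
```

Lines without byte counts, such as manifest or verification status messages, return undefined so a caller can show the status text instead of a percentage.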

Copilot integration

Default mode is internalOllama.copilotIntegration: "byok" — Copilot talks to the bundled server via /v1/chat/completions, the same path as a system Ollama install.
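
To check the bundled server independently of Copilot, a request can be sent to the same OpenAI-compatible path. The following is a minimal sketch, assuming the server listens on 127.0.0.1 at the configured port; buildChatRequest and the ChatRequest shape are hypothetical names for illustration, not part of the extension.

```typescript
// Plain description of an HTTP request, so it can be inspected before sending.
interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a non-streaming chat completion request for the bundled server.
function buildChatRequest(port: number, model: string, prompt: string): ChatRequest {
  return {
    url: `http://127.0.0.1:${port}/v1/chat/completions`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      stream: false,
    }),
  };
}
```

With a runtime that provides fetch, the result can be sent as `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.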

| Provider in Copilot | Use when |
| --- | --- |
| Ollama | Default; pick gemma4:26b-a4b-it-qat etc. |
| Internal Ollama | Only if you set copilotIntegration to "provider" (legacy) |

Optional copilot-* aliases are duplicate tags (FROM base-model). They appear under Ollama after Ollama: Create Optional Model Alias and a model-list refresh. You do not need an alias for Copilot to work.
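
An alias of this kind amounts to a one-line Modelfile that points at the base model. The sketch below generates that Modelfile and a possible alias name; the naming rule (prefix copilot- plus the part of the tag before the colon) is an assumption for illustration, since the exact scheme the extension uses is not described here.

```typescript
// Modelfile content for a duplicate tag: a single FROM line naming the base model.
function aliasModelfile(baseModel: string): string {
  return `FROM ${baseModel}\n`;
}

// Hypothetical naming rule: drop everything from the first ":" and prefix "copilot-".
function aliasName(baseModel: string): string {
  const family = baseModel.replace(/:.*$/, "");
  return `copilot-${family}`;
}
```

With a standalone Ollama CLI, the equivalent manual step would be saving the generated text as Modelfile and running `ollama create <alias> -f Modelfile`.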

Commands

| Command | Description |
| --- | --- |
| Ollama: Check Engine Status | Port, model count, RAM usage |
| Ollama: Open Chat | Local chat panel with workspace context |
| Ollama: Install Local Model | Pull from Ollama registry |
| Ollama: Create Optional Model Alias | Duplicate tag with a shorter name |
| Ollama: Set Model Storage Directory Path | Change OLLAMA_MODELS location |
| Ollama: Stop All Running Models (Free RAM) | Unload models from memory |
| Ollama: Refresh Copilot Model List | Re-apply BYOK endpoint / refresh provider |
| Ollama: Reinstall Runtime | Re-download Ollama if bootstrap failed or runtime is corrupt |
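
For reference, unloading a model can also be done directly against the Ollama HTTP API: a generate request with no prompt and keep_alive set to 0 asks the server to release that model from memory. The sketch below only builds such a request; it is an assumption that Ollama: Stop All Running Models works this way internally, and buildUnloadRequest is a hypothetical helper name.

```typescript
// URL and JSON body for a request that unloads one model.
interface UnloadRequest {
  url: string;
  body: string;
}

// keep_alive: 0 with no prompt tells Ollama to unload the model immediately.
function buildUnloadRequest(port: number, model: string): UnloadRequest {
  return {
    url: `http://127.0.0.1:${port}/api/generate`,
    body: JSON.stringify({ model, keep_alive: 0 }),
  };
}
```

The names of currently loaded models can be read from GET /api/ps on the same port, and one unload request sent per model.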

Settings

| Setting | Default | Description |
| --- | --- | --- |
| internalOllama.port | 11434 | Bundled Ollama HTTP port |
| internalOllama.copilotIntegration | byok | byok (Copilot native) or provider (legacy LM provider) |
| internalOllama.contextLength | 0 | Override OLLAMA_CONTEXT_LENGTH; 0 = 262144 |
| internalOllama.defaultModel | "" | Default for inline completion and Open Chat |
| internalOllama.enableInlineCompletion | true | Tab completions via Ollama |
| internalOllama.ollamaVersion | 0.30.7 | Ollama version to download on first run |
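
The settings above plausibly translate into environment variables for the bundled server process. The following sketch shows one way that mapping could look, including the rule that a contextLength of 0 falls back to 262144. The InternalOllamaSettings shape, the buildServerEnv name, and the use of OLLAMA_HOST for the port are assumptions; only OLLAMA_CONTEXT_LENGTH, OLLAMA_NO_CLOUD, and OLLAMA_MODELS are named in this README.

```typescript
// Hypothetical subset of the extension settings relevant to the server process.
interface InternalOllamaSettings {
  port: number;
  contextLength: number; // 0 means "use the default"
  modelsDir?: string;    // custom model storage path, if set
}

const DEFAULT_CONTEXT_LENGTH = 262144;

// Build the environment variables passed to the bundled Ollama server.
function buildServerEnv(s: InternalOllamaSettings): Record<string, string> {
  const contextLength = s.contextLength > 0 ? s.contextLength : DEFAULT_CONTEXT_LENGTH;
  const env: Record<string, string> = {
    OLLAMA_HOST: `127.0.0.1:${s.port}`, // assumption: port is applied via OLLAMA_HOST
    OLLAMA_CONTEXT_LENGTH: String(contextLength),
    OLLAMA_NO_CLOUD: "true",
  };
  if (s.modelsDir) {
    env["OLLAMA_MODELS"] = s.modelsDir;
  }
  return env;
}
```

Keeping the mapping in a pure function makes it easy to compare against the values printed in the Internal Ollama output channel.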

Workspace context (Open Chat)

The chat panel reads the same .github/ layout Copilot uses:

  • .github/copilot-instructions.md
  • .github/instructions/*.instructions.md
  • .github/prompts/*.prompt.md (slash commands)
  • .github/agents/*.agent.md

A compact README, package.json, file tree, and the active editor file are appended to user messages so that this context remains available even with a small context window.
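
The file layout listed above can be matched with a few path patterns. This is a minimal sketch of such a classifier, not the extension's actual loader; the ContextKind labels and both function names are hypothetical, and deriving the slash command from the prompt file name is an assumption.

```typescript
type ContextKind = "instructions" | "scoped-instructions" | "prompt" | "agent";

// Classify a workspace-relative path according to the .github/ layout.
// Backslashes are normalized so Windows-style paths match too.
function classifyContextFile(relPath: string): ContextKind | undefined {
  const p = relPath.replace(/\\/g, "/");
  if (p === ".github/copilot-instructions.md") return "instructions";
  if (/^\.github\/instructions\/[^/]+\.instructions\.md$/.test(p)) return "scoped-instructions";
  if (/^\.github\/prompts\/[^/]+\.prompt\.md$/.test(p)) return "prompt";
  if (/^\.github\/agents\/[^/]+\.agent\.md$/.test(p)) return "agent";
  return undefined;
}

// Assumed rule: the slash command is the prompt file name without ".prompt.md".
function slashCommandName(relPath: string): string | undefined {
  const m = /([^/\\]+)\.prompt\.md$/.exec(relPath);
  return m ? `/${m[1]}` : undefined;
}
```

Under these assumptions, a file at .github/prompts/review.prompt.md would be classified as a prompt and exposed as /review.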

Troubleshooting

“Model context is full (4096 tokens)”
Reload the window. Confirm the output channel shows OLLAMA_CONTEXT_LENGTH=262144. Use the Ollama provider in Copilot, not Internal Ollama. Start a new chat.

Copilot shows no models
Run Ollama: Refresh Copilot Model List. Check Settings → GitHub Copilot Chat → BYOK → Ollama Endpoint is http://127.0.0.1:11434.
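
If the endpoint is correct but the list is still empty, it can help to confirm what the server itself reports. Ollama lists installed models at GET /api/tags; the sketch below parses that response body into sorted model names. The modelNames helper is hypothetical, and only the name field of each entry is assumed.

```typescript
// Minimal shape of the GET /api/tags response; other fields are ignored.
interface TagsResponse {
  models?: Array<{ name: string }>;
}

// Return the installed model names in sorted order, or an empty list if none.
function modelNames(tagsJson: string): string[] {
  const parsed = JSON.parse(tagsJson) as TagsResponse;
  return (parsed.models ?? []).map((m) => m.name).sort();
}
```

An empty result here suggests that no model has been installed yet, in which case Ollama: Install Local Model is the next step rather than a Copilot setting change.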

Port 11434 in use
Quit any other Ollama instance or change internalOllama.port and refresh BYOK.

Alias “created” but not listed
This was fixed in 0.1.0; reload the window after updating. Aliases are optional, so you can always use the base model name under the Ollama provider.

Build from source

npm install
npm run compile

Press F5 in VS Code to launch the Extension Development Host.

Package a .vsix (≈1.4 GB — includes bundled Ollama runtime):

npm run vsix

Publish to the Marketplace (requires a publisher account):

npx @vscode/vsce@2.32.0 publish -p <YOUR_PAT>

Note: Recent versions of vsce may fail during secret scanning on multi-GB bin/ files (ERR_STRING_TOO_LONG). The vsix script pins vsce@2.32.0 until that is fixed upstream.

Update publisher, repository, and bugs in package.json if your Marketplace ID or GitHub URL differs.

Third-party software

This extension bundles the Ollama runtime (bin/ollama.exe). Ollama is licensed under the MIT License. See NOTICES.md.

Model weights pulled via Ollama are subject to each model’s license (e.g. Gemma terms from Google).

License

Extension source code: MIT — Copyright (c) 2026 Internal Ollama contributors.
