Ollama Code Completions for Visual Studio Code

Also available for Visual Studio
Inline ghost-text autocomplete for VS Code, powered by a self-hosted Ollama instance.
Sends the code around your cursor to an Ollama FIM-capable model (e.g. qwen2.5-coder, codellama, deepseek-coder-v2, starcoder2) and shows the suggestion as ghost text. Press Tab to accept, Esc to dismiss.
Features
- Inline completions via VS Code's native InlineCompletionItemProvider API
- Fill-in-the-middle prompting using Ollama's native suffix parameter
- LRU completion cache with prefix-extension matching (instant suggestions when you backspace or keep typing)
- Post-processing: suffix overlap removal, bracket balancing, prefix-echo trimming
- Smart mid-line completions — allows completions inside JSX tag attributes and before closing punctuation; configurable via midLineMode
- JSON/JSONC string completions — completes inside string values in .json and .jsonc files, useful for translation files
- HTTP Basic auth support, with credentials stored in the OS keychain (Keychain / Credential Manager / libsecret)
- Diagnostic logging to an output channel and/or file, gated by settings
- "Pick Model", "Test Connection", "Show Log", "Set Credentials", "Clear Credentials" commands
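The post-processing bullet above can be sketched as pure string functions. This is a minimal, illustrative sketch with hypothetical names (trimSuffixOverlap, trimPrefixEcho), not the extension's actual implementation:

```typescript
// Illustrative sketch of suffix-overlap removal and prefix-echo trimming.
// Function names are hypothetical; the extension's real code may differ.

/** Drop trailing characters of a completion that merely repeat the start of the suffix. */
function trimSuffixOverlap(completion: string, suffix: string): string {
  const max = Math.min(completion.length, suffix.length);
  for (let len = max; len > 0; len--) {
    if (completion.endsWith(suffix.slice(0, len))) {
      return completion.slice(0, completion.length - len);
    }
  }
  return completion;
}

/** Drop leading characters of a completion that merely repeat the end of the prefix. */
function trimPrefixEcho(completion: string, prefix: string): string {
  const max = Math.min(completion.length, prefix.length);
  for (let len = max; len > 0; len--) {
    if (completion.startsWith(prefix.slice(prefix.length - len))) {
      return completion.slice(len);
    }
  }
  return completion;
}
```

For example, if the model returns `value);` but the text after the cursor already starts with `);`, trimSuffixOverlap keeps only `value`, so accepting the suggestion does not duplicate the closing punctuation.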
Requirements
- VS Code 1.85 or later
- An Ollama server reachable from your machine (default http://localhost:11434)
- A FIM-capable model installed on that server (e.g. ollama pull qwen2.5-coder:1.5b)
Setup
- Install the extension.
- (Optional) Open Settings and set ollamaCodeCompletions.serverUrl if your Ollama instance is not at localhost:11434.
- Run Ollama Code Completions: Pick Model from the command palette and choose a model.
- Run Ollama Code Completions: Test Connection to verify everything is wired up.
- Open a file in a supported language and start typing.
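Under the hood, each completion is requested from Ollama's /api/generate endpoint with the text before the cursor as the prompt and the text after it as the suffix. A rough sketch of the request body (field values illustrative; buildFimRequest is a hypothetical helper, not the extension's exact code):

```typescript
// Sketch of the kind of body sent to Ollama's /api/generate endpoint for
// fill-in-the-middle. Illustrative only; not the extension's actual code.
interface FimRequest {
  model: string;
  prompt: string;                    // text before the cursor (the prefix)
  suffix: string;                    // text after the cursor
  stream: boolean;                   // a single response, not a token stream
  options: { num_predict: number };  // generation cap, cf. maxPredict setting
}

function buildFimRequest(
  model: string,
  prefix: string,
  suffix: string,
  maxPredict: number
): FimRequest {
  return {
    model,
    prompt: prefix,
    suffix,
    stream: false,
    options: { num_predict: maxPredict },
  };
}
```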
Settings
| Setting | Default | Description |
| --- | --- | --- |
| ollamaCodeCompletions.serverUrl | http://localhost:11434 | Base URL of the Ollama server. |
| ollamaCodeCompletions.model | qwen2.5-coder:1.5b | Model name. |
| ollamaCodeCompletions.useAuthentication | false | Send HTTP Basic auth on each request. |
| ollamaCodeCompletions.enabled | true | Master toggle for inline completions. |
| ollamaCodeCompletions.debounceMs | 300 | Idle time after typing before a request is sent. |
| ollamaCodeCompletions.maxPrefixChars | 4096 | Maximum prefix length sent to the model. |
| ollamaCodeCompletions.maxSuffixChars | 1024 | Maximum suffix length sent to the model. |
| ollamaCodeCompletions.maxPredict | 128 | Maximum tokens for the model to generate. |
| ollamaCodeCompletions.timeoutSeconds | 30 | HTTP request timeout. |
| ollamaCodeCompletions.logToFile | false | Write logs to OllamaCodeCompletions.log in the OS temp directory. |
| ollamaCodeCompletions.logToOutputChannel | false | Write logs to the "Ollama Code Completions" output channel. |
| ollamaCodeCompletions.showStatusBarItem | true | Show the status bar indicator. |
| ollamaCodeCompletions.midLineMode | "smart" | "smart" allows mid-line completions inside JSX attributes and before closing punctuation; "never" restores the old behavior of skipping whenever there is any text after the cursor. |
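For example, a user settings.json pointing at a remote Ollama box might look like this (values are illustrative):

```jsonc
{
  "ollamaCodeCompletions.serverUrl": "http://192.168.1.50:11434",
  "ollamaCodeCompletions.model": "qwen2.5-coder:1.5b",
  "ollamaCodeCompletions.debounceMs": 200,
  "ollamaCodeCompletions.maxPredict": 128
}
```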
Username and password are not stored in settings; they live in the OS keychain, set via the Set Credentials command.
Commands
All commands are available through the command palette under the Ollama Code Completions category:
- Set Credentials - prompts for username and password, stores them in SecretStorage.
- Clear Credentials - removes stored credentials.
- Pick Model - lists installed models from /api/tags and writes the choice to settings.
- Test Connection - verifies the server is reachable and the configured model is installed.
- Show Log - reveals the output channel.
Supported languages
JavaScript / TypeScript (incl. JSX/TSX), Python, C#, Go, Rust, Java, C/C++, PHP, Ruby, Swift, Kotlin, Scala, Dart, Lua, HTML, CSS/SCSS, JSON, YAML, Markdown, SQL, shell, PowerShell, Vue, Svelte.
The extension activates on these languages; you can request additional ones by opening an issue.
Tips
React / JSX completions
With the default midLineMode: "smart", completions trigger inside JSX tag attributes — e.g. placing the cursor inside <Button onClick={|}> or <Card>{|}</Card> will request a suggestion. The extension detects these positions heuristically; if you prefer the strict legacy behaviour, set midLineMode to "never".
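The exact heuristic is internal to the extension, but a simplified sketch of the kind of check involved might look like this (looksLikeJsxSlot is a hypothetical name; this is illustrative, not the extension's actual logic):

```typescript
// Illustrative sketch: a rough check for "cursor sits in a JSX expression
// slot or just before a tag's closing punctuation". Not the real heuristic.
function looksLikeJsxSlot(lineText: string, column: number): boolean {
  const before = lineText.slice(0, column);
  const after = lineText.slice(column);
  // Empty expression container: <Button onClick={|}> or <Card>{|}</Card>
  const inBraces = before.endsWith("{") && after.startsWith("}");
  // Just before a tag's closing punctuation: <Button |> or <Input ... |/>
  const beforeTagClose = /<\w[^<>]*$/.test(before) && /^\s*\/?>/.test(after);
  return inBraces || beforeTagClose;
}
```

A position that passes such a check is offered a completion even though there is text after the cursor on the same line; with midLineMode set to "never", any trailing text suppresses the request.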
JSON translation files
The extension completes inside string values in .json and .jsonc files. This is especially useful for translation / i18n files: with sibling keys already filled in, a good FIM-capable model can pattern-match and suggest the right phrase for an empty value.
```jsonc
{
  "save": "Save",
  "cancel": "Cancel",
  "delete": "|"   // ← cursor here triggers a completion
}
```
Completion quality for human-language text depends heavily on the model. Small coder models (e.g. qwen2.5-coder:1.5b) pattern-match well from nearby keys but are not real translators — treat their suggestions as a starting point, not a finished translation. Larger general-purpose models produce better prose but are slower.
Privacy
All requests go to the Ollama server you configure. Nothing is sent anywhere else. The extension logs lengths, never contents, of the prefix and suffix. Credentials live in the OS-native secret store.
Development
```sh
npm install
npm run watch   # tsc -w in the background
# Press F5 in VS Code to launch an Extension Development Host
```

Run unit tests:

```sh
npm test
```

Package locally:

```sh
npm run package   # produces ollama-code-completions-<version>.vsix
```
Before publishing
The scaffold ships with placeholders that need to be filled in:
- package.json -> publisher: replace your-publisher-id with your VS Code Marketplace publisher ID.
- package.json -> repository.url: replace with your real GitHub URL.
- icon.png: the included icon is a placeholder; swap in your real one (256x256 PNG).
- A VSCE_PAT repository secret is required for the publish workflow. Generate one at dev.azure.com/&lt;org&gt;/_usersSettings/tokens with Marketplace > Manage scope.
License
MIT