Offline AI code completion for VS Code powered by Ollama.
Ollama Autopilot provides fast inline code autocomplete using local large language models (LLMs).
No API keys. No cloud. No data leaves your machine.
Perfect for developers who want:
- A GitHub Copilot alternative
- Fully local AI coding
- Privacy-focused autocomplete
- Open-source AI tooling

✨ Features
🦙 Fully Local LLM Autocomplete
Uses Ollama to generate inline code completions directly from local models.
⚡ Inline Completion
Suggestions appear directly in the editor as you type — no chat window required.
🚦 Automatic or Manual Trigger
Choose whether suggestions appear automatically as you type or only when triggered manually via a keybinding.
The default keybinding is `Ctrl+Alt+Space`, but it can be overridden by the user.
🧠 Customizable Prompt Templates
You have full control over the completion behavior via a configurable prompt template. Supported template variables:
- `${workspaceName}`
- `${fileName}`
- `${languageId}`
- `${textBeforeCursor}`
- `${textAfterCursor}`
The default prompt is optimized for short, style-matching inline completions.
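As an illustration, a custom template can be set in `settings.json` via the `ollama-autopilot.prompt.promptText` setting. The template text below is a hypothetical example, not the extension's shipped default:

```json
{
  "ollama-autopilot.prompt.promptText": "You are a code completion engine. Complete the ${languageId} code in ${fileName}. Output only the continuation, with no explanations.\n${textBeforeCursor}"
}
```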
🔁 Model Selection and Configuration
Browse and switch between locally installed Ollama models directly from VS Code.
Configure model parameters such as:
- Temperature
- Context size
- Response token count
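For example, these parameters map to entries in `settings.json` (the values shown are the documented defaults, not tuning recommendations):

```json
{
  "ollama-autopilot.model.modelName": "deepseek-coder-v2:16b",
  "ollama-autopilot.model.temperature": 0.1,
  "ollama-autopilot.model.contextSize": 4096,
  "ollama-autopilot.model.maxAutocompleteTokens": 100
}
```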
😴 Snooze Mode
Temporarily disable autocomplete for a configurable number of minutes.
📊 Status Bar Indicator
The status bar shows the current state at a glance:
- Enabled
- Disabled
- Snoozed
- Ollama not available
- Missing model
Access the menu directly from the status bar.
📦 Requirements
Before using this extension:
- Install Ollama
- Ensure Ollama is running
- Pull at least one model, for example:

```shell
ollama pull deepseek-coder-v2:16b
```
⚠️ Make sure your model's context size supports your configured prompt size and surrounding text.
⚙️ Extension Settings
General
| Setting | Description | Default |
|---|---|---|
| `ollama-autopilot.general.autopilotEnabled` | Enable/disable Autopilot | `true` |
| `ollama-autopilot.general.suggestionTrigger` | Trigger selection for code suggestions | `automatic` |
| `ollama-autopilot.general.baseUrl` | Ollama API base URL | `http://localhost:11434` |
| `ollama-autopilot.general.autocompleteDelayMs` | Delay in milliseconds before requesting a completion | `500` |
| `ollama-autopilot.general.snoozeTimeMin` | Snooze duration in minutes | `5` |
Model
| Setting | Description | Default |
|---|---|---|
| `ollama-autopilot.model.modelName` | Ollama model name | `"deepseek-coder-v2:16b"` |
| `ollama-autopilot.model.contextSize` | Model context size | `4096` |
| `ollama-autopilot.model.maxAutocompleteTokens` | Maximum completion tokens | `100` |
| `ollama-autopilot.model.temperature` | Sampling temperature | `0.1` |
| `ollama-autopilot.model.modelKeepAliveTimeMin` | Model keep-alive time in memory, in minutes (-1 = unlimited) | `10` |
Prompt
| Setting | Description | Default |
|---|---|---|
| `ollama-autopilot.prompt.textBeforeCursorSize` | Characters (not tokens) before the cursor to include | `2048` |
| `ollama-autopilot.prompt.textAfterCursorSize` | Characters (not tokens) after the cursor to include | `0` |
| `ollama-autopilot.prompt.promptText` | Prompt template | See default |
🎛 Commands
Available via Command Palette:
- `Ollama Autopilot: Show Menu`
- `Ollama Autopilot: Enable`
- `Ollama Autopilot: Disable`
- `Ollama Autopilot: Snooze`
- `Ollama Autopilot: Select Model`
🧩 How It Works
- Captures configurable surrounding context
- Builds a prompt using your template
- Sends the request to Ollama
- Returns only the code continuation
- Displays inline completion
All processing happens locally!
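The flow above can be sketched roughly as follows. This is a simplified illustration, not the extension's actual source: `buildPrompt` and `requestCompletion` are hypothetical helpers, and the request targets Ollama's standard `/api/generate` endpoint.

```typescript
// Fill ${...} placeholders in the prompt template with the captured context.
function buildPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\$\{(\w+)\}/g, (_m, name) => vars[name] ?? "");
}

// Send the prompt to the local Ollama server and return only the generated text.
async function requestCompletion(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "deepseek-coder-v2:16b",
      prompt,
      stream: false,
      // These options correspond to the extension's temperature and
      // maxAutocompleteTokens settings.
      options: { temperature: 0.1, num_predict: 100 },
    }),
  });
  const data = await res.json();
  return data.response; // Ollama places the completion text in `response`
}

// Example: build a prompt from the surrounding editor context.
const prompt = buildPrompt(
  "Continue this ${languageId} code:\n${textBeforeCursor}",
  { languageId: "typescript", textBeforeCursor: "function add(a: number, b: number) {" }
);
```

The editor then displays the returned continuation as a ghost-text inline suggestion.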
🔒 Privacy
- No external APIs
- No telemetry
- No cloud services
- All completions are generated locally
🚀 Performance
Ollama Autopilot runs entirely locally. Performance depends heavily on:
- Model size
- Hardware (CPU / GPU)
- Available RAM
- Context size configuration
Larger models (e.g., 16B+) may introduce noticeable latency before inline suggestions appear, especially on CPU-only systems.
To improve responsiveness:
- Use smaller models (e.g., 7B variants)
- Reduce `textBeforeCursorSize`
- Reduce `textAfterCursorSize` to 0 and remove `${textAfterCursor}` from the prompt
- Lower `maxAutocompleteTokens`
- Ensure Ollama is running with GPU acceleration if available
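For instance, a lower-latency configuration might look like this in `settings.json` (the model name and sizes are illustrative; pick a model you have actually pulled):

```json
{
  "ollama-autopilot.model.modelName": "deepseek-coder:6.7b",
  "ollama-autopilot.prompt.textBeforeCursorSize": 1024,
  "ollama-autopilot.prompt.textAfterCursorSize": 0,
  "ollama-autopilot.model.maxAutocompleteTokens": 50
}
```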
⬆️ Click the title to view the changelog. ⬆️
🙏 Acknowledgments
- Built with Ollama
- Heavily inspired by:
👨 Author
Daniel Duller - dadul96
License
This project is licensed under the MIT License - see the LICENSE file for details.