Ollama Autopilot - Local LLM Autocomplete

dadul96 | 21 installs | Free
Local LLM code autocomplete for VS Code powered by Ollama. Inline AI code completion - fully offline, no API keys, no cloud.
Installation

Launch VS Code Quick Open (Ctrl+P), paste the extension's install command, and press Enter.

Ollama Autopilot - Local LLM Autocomplete for VS Code

Features • Requirements • Extension Settings • How It Works • Performance • Changelog


Offline AI code completion for VS Code powered by Ollama.

Ollama Autopilot provides fast inline code autocomplete using local large language models (LLMs). No API keys. No cloud. No data leaves your machine.

Perfect for developers who want:

  • A GitHub Copilot alternative
  • Fully local AI coding
  • Privacy-focused autocomplete
  • Open-source AI tooling

✨ Features

🦙 Fully Local LLM Autocomplete

Uses Ollama to generate inline code completions directly from local models.

⚡ Inline Completion

Suggestions appear directly in the editor as you type — no chat window required.

🚦 Automatic or Manual Trigger

Choose between automatic suggestions and triggering them manually via a keybinding. The default keybinding is "ctrl+alt+space" and can be overridden by the user.

🧠 Customizable Prompt Templates

You have full control over the completion behavior via a configurable prompt template. Supported template variables:

  • ${workspaceName}
  • ${fileName}
  • ${languageId}
  • ${textBeforeCursor}
  • ${textAfterCursor}

The default prompt is optimized for short, style-matching inline completions.
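For illustration, a hypothetical template (not the shipped default) combining these variables could look like:

```
You are a code completion engine.
File: ${fileName} (${languageId}) in workspace ${workspaceName}.
Continue the code exactly where the cursor is. Output only the continuation, no explanations.
${textBeforeCursor}
```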

🔁 Model Selection and Configuration

Browse and switch between locally installed Ollama models directly from VS Code.
Configure model parameters such as:

  • Temperature
  • Context size
  • Response token count

😴 Snooze Mode

Temporarily disable autocomplete for a configurable number of minutes.

📊 Status Bar Indicator

The status bar clearly indicates the current state:

  • Enabled
  • Disabled
  • Snoozed
  • Ollama not available
  • Missing model

Access the menu directly from the status bar.

📦 Requirements

Before using this extension:

  1. Install Ollama
  2. Ensure Ollama is running
  3. Pull at least one model, for example:

```shell
ollama pull deepseek-coder-v2:16b
```

⚠️ Make sure your model's context size supports your configured prompt size and surrounding text.

⚙️ Extension Settings

General

| Setting | Description | Default |
| --- | --- | --- |
| `ollama-autopilot.general.autopilotEnabled` | Enable/disable Autopilot | `true` |
| `ollama-autopilot.general.suggestionTrigger` | Trigger selection for code suggestion | `automatic` |
| `ollama-autopilot.general.baseUrl` | Ollama API base URL | `http://localhost:11434` |
| `ollama-autopilot.general.autocompleteDelayMs` | Delay before requesting completion (ms) | `500` |
| `ollama-autopilot.general.snoozeTimeMin` | Snooze duration in minutes | `5` |

Model

| Setting | Description | Default |
| --- | --- | --- |
| `ollama-autopilot.model.modelName` | Ollama model name | `"deepseek-coder-v2:16b"` |
| `ollama-autopilot.model.contextSize` | Model context size (tokens) | `4096` |
| `ollama-autopilot.model.maxAutocompleteTokens` | Maximum completion tokens | `100` |
| `ollama-autopilot.model.temperature` | Sampling temperature | `0.1` |
| `ollama-autopilot.model.modelKeepAliveTimeMin` | Model keep-alive time in memory (`-1` = unlimited) | `10` |

Prompt

| Setting | Description | Default |
| --- | --- | --- |
| `ollama-autopilot.prompt.textBeforeCursorSize` | Characters (not tokens) before the cursor to include | `2048` |
| `ollama-autopilot.prompt.textAfterCursorSize` | Characters (not tokens) after the cursor to include | `0` |
| `ollama-autopilot.prompt.promptText` | Prompt template | See default |
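As a sketch, a settings.json tuned for a smaller, faster setup might combine these options (the model name and values are illustrative, not recommendations from the extension):

```json
{
  "ollama-autopilot.model.modelName": "deepseek-coder:6.7b",
  "ollama-autopilot.model.contextSize": 4096,
  "ollama-autopilot.model.maxAutocompleteTokens": 60,
  "ollama-autopilot.prompt.textBeforeCursorSize": 1024,
  "ollama-autopilot.prompt.textAfterCursorSize": 0
}
```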

🎛 Commands

Available via Command Palette:

  • Ollama Autopilot: Show Menu
  • Ollama Autopilot: Enable
  • Ollama Autopilot: Disable
  • Ollama Autopilot: Snooze
  • Ollama Autopilot: Select Model

🧩 How It Works

  1. Captures configurable surrounding context
  2. Builds a prompt using your template
  3. Sends the request to Ollama
  4. Returns only the code continuation
  5. Displays inline completion

All processing happens locally!
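The prompt-building step above can be sketched with a hypothetical `buildPrompt` helper (not the extension's actual source): it fills the template variables from editor state, and the resulting string would then be POSTed to Ollama's `/api/generate` endpoint at the configured base URL.

```typescript
// Hypothetical sketch of step 2: substitute template variables with editor state.
type PromptContext = {
  workspaceName: string;
  fileName: string;
  languageId: string;
  textBeforeCursor: string;
  textAfterCursor: string;
};

function buildPrompt(template: string, ctx: PromptContext): string {
  // Each variable appears at most once in a typical template,
  // so a plain string replace per variable is sufficient here.
  return template
    .replace("${workspaceName}", ctx.workspaceName)
    .replace("${fileName}", ctx.fileName)
    .replace("${languageId}", ctx.languageId)
    .replace("${textBeforeCursor}", ctx.textBeforeCursor)
    .replace("${textAfterCursor}", ctx.textAfterCursor);
}

const prompt = buildPrompt(
  "Complete the ${languageId} code in ${fileName}:\n${textBeforeCursor}",
  {
    workspaceName: "demo",
    fileName: "main.ts",
    languageId: "typescript",
    textBeforeCursor: "function add(a: number, b: number) {",
    textAfterCursor: "",
  }
);
console.log(prompt);
// The prompt would then be sent to `${baseUrl}/api/generate` (step 3),
// and only the returned code continuation is shown inline (steps 4-5).
```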

🔒 Privacy

  • No external APIs
  • No telemetry
  • No cloud services
  • All completions are generated locally

🚀 Performance Notes

Ollama Autopilot runs entirely locally. Performance depends heavily on:

  • Model size
  • Hardware (CPU / GPU)
  • Available RAM
  • Context size configuration

Larger models (e.g., 16B+) may introduce noticeable latency before inline suggestions appear, especially on CPU-only systems.

Tips for Better Performance

  • Use smaller models (e.g., 7B variants)
  • Reduce textBeforeCursorSize
  • Set textAfterCursorSize to 0 and omit ${textAfterCursor} from the prompt template
  • Lower maxAutocompleteTokens
  • Ensure Ollama is running with GPU acceleration if available

📌 Changelog

⬆️ Click the title to view the changelog. ⬆️

🙏 Acknowledgments

  • Built with Ollama
  • Heavily inspired by:
    • GitHub Copilot
    • ChatGPT Copilot
    • Ollama Copilot
    • Ollama Autocoder
    • Local LLM for VS Code

👨 Author

Daniel Duller - dadul96

License

This project is licensed under the MIT License - see the LICENSE file for details.
