Ollama Autopilot - Local LLM Autocomplete

dadul96 | 21 installs | Free
Local LLM code autocomplete for VS Code powered by Ollama. Inline AI code completion - fully offline, no API keys, no cloud.
Installation

Launch VS Code Quick Open (Ctrl+P), paste the extension's install command, and press Enter.

Ollama Autopilot - Local LLM Autocomplete for VS Code

Features • Requirements • Extension Settings • How It Works • Performance • Changelog


Offline AI code completion for VS Code powered by Ollama.

Ollama Autopilot provides fast inline code autocomplete using local large language models (LLMs). No API keys. No cloud. No data leaves your machine.

Perfect for developers who want:

  • A GitHub Copilot alternative
  • Fully local AI coding
  • Privacy-focused autocomplete
  • Open-source AI tooling

✨ Features

🦙 Fully Local LLM Autocomplete

Uses Ollama to generate inline code completions directly from local models.

⚡ Inline Completion

Suggestions appear directly in the editor as you type — no chat window required.

🚦 Automatic or Manual Trigger

Choose between automatic suggestions and triggering them manually via a keybinding. The default keybinding is "ctrl+alt+space" and can be overridden by the user.

🧠 Customizable Prompt Templates

You have full control over the completion behavior via a configurable prompt template. Supported template variables:

  • ${workspaceName}
  • ${fileName}
  • ${languageId}
  • ${textBeforeCursor}
  • ${textAfterCursor}

The default prompt is optimized for short, style-matching inline completions.
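For illustration, a hypothetical template (not the shipped default) combining these variables could look like:

```
You are a code completion engine.
File: ${fileName} (${languageId}) in workspace ${workspaceName}.
Continue the code exactly where the cursor is. Output only the continuation, no explanations.
${textBeforeCursor}
```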

🔁 Model Selection and Configuration

Browse and switch between locally installed Ollama models directly from VS Code.
Configure model parameters such as:

  • Temperature
  • Context size
  • Response token count

😴 Snooze Mode

Temporarily disable autocomplete for a configurable number of minutes.

📊 Status Bar Indicator

The status bar clearly indicates the current state:

  • Enabled
  • Disabled
  • Snoozed
  • Ollama not available
  • Missing model

Access the menu directly from the status bar.

📦 Requirements

Before using this extension:

  1. Install Ollama
  2. Ensure Ollama is running
  3. Pull at least one model, for example:

```shell
ollama pull deepseek-coder-v2:16b
```

⚠️ Make sure your model's context size supports your configured prompt size and surrounding text.

⚙️ Extension Settings

General

| Setting | Description | Default |
| --- | --- | --- |
| `ollama-autopilot.general.autopilotEnabled` | Enable/disable Autopilot | `true` |
| `ollama-autopilot.general.suggestionTrigger` | Trigger selection for code suggestion | `automatic` |
| `ollama-autopilot.general.baseUrl` | Ollama API base URL | `http://localhost:11434` |
| `ollama-autopilot.general.autocompleteDelayMs` | Delay before requesting completion (ms) | `500` |
| `ollama-autopilot.general.snoozeTimeMin` | Snooze duration in minutes | `5` |

Model

| Setting | Description | Default |
| --- | --- | --- |
| `ollama-autopilot.model.modelName` | Ollama model name | `"deepseek-coder-v2:16b"` |
| `ollama-autopilot.model.contextSize` | Model context size (tokens) | `4096` |
| `ollama-autopilot.model.maxAutocompleteTokens` | Maximum completion tokens | `100` |
| `ollama-autopilot.model.temperature` | Sampling temperature | `0.1` |
| `ollama-autopilot.model.modelKeepAliveTimeMin` | Model keep-alive time in memory (`-1` = unlimited) | `10` |

Prompt

| Setting | Description | Default |
| --- | --- | --- |
| `ollama-autopilot.prompt.textBeforeCursorSize` | Characters (not tokens) before the cursor to include | `2048` |
| `ollama-autopilot.prompt.textAfterCursorSize` | Characters (not tokens) after the cursor to include | `0` |
| `ollama-autopilot.prompt.promptText` | Prompt template | See default |
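As a sketch, a settings.json tuned for a smaller, faster setup might combine these options (the model name and values are illustrative, not recommendations from the extension):

```json
{
  "ollama-autopilot.model.modelName": "deepseek-coder:6.7b",
  "ollama-autopilot.model.contextSize": 4096,
  "ollama-autopilot.model.maxAutocompleteTokens": 60,
  "ollama-autopilot.prompt.textBeforeCursorSize": 1024,
  "ollama-autopilot.prompt.textAfterCursorSize": 0
}
```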

🎛 Commands

Available via Command Palette:

  • Ollama Autopilot: Show Menu
  • Ollama Autopilot: Enable
  • Ollama Autopilot: Disable
  • Ollama Autopilot: Snooze
  • Ollama Autopilot: Select Model

🧩 How It Works

  1. Captures configurable surrounding context
  2. Builds a prompt using your template
  3. Sends the request to Ollama
  4. Returns only the code continuation
  5. Displays inline completion

All processing happens locally!
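The prompt-building step above can be sketched with a hypothetical `buildPrompt` helper (not the extension's actual source): it fills the template variables from editor state, and the resulting string would then be POSTed to Ollama's `/api/generate` endpoint at the configured base URL.

```typescript
// Hypothetical sketch of step 2: substitute template variables with editor state.
type PromptContext = {
  workspaceName: string;
  fileName: string;
  languageId: string;
  textBeforeCursor: string;
  textAfterCursor: string;
};

function buildPrompt(template: string, ctx: PromptContext): string {
  // Each variable appears at most once in a typical template,
  // so a plain string replace per variable is sufficient here.
  return template
    .replace("${workspaceName}", ctx.workspaceName)
    .replace("${fileName}", ctx.fileName)
    .replace("${languageId}", ctx.languageId)
    .replace("${textBeforeCursor}", ctx.textBeforeCursor)
    .replace("${textAfterCursor}", ctx.textAfterCursor);
}

const prompt = buildPrompt(
  "Complete the ${languageId} code in ${fileName}:\n${textBeforeCursor}",
  {
    workspaceName: "demo",
    fileName: "main.ts",
    languageId: "typescript",
    textBeforeCursor: "function add(a: number, b: number) {",
    textAfterCursor: "",
  }
);
console.log(prompt);
// The prompt would then be sent to `${baseUrl}/api/generate` (step 3),
// and only the returned code continuation is shown inline (steps 4-5).
```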

🔒 Privacy

  • No external APIs
  • No telemetry
  • No cloud services
  • All completions are generated locally

🚀 Performance Notes

Ollama Autopilot runs entirely locally. Performance depends heavily on:

  • Model size
  • Hardware (CPU / GPU)
  • Available RAM
  • Context size configuration

Larger models (e.g., 16B+) may introduce noticeable latency before inline suggestions appear, especially on CPU-only systems.

Tips for Better Performance

  • Use smaller models (e.g., 7B variants)
  • Reduce textBeforeCursorSize
  • Set textAfterCursorSize to 0 and omit ${textAfterCursor} from the prompt template
  • Lower maxAutocompleteTokens
  • Ensure Ollama is running with GPU acceleration if available

📌 Changelog

⬆️ Click the title to view the changelog. ⬆️

🙏 Acknowledgments

  • Built with Ollama
  • Heavily inspired by:
    • GitHub Copilot
    • ChatGPT Copilot
    • Ollama Copilot
    • Ollama Autocoder
    • Local LLM for VS Code

👨 Author

Daniel Duller - dadul96

License

This project is licensed under the MIT License - see the LICENSE file for details.
