Skip to content
| Marketplace
Sign in
Visual Studio Code>Machine Learning>OllamaDevNew to Visual Studio Code? Get it now.
OllamaDev

OllamaDev

Prem Jampuram

|
2 installs
| (0) | Free
Local AI code assistant powered by Ollama — inline completions, agentic chat, semantic search, and model manager
Installation
Launch VS Code Quick Open (Ctrl+P), paste the following command, and press enter.
Copied to clipboard
More Info

OllamaDev

A free, fully local AI coding assistant for VS Code powered by Ollama. All inference runs on your machine — no API keys, no subscriptions, no data leaves your computer.


Requirements

  • Ollama installed and running (ollama serve)
  • At least one model pulled (see Recommended Models)
  • VS Code 1.85+

Quick Start

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama serve

# 2. Pull a model
ollama pull qwen2.5-coder:7b

# 3. Install the extension
code --install-extension ollama-dev-1.0.0.vsix

Open the OllamaDev panel from the Activity Bar (or press Ctrl+Shift+O).


Features

Agentic Chat

A full-featured AI chat sidebar that can read files, search the codebase, run commands, browse the web, and make code edits — all autonomously.

The agent runs in a loop: it calls tools until the task is complete, then delivers a final response. You can watch every tool call in real time with expandable result cards.

Modes: | Mode | What it does | |---|---| | Auto | Agent executes all tool calls without asking | | Manual | Shows a diff preview and asks for approval before writing files | | Plan | Produces a numbered plan before doing anything — you review before execution |

Effort levels (toolbar):

  • Fast — focused, low temperature, quick answers
  • Balanced — default for everyday tasks
  • Deep — higher temperature, more thorough exploration

Tools (35 built-in)

The agent can invoke any of these autonomously:

Category Tools
Files read_file, write_file, edit_file, read_multiple_files, copy_file, move_file, delete_file, create_directory, get_file_info, get_file_tree, get_active_file, reveal_file
Search search_workspace, find_symbol, search_and_replace, semantic_search
Git get_git_status, git_log, git_diff, git_commit, git_create_branch, get_pr_diff
Web web_search, web_fetch
Workspace list_directory, get_workspace_info, get_package_info, get_diagnostics, get_system_info, list_env_vars, run_command
Memory write_memory, read_memory, delete_memory
Testing run_tests

Persistent Memory (Brain)

OllamaDev learns from every conversation. After each response, a background reflection pass asks the model what's worth remembering — workspace facts, your preferences, patterns that worked or failed. These memories persist across sessions and are injected into every system prompt.

The model decides what to remember. No hardcoded rules.

When memory grows large, a compaction pass automatically merges and prunes entries. You can also read, write, and delete memories explicitly via the read_memory, write_memory, and delete_memory tools.


Semantic Search (RAG)

The extension automatically indexes your codebase using Ollama embeddings (nomic-embed-text by default). The semantic_search tool uses this index for conceptual queries — "find where authentication happens", "which files handle database connections" — instead of just text matching.

The index rebuilds automatically when you save files. To trigger a manual reindex: Ctrl+Shift+P → OllamaDev: Re-index Workspace.

Requires nomic-embed-text:

ollama pull nomic-embed-text

Inline Completions

Ghost-text completions as you type, accepted with Tab.

  • Uses Fill-in-the-Middle (FIM) when cursor is mid-file — the model sees both prefix and suffix for better accuracy
  • Debounced to avoid slowing you down
  • Manual trigger: Ctrl+Alt+Space
  • Toggle: Ctrl+Shift+P → OllamaDev: Toggle Inline Completions

Code Actions (Right-click Menu)

Select code, right-click, and choose from the OllamaDev group:

Action What it does
Explain Selected Code Plain-language explanation with context
Generate Tests Unit tests with edge cases
Fix / Improve Selection Bug fixes and best-practice improvements
Add Documentation JSDoc / docstrings / inline comments

Results stream directly into the chat panel.


Code Lens

TODO and FIXME comments get an inline ◆ Implement button. Click it to send the surrounding function context to the agent for automatic implementation.


Thinking Mode

For models that support reasoning (qwen3, DeepSeek-R1, QwQ), enable Think in the toolbar. The model's internal reasoning process appears as a collapsible block above the response — fully visible but out of the way.

Inline <think>...</think> tokens from models without native server-side thinking support are automatically intercepted and routed to the reasoning UI.


Multi-model Routing

Configure a fast small model for quick questions alongside a larger model for agent work:

"ollamaDev.chatModel":     "qwen2.5-coder:14b",
"ollamaDev.fastChatModel": "qwen2.5:1.5b"

Simple questions with no action verbs and no tools needed automatically route to the fast model.


PR Review

Press Ctrl+Shift+R (or run OllamaDev: Review PR / Branch Changes) to trigger a full structured code review of your current branch vs main/master. The agent fetches the diff and produces a report covering summary, bugs, security, performance, code quality, tests, and verdict.


Run Tests (AI-assisted)

Press Ctrl+Shift+T (or run OllamaDev: Run Tests) to run your test suite. If tests fail, the agent reads the relevant source files and fixes the failures automatically, re-running until they pass.

Supports: Jest, Vitest, Mocha, pytest, Cargo test, Go test, and Makefile targets.


Conversation Checkpoints

Save the current conversation state at any point and restore it later. Useful for branching explorations — try one approach, checkpoint, try another, restore if needed.


Context Attachment

Attach files and editor selections directly to your message using the File and Sel buttons in the chat toolbar. Drag-and-drop images for vision-capable models.


Vision

Attach screenshots or diagrams to your message. Works with any vision-capable model (LLaVA, Qwen-VL, Gemma Vision, etc.).


Model Manager

Press Ctrl+Shift+M to open the Model Manager panel. Pull new models, delete old ones, see model sizes and parameter counts, and switch the active model for chat and completions.


Recommended Models

RAM Chat model Pull command
8 GB qwen2.5-coder:7b (default) ollama pull qwen2.5-coder:7b
16 GB qwen3:14b ollama pull qwen3:14b
24 GB qwen2.5-coder:32b ollama pull qwen2.5-coder:32b
32 GB+ qwen3:32b ollama pull qwen3:32b

For inline completions, use a smaller/faster model:

"ollamaDev.completionModel": "qwen2.5-coder:1.5b"

For semantic search (required for RAG):

ollama pull nomic-embed-text

Settings Reference

Setting Default Description
ollamaDev.ollamaUrl http://localhost:11434 Ollama server URL
ollamaDev.chatModel qwen2.5-coder:7b Model for chat and agent loops
ollamaDev.completionModel qwen2.5-coder:7b Model for inline completions
ollamaDev.fastChatModel (empty) Optional fast model for simple questions
ollamaDev.embeddingModel nomic-embed-text Model for semantic search indexing
ollamaDev.ragEnabled true Enable semantic codebase indexing
ollamaDev.inlineCompletionsEnabled true Toggle ghost-text completions
ollamaDev.completionDebounceMs 180 Delay before firing a completion request
ollamaDev.maxCompletionTokens 80 Max tokens per inline completion
ollamaDev.contextLines 50 Lines of prefix context sent with completions
ollamaDev.fillInMiddle true Use FIM prompting when cursor is mid-file

Keyboard Shortcuts

Shortcut Action
Ctrl+Shift+O / Cmd+Shift+O Open chat panel
Ctrl+Shift+M / Cmd+Shift+M Open Model Manager
Ctrl+Shift+R / Cmd+Shift+R Review PR / branch changes
Ctrl+Shift+T / Cmd+Shift+T Run tests (AI-assisted)
Ctrl+Shift+H / Cmd+Shift+H Explain symbol under cursor
Ctrl+Alt+Space Manually trigger inline completion
Tab Accept inline suggestion
Escape Dismiss inline suggestion

Troubleshooting

"Cannot reach Ollama" Make sure Ollama is running: ollama serve. Check ollamaDev.ollamaUrl if you're on a non-default host or port.

"Model not found" Run ollama pull <model-name> for whichever model is configured in settings.

Semantic search not working Pull the embedding model: ollama pull nomic-embed-text. Then trigger a reindex: Ctrl+Shift+P → OllamaDev: Re-index Workspace.

Completions are slow

  • Use a smaller model for completions (ollamaDev.completionModel)
  • Reduce ollamaDev.contextLines and ollamaDev.maxCompletionTokens
  • Verify Ollama is using your GPU: check ollama ps while a model is loaded

Chat stuck or unresponsive Click the Stop button in the toolbar. Check Ollama logs with journalctl -u ollama (Linux) or the Ollama app logs (Mac/Windows).

Agent making unwanted file changes Switch to Manual mode in the chat toolbar — the agent will show a diff preview and ask for approval before writing any file.

  • Contact us
  • Jobs
  • Privacy
  • Manage cookies
  • Terms of use
  • Trademarks
© 2026 Microsoft