OllamaDev
A free, fully local AI coding assistant for VS Code powered by Ollama. All inference runs on your machine — no API keys, no subscriptions, no data leaves your computer.
Requirements
Quick Start
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama serve
# 2. Pull a model
ollama pull qwen2.5-coder:7b
# 3. Install the extension
code --install-extension ollama-dev-1.0.0.vsix
Open the OllamaDev panel from the Activity Bar (or press Ctrl+Shift+O).
Features
Agentic Chat
A full-featured AI chat sidebar that can read files, search the codebase, run commands, browse the web, and make code edits — all autonomously.
The agent runs in a loop: it calls tools until the task is complete, then delivers a final response. You can watch every tool call in real time with expandable result cards.
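The loop described above can be sketched as follows. This is a minimal, synchronous illustration and not the extension's actual implementation: the `Message`, `ToolCall`, and `Model` shapes, the `runAgentLoop` name, and the `maxSteps` safety limit are all assumptions made for the example, and a real agent would stream replies asynchronously.

```typescript
// Hypothetical shapes; the extension's real message and tool types are not documented here.
type ToolCall = { name: string; args: Record<string, string> };
type ModelReply = { content: string; toolCalls: ToolCall[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type Model = (history: Message[]) => ModelReply;
type ToolRegistry = Record<string, (args: Record<string, string>) => string>;

function runAgentLoop(model: Model, tools: ToolRegistry, task: string, maxSteps = 10): string {
  const history: Message[] = [{ role: "user", content: task }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = model(history);
    history.push({ role: "assistant", content: reply.content });
    // A reply without tool calls is treated as the final response.
    if (reply.toolCalls.length === 0) return reply.content;
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      const result = tool ? tool(call.args) : `Unknown tool: ${call.name}`;
      // Tool output is fed back so the next model turn can use it.
      history.push({ role: "tool", content: result });
    }
  }
  // Assumed safety net so a confused model cannot loop forever.
  return "Stopped: step limit reached";
}
```

Each entry pushed with role `tool` corresponds to one result card in the chat panel.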
Modes:
| Mode | What it does |
|---|---|
| Auto | Agent executes all tool calls without asking |
| Manual | Shows a diff preview and asks for approval before writing files |
| Plan | Produces a numbered plan before doing anything — you review before execution |
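One way to picture the three modes is as a gate in front of every tool call. The sketch below is illustrative only: `gateToolCall`, the `planApproved` flag, and the choice of which tools count as "writing files" are assumptions, not the extension's documented behavior.

```typescript
type Mode = "auto" | "manual" | "plan";
type Gate = "run" | "ask" | "wait_for_plan";

// Assumption: these tool names (from the tool list) are the ones that modify files.
const WRITE_TOOLS: string[] = [
  "write_file", "edit_file", "search_and_replace",
  "copy_file", "move_file", "delete_file", "create_directory",
];

function gateToolCall(mode: Mode, toolName: string, planApproved: boolean): Gate {
  if (mode === "auto") return "run"; // Auto: never ask
  if (mode === "manual") {
    // Manual: only file writes need a diff preview and approval
    return WRITE_TOOLS.indexOf(toolName) !== -1 ? "ask" : "run";
  }
  // Plan: nothing executes until the numbered plan has been reviewed
  return planApproved ? "run" : "wait_for_plan";
}
```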
Effort levels (toolbar):
- Fast — focused, low temperature, quick answers
- Balanced — default for everyday tasks
- Deep — higher temperature, more thorough exploration
The agent can invoke any of these autonomously:
| Category | Tools |
|---|---|
| Files | read_file, write_file, edit_file, read_multiple_files, copy_file, move_file, delete_file, create_directory, get_file_info, get_file_tree, get_active_file, reveal_file |
| Search | search_workspace, find_symbol, search_and_replace, semantic_search |
| Git | get_git_status, git_log, git_diff, git_commit, git_create_branch, get_pr_diff |
| Web | web_search, web_fetch |
| Workspace | list_directory, get_workspace_info, get_package_info, get_diagnostics, get_system_info, list_env_vars, run_command |
| Memory | write_memory, read_memory, delete_memory |
| Testing | run_tests |
Persistent Memory (Brain)
OllamaDev learns from every conversation. After each response, a background reflection pass asks the model what's worth remembering — workspace facts, your preferences, patterns that worked or failed. These memories persist across sessions and are injected into every system prompt.
The model decides what to remember. No hardcoded rules.
When memory grows large, a compaction pass automatically merges and prunes entries. You can also read, write, and delete memories explicitly via the read_memory, write_memory, and delete_memory tools.
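To make the injection and compaction steps concrete, here is a small sketch. The `Memory` shape, the prompt wording, and the 4000-character threshold are hypothetical; the README does not specify how memories are stored or when compaction fires.

```typescript
// Hypothetical memory record; the real storage format is not documented.
type Memory = { key: string; text: string };

// Append stored memories to the base system prompt as a bulleted list.
function buildSystemPrompt(base: string, memories: Memory[]): string {
  if (memories.length === 0) return base;
  const lines = memories.map((m) => `- ${m.key}: ${m.text}`);
  return `${base}\n\nRemembered from earlier sessions:\n${lines.join("\n")}`;
}

// Assumed trigger: compact once the stored text exceeds a character budget.
function needsCompaction(memories: Memory[], maxChars = 4000): boolean {
  const total = memories.reduce((sum, m) => sum + m.text.length, 0);
  return total > maxChars;
}
```

In the flow described above, the merging and pruning itself would be delegated to the model; this sketch only covers deciding when to ask for it.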
Semantic Search (RAG)
The extension automatically indexes your codebase using Ollama embeddings (nomic-embed-text by default). The semantic_search tool uses this index for conceptual queries — "find where authentication happens", "which files handle database connections" — instead of just text matching.
The index rebuilds automatically when you save files. To trigger a manual reindex: Ctrl+Shift+P → OllamaDev: Re-index Workspace.
Requires nomic-embed-text:
ollama pull nomic-embed-text
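Conceptual queries work because both the query and each code chunk are turned into vectors, and the closest vectors win. The sketch below shows that ranking step with cosine similarity. The `IndexedChunk` shape and the `topK` helper are assumptions for illustration; the extension's actual index format and scoring are not documented here.

```typescript
// Hypothetical index entry: one chunk of a file plus its embedding vector.
type IndexedChunk = { file: string; text: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction, so treat it as unrelated.
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return the k chunks whose embeddings are most similar to the query embedding.
function topK(query: number[], index: IndexedChunk[], k: number): IndexedChunk[] {
  return index
    .map((chunk) => ({ chunk, score: cosine(query, chunk.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

In practice the query vector would come from the same embedding model used to build the index, since vectors from different models are not comparable.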
Inline Completions
Ghost-text completions as you type, accepted with Tab.
- Uses Fill-in-the-Middle (FIM) when the cursor is mid-file: the model sees both the prefix and the suffix, which improves accuracy
- Debounced to avoid slowing you down
- Manual trigger: Ctrl+Alt+Space
- Toggle: Ctrl+Shift+P → OllamaDev: Toggle Inline Completions
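As a rough illustration of FIM, the sketch below splits a document at the cursor and builds a request body in the shape accepted by Ollama's `/api/generate` endpoint, which takes a `prompt` and an optional `suffix`. The `buildCompletionRequest` helper and the `CompletionConfig` shape are assumptions for this example; the config fields simply mirror the `contextLines`, `maxCompletionTokens`, and `fillInMiddle` settings.

```typescript
// Hypothetical config mirroring the extension's completion settings.
type CompletionConfig = { model: string; contextLines: number; maxTokens: number; fillInMiddle: boolean };
type CompletionRequest = {
  model: string;
  prompt: string;
  suffix?: string;
  stream: boolean;
  options: { num_predict: number };
};

function buildCompletionRequest(lines: string[], line: number, col: number, cfg: CompletionConfig): CompletionRequest {
  const current = lines[line] ?? "";
  // Prefix: up to contextLines lines above the cursor, plus the text left of it.
  const start = Math.max(0, line - cfg.contextLines);
  const prefix = [...lines.slice(start, line), current.slice(0, col)].join("\n");
  // Suffix: the text right of the cursor, plus everything below.
  const suffix = [current.slice(col), ...lines.slice(line + 1)].join("\n");
  const request: CompletionRequest = {
    model: cfg.model,
    prompt: prefix,
    stream: false,
    options: { num_predict: cfg.maxTokens },
  };
  // Only send a suffix when there is meaningful text after the cursor.
  if (cfg.fillInMiddle && suffix.trim().length > 0) request.suffix = suffix;
  return request;
}
```

When the cursor sits at the end of the file the suffix is empty, so the request falls back to plain prefix completion.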
Select code, right-click, and choose from the OllamaDev group:
| Action | What it does |
|---|---|
| Explain Selected Code | Plain-language explanation with context |
| Generate Tests | Unit tests with edge cases |
| Fix / Improve Selection | Bug fixes and best-practice improvements |
| Add Documentation | JSDoc / docstrings / inline comments |
Results stream directly into the chat panel.
Code Lens
TODO and FIXME comments get an inline ◆ Implement button. Click it to send the surrounding function context to the agent for automatic implementation.
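Finding the lines that should get an Implement button can be as simple as a line scan. The sketch below is a simplified assumption of how that detection might work: `findMarkers` and the `Marker` shape are hypothetical, and the pattern matches the words TODO and FIXME anywhere on a line rather than checking that they sit inside a comment.

```typescript
type Marker = { line: number; kind: "TODO" | "FIXME"; text: string };

// Scan source text and report each line (0-based) that contains a TODO or FIXME marker.
function findMarkers(source: string): Marker[] {
  const markers: Marker[] = [];
  const pattern = /\b(TODO|FIXME)\b[:\s]*(.*)$/;
  source.split("\n").forEach((text, line) => {
    const match = pattern.exec(text);
    if (match) {
      markers.push({ line, kind: match[1] as "TODO" | "FIXME", text: (match[2] ?? "").trim() });
    }
  });
  return markers;
}
```

A production version would likely use the language's comment syntax to avoid false positives in strings and identifiers.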
Thinking Mode
For models that support reasoning (qwen3, DeepSeek-R1, QwQ), enable Think in the toolbar. The model's internal reasoning process appears as a collapsible block above the response — fully visible but out of the way.
Inline <think>...</think> tokens from models without native server-side thinking support are automatically intercepted and routed to the reasoning UI.
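Separating inline reasoning from the answer amounts to pulling out the tagged spans. The following sketch works on a complete response string; `splitThinking` is a hypothetical helper, and a real implementation would need to handle tags arriving piece by piece in a stream as well as unclosed tags.

```typescript
// Split a finished response into the reasoning inside <think> tags and the visible answer.
function splitThinking(raw: string): { reasoning: string; answer: string } {
  const reasoning: string[] = [];
  const answer = raw.replace(/<think>([\s\S]*?)<\/think>/g, (_match: string, inner: string) => {
    reasoning.push(inner.trim());
    return ""; // remove the reasoning span from the visible answer
  });
  return { reasoning: reasoning.join("\n\n"), answer: answer.trim() };
}
```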
Multi-model Routing
Configure a fast small model for quick questions alongside a larger model for agent work:
"ollamaDev.chatModel": "qwen2.5-coder:14b",
"ollamaDev.fastChatModel": "qwen2.5:1.5b"
Simple questions that contain no action verbs and need no tools are routed automatically to the fast model.
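A routing heuristic of this kind might look like the sketch below. The verb list and the `pickModel` function are assumptions made for illustration; the extension's actual rules for detecting action verbs and tool needs are not documented here.

```typescript
// Assumed list of verbs that signal agent work rather than a quick question.
const ACTION_VERBS: string[] = [
  "fix", "write", "create", "edit", "refactor", "run", "add", "delete", "implement", "rename",
];

function pickModel(question: string, chatModel: string, fastChatModel: string): string {
  // With no fast model configured, everything goes to the main chat model.
  if (fastChatModel === "") return chatModel;
  const words = question.toLowerCase().split(/[^a-z]+/);
  const wantsAction = words.some((word) => ACTION_VERBS.indexOf(word) !== -1);
  return wantsAction ? chatModel : fastChatModel;
}
```

A keyword check like this is cheap but coarse: "What does this add up to?" would still go to the larger model because it contains "add".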
PR Review
Press Ctrl+Shift+R (or run OllamaDev: Review PR / Branch Changes) to trigger a full structured code review of your current branch vs main/master. The agent fetches the diff and produces a report covering summary, bugs, security, performance, code quality, tests, and verdict.
Run Tests (AI-assisted)
Press Ctrl+Shift+T (or run OllamaDev: Run Tests) to run your test suite. If tests fail, the agent reads the relevant source files and fixes the failures automatically, re-running until they pass.
Supports: Jest, Vitest, Mocha, pytest, Cargo test, Go test, and Makefile targets.
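Supporting several runners implies some detection step. The sketch below guesses a test command from the files in the workspace root and the project's dev dependencies. The detection rules, the `pickTestCommand` name, and the assumption that a Makefile exposes a `test` target are all hypothetical; only the command strings themselves are standard invocations of each tool.

```typescript
// Guess a test command from workspace root files and package.json devDependencies.
function pickTestCommand(rootFiles: string[], devDependencies: string[] = []): string | undefined {
  const has = (name: string): boolean => rootFiles.indexOf(name) !== -1;
  const dep = (name: string): boolean => devDependencies.indexOf(name) !== -1;

  if (has("Cargo.toml")) return "cargo test";
  if (has("go.mod")) return "go test ./...";
  if (has("package.json")) {
    if (dep("vitest")) return "npx vitest run";
    if (dep("jest")) return "npx jest";
    if (dep("mocha")) return "npx mocha";
  }
  if (has("pytest.ini") || has("pyproject.toml")) return "pytest";
  // Assumption: the Makefile defines a target named "test".
  if (has("Makefile")) return "make test";
  return undefined;
}
```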
Conversation Checkpoints
Save the current conversation state at any point and restore it later. Useful for branching explorations — try one approach, checkpoint, try another, restore if needed.
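Conceptually, a checkpoint is a named copy of the message list. The sketch below keeps checkpoints in memory only; `CheckpointStore` and `ChatMessage` are hypothetical names, and the extension's real persistence mechanism is not documented here.

```typescript
type ChatMessage = { role: string; content: string };

class CheckpointStore {
  private saved: Record<string, ChatMessage[]> = {};

  // Store a copy so later edits to the live conversation do not alter the checkpoint.
  save(name: string, conversation: ChatMessage[]): void {
    this.saved[name] = conversation.map((m) => ({ ...m }));
  }

  // Return a fresh copy, or undefined if no checkpoint has that name.
  restore(name: string): ChatMessage[] | undefined {
    const snapshot = this.saved[name];
    return snapshot ? snapshot.map((m) => ({ ...m })) : undefined;
  }
}
```

Copying on both save and restore is what makes branching safe: trying a second approach never changes the saved state of the first.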
Context Attachment
Attach files and editor selections directly to your message using the File and Sel buttons in the chat toolbar. Drag and drop images into the chat when using a vision-capable model.
Vision
Attach screenshots or diagrams to your message. Works with any vision-capable model (LLaVA, Qwen-VL, Gemma Vision, etc.).
Model Manager
Press Ctrl+Shift+M to open the Model Manager panel. Pull new models, delete old ones, see model sizes and parameter counts, and switch the active model for chat and completions.
Recommended Models
| RAM | Chat model | Pull command |
|---|---|---|
| 8 GB | qwen2.5-coder:7b (default) | ollama pull qwen2.5-coder:7b |
| 16 GB | qwen3:14b | ollama pull qwen3:14b |
| 24 GB | qwen2.5-coder:32b | ollama pull qwen2.5-coder:32b |
| 32 GB+ | qwen3:32b | ollama pull qwen3:32b |
For inline completions, use a smaller/faster model:
"ollamaDev.completionModel": "qwen2.5-coder:1.5b"
For semantic search (required for RAG):
ollama pull nomic-embed-text
Settings Reference
| Setting | Default | Description |
|---|---|---|
| ollamaDev.ollamaUrl | http://localhost:11434 | Ollama server URL |
| ollamaDev.chatModel | qwen2.5-coder:7b | Model for chat and agent loops |
| ollamaDev.completionModel | qwen2.5-coder:7b | Model for inline completions |
| ollamaDev.fastChatModel | (empty) | Optional fast model for simple questions |
| ollamaDev.embeddingModel | nomic-embed-text | Model for semantic search indexing |
| ollamaDev.ragEnabled | true | Enable semantic codebase indexing |
| ollamaDev.inlineCompletionsEnabled | true | Toggle ghost-text completions |
| ollamaDev.completionDebounceMs | 180 | Delay (ms) before firing a completion request |
| ollamaDev.maxCompletionTokens | 80 | Max tokens per inline completion |
| ollamaDev.contextLines | 50 | Lines of prefix context sent with completions |
| ollamaDev.fillInMiddle | true | Use FIM prompting when cursor is mid-file |
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| Ctrl+Shift+O / Cmd+Shift+O | Open chat panel |
| Ctrl+Shift+M / Cmd+Shift+M | Open Model Manager |
| Ctrl+Shift+R / Cmd+Shift+R | Review PR / branch changes |
| Ctrl+Shift+T / Cmd+Shift+T | Run tests (AI-assisted) |
| Ctrl+Shift+H / Cmd+Shift+H | Explain symbol under cursor |
| Ctrl+Alt+Space | Manually trigger inline completion |
| Tab | Accept inline suggestion |
| Escape | Dismiss inline suggestion |
Troubleshooting
"Cannot reach Ollama"
Make sure Ollama is running: ollama serve. Check ollamaDev.ollamaUrl if you're on a non-default host or port.
"Model not found"
Run ollama pull <model-name> for whichever model is configured in settings.
Semantic search not working
Pull the embedding model: ollama pull nomic-embed-text. Then trigger a reindex: Ctrl+Shift+P → OllamaDev: Re-index Workspace.
Completions are slow
- Use a smaller model for completions (ollamaDev.completionModel)
- Reduce ollamaDev.contextLines and ollamaDev.maxCompletionTokens
- Verify Ollama is using your GPU: check ollama ps while a model is loaded
Chat stuck or unresponsive
Click the Stop button in the toolbar. Check Ollama logs with journalctl -u ollama (Linux) or the Ollama app logs (Mac/Windows).
Agent making unwanted file changes
Switch to Manual mode in the chat toolbar — the agent will show a diff preview and ask for approval before writing any file.