OllamaDev

A free, fully local AI coding assistant for VS Code powered by Ollama. All inference runs on your machine — no API keys, no subscriptions, no data leaves your computer.

Requirements

Ollama installed and running (ollama serve)
At least one model pulled (see Recommended Models)
VS Code 1.85+

Quick Start

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama serve

# 2. Pull a model (best all-round default)
ollama pull qwen2.5-coder:7b

# 3. Pull the embedding model (for semantic search + RAG completions)
ollama pull nomic-embed-text

Open the OllamaDev panel from the Activity Bar or press Ctrl+Shift+O.

Features

Agentic Chat

A full-featured AI chat sidebar that can read files, search the codebase, run commands, browse the web, and make code edits — all autonomously.

The agent runs in a loop: it calls tools until the task is complete, then delivers a final response. You can watch every tool call in real time with expandable result cards.

Modes: | Mode | What it does | |---|---| | Auto | Agent executes all tool calls without asking | | Manual | Shows a diff preview and asks for approval before writing files | | Plan | Produces a numbered plan before doing anything — you review before execution |

Effort levels (toolbar):

Fast — focused, low temperature, quick answers
Balanced — default for everyday tasks
Deep — higher temperature, more thorough exploration

Tools (38 built-in)

The agent can invoke any of these autonomously:

Category	Tools
Files	`read_file`, `write_file`, `edit_file`, `read_multiple_files`, `copy_file`, `move_file`, `delete_file`, `create_directory`, `get_file_info`, `get_file_tree`, `get_active_file`, `reveal_file`
Search	`search_workspace`, `find_symbol`, `search_and_replace`, `semantic_search`
Git	`get_git_status`, `git_log`, `git_diff`, `git_commit`, `git_create_branch`, `get_pr_diff`
Web	`web_search`, `web_fetch`
Workspace	`list_directory`, `get_workspace_info`, `get_package_info`, `get_diagnostics`, `get_system_info`, `list_env_vars`, `run_command`
Memory	`write_memory`, `read_memory`, `delete_memory`
Project Memory	`read_project_memory`, `write_project_memory`, `delete_project_memory`
Testing	`run_tests`

Persistent Memory

OllamaDev has two layers of memory:

Session memory — the AI learns from every conversation. After each response, a background reflection pass asks the model what's worth remembering — workspace facts, preferences, patterns. These persist across sessions and are injected into every system prompt automatically.

Project memory (.ollama-memory.md) — a Markdown file at the workspace root that stores project-level facts: architecture decisions, conventions, domain knowledge. It's human-readable, committable to git, and shared across your team. The AI reads it automatically on every message and can write to it via write_project_memory.

Manage project memory:

Ctrl+Shift+P → OllamaDev: View / Edit Project Memory — opens .ollama-memory.md
Ctrl+Shift+P → OllamaDev: Add Project Memory Fact — interactive key/value prompt
Chat: "Remember that we use Zod for validation" → agent calls write_project_memory

RAG-Powered Inline Completions

Inline completions are powered by your semantic codebase index — not just the lines around your cursor. Before every completion, OllamaDev:

Builds a rich search query from the current line, enclosing function signature, and nearby intent comments
Retrieves the most relevant code from across your repo using vector similarity
Injects those references directly into the FIM prompt

The result: completions reference your APIs, types, and patterns instead of generic guesses. Requires nomic-embed-text and the workspace to be indexed.

Press Ctrl+Alt+G (or run OllamaDev: Navigate To…) and describe what you're looking for in plain English:

"where do we validate user input"
"find the auth token refresh handler"
"database connection setup"

OllamaDev searches the semantic index and shows a quick-pick list with file paths and relevance scores. Selecting a result opens the file and scrolls to the best matching line.

Requires a built semantic index (nomic-embed-text + at least one reindex).

Per-File-Type Model Routing

Configure different models for different file types via ollamaDev.modelRouting:

"ollamaDev.modelRouting": {
  "*.ts": "qwen2.5-coder:7b",
  "*.py": "codellama:7b",
  "*.sql": "sqlcoder:7b",
  "*.md": "qwen3:8b"
}

When the active file matches a pattern, that model is used for inline completions instead of the default completionModel. Unmatched files fall back to completionModel.

The status bar tooltip shows which model is active for the current file. Run OllamaDev: Show Active Model Routing Rules to see all configured rules.

Multi-Model Comparison

Press Ctrl+Shift+C (or run OllamaDev: Compare Two Models Side by Side) to open a split panel that runs the same prompt through two models simultaneously.

Select Model A and Model B from dropdowns (all locally installed models)
Type a prompt and press Ctrl+Enter or click Compare
Both models stream in parallel — responses appear side by side in real time
Token count, tokens/second, and total time shown per side
Thinking blocks from reasoning models shown inline

Use this to evaluate which model is better for a task, or to compare a large vs small model's output for the same question.

Semantic Search (RAG)

The extension automatically indexes your codebase using Ollama embeddings. The semantic_search tool lets the agent find code conceptually — "find where authentication happens", "which files handle errors" — instead of just text matching.

The index rebuilds automatically when you save files. Trigger a manual reindex: Ctrl+Shift+P → OllamaDev: Re-index Workspace.

Requires nomic-embed-text:

ollama pull nomic-embed-text

Inline Completions

Ghost-text completions as you type, accepted with Tab.

Uses Fill-in-the-Middle (FIM) — the model sees both prefix and suffix for better accuracy
Context-aware: function bodies, intent comments, arguments, imports each get tuned token budgets and temperatures
RAG-enhanced: relevant code from your repo is injected into every prompt
Manual trigger: Ctrl+Alt+Space
Toggle on/off: Ctrl+Shift+P → OllamaDev: Toggle Inline Completions

Recommended completion models (speed / quality on 8 GB GPU):

Model	Speed	First token	Quality
`qwen2.5-coder:1.5b`	~160 t/s	~25ms	Good
`qwen2.5-coder:3b`	~105 t/s	~40ms	Good
`qwen2.5-coder:7b` (recommended)	~62 t/s	~90ms	Excellent
`qwen2.5-coder:14b`	~32 t/s	~175ms	Excellent

Important: Only FIM-capable code models work for completions. Chat/reasoning models (qwen3, llama3, etc.) generate prose instead of code in FIM mode. OllamaDev warns you if you set a non-FIM model as the completion model.

Ctrl+K Inline Edit

Press Ctrl+K anywhere in the editor to open an inline prompt bar. Describe the change you want — OllamaDev streams the edit directly into the file with a green highlight. Press Enter to keep, Escape to revert.

Select code, right-click, and choose from the OllamaDev group:

Action	What it does
Explain Selected Code	Plain-language explanation with context
Generate Tests	Unit tests with edge cases
Fix / Improve Selection	Bug fixes and best-practice improvements
Add Documentation	JSDoc / docstrings / inline comments

Code Lens

TODO and FIXME comments get an inline ◆ Implement button. Click it to generate the function body directly in the file with a live streaming preview and Keep/Revert buttons.

Error diagnostics get a ⚡ Fix button that applies a surgical fix with the same Keep/Revert UX.

Thinking Mode

For reasoning models (qwen3, DeepSeek-R1, QwQ), enable Think in the toolbar. The model's internal reasoning appears as a collapsible block above the response.

PR Review

Press Ctrl+Shift+R to trigger a full structured code review of your current branch vs main/master. The agent fetches the diff and produces a report covering summary, bugs, security, performance, code quality, tests, and verdict.

Run Tests (AI-assisted)

Press Ctrl+Shift+T to run your test suite. If tests fail, the agent reads the relevant source files and fixes the failures automatically, re-running until they pass.

Supports: Jest, Vitest, Mocha, pytest, Cargo test, Go test, and Makefile targets.

Model Manager

Press Ctrl+Shift+M to open the Model Manager. Features:

Pull any Ollama model with a progress bar
Delete models from disk
See VRAM usage and which models are loaded
For Completions tab — filtered list of FIM code models with speed, quality, and first-token latency ratings
Set the active completion and chat model with one click
Warning when setting a non-FIM model as the completion model

Conversation Checkpoints

Save the current conversation state and restore it later — useful for branching explorations.

Context Attachment

Attach files and editor selections directly to your message using the File and Sel buttons. Drag-and-drop images for vision-capable models.

Recommended Models

Use case	Model	Pull command
Chat (8 GB VRAM)	`qwen2.5-coder:7b`	`ollama pull qwen2.5-coder:7b`
Chat (16 GB)	`qwen3:14b`	`ollama pull qwen3:14b`
Chat (24 GB)	`qwen2.5-coder:32b`	`ollama pull qwen2.5-coder:32b`
Completions (fast)	`qwen2.5-coder:1.5b`	`ollama pull qwen2.5-coder:1.5b`
Completions (best)	`qwen2.5-coder:7b`	`ollama pull qwen2.5-coder:7b`
Semantic search	`nomic-embed-text`	`ollama pull nomic-embed-text`

Settings Reference

Setting	Default	Description
`ollamaDev.ollamaUrl`	`http://localhost:11434`	Ollama server URL
`ollamaDev.chatModel`	`qwen2.5-coder:7b`	Model for chat and agent loops
`ollamaDev.completionModel`	`qwen2.5-coder:7b`	Model for inline completions (FIM code models only)
`ollamaDev.fastChatModel`	(empty)	Optional fast model for simple questions
`ollamaDev.embeddingModel`	`nomic-embed-text`	Model for semantic search indexing
`ollamaDev.modelRouting`	`{}`	Per-file-type model routing: `{ "*.py": "codellama:7b" }`
`ollamaDev.ragEnabled`	`true`	Enable semantic codebase indexing
`ollamaDev.inlineCompletionsEnabled`	`true`	Toggle ghost-text completions
`ollamaDev.completionDebounceMs`	`45`	Delay before firing a completion request
`ollamaDev.maxCompletionTokens`	`80`	Max tokens per inline completion
`ollamaDev.contextLines`	`50`	Lines of prefix context sent with completions
`ollamaDev.fillInMiddle`	`true`	Use FIM prompting when cursor is mid-file

Keyboard Shortcuts

Shortcut	Action
`Ctrl+Shift+O` / `Cmd+Shift+O`	Open chat panel
`Ctrl+Shift+M` / `Cmd+Shift+M`	Open Model Manager
`Ctrl+Shift+R` / `Cmd+Shift+R`	Review PR / branch changes
`Ctrl+Shift+T` / `Cmd+Shift+T`	Run tests (AI-assisted)
`Ctrl+Shift+H` / `Cmd+Shift+H`	Explain symbol under cursor
`Ctrl+Shift+G` / `Cmd+Shift+G`	Generate shell command
`Ctrl+Shift+C` / `Cmd+Shift+C`	Compare two models side by side
`Ctrl+Alt+G` / `Cmd+Alt+G`	Navigate to… (natural language semantic search)
`Ctrl+Alt+Space`	Manually trigger inline completion
`Ctrl+K` / `Cmd+K`	Inline edit at cursor
`Tab`	Accept inline suggestion / apply next edit
`Escape`	Dismiss inline suggestion / revert inline edit

Troubleshooting

"Cannot reach Ollama" Make sure Ollama is running: ollama serve. Check ollamaDev.ollamaUrl if you're on a non-default host/port.

"Model not found" Run ollama pull <model-name> for whichever model is configured in settings.

Completions show prose instead of code You're using a chat or reasoning model as the completion model. Open the Model Manager → For Completions tab and switch to a FIM code model like qwen2.5-coder:7b.

Semantic search / RAG completions not working Pull the embedding model: ollama pull nomic-embed-text. Then trigger a reindex: Ctrl+Shift+P → OllamaDev: Re-index Workspace. Semantic features require at least one indexed file.

Natural language navigation returns no results The workspace hasn't been indexed yet. Run Ctrl+Shift+P → OllamaDev: Re-index Workspace and wait for the status bar to confirm.

Completions are slow

Use a smaller model via ollamaDev.completionModel or ollamaDev.modelRouting
Open Model Manager → For Completions tab to see speed ratings
Verify Ollama is using your GPU: ollama ps

Chat stuck or unresponsive Click Stop in the toolbar. Check Ollama logs with journalctl -u ollama (Linux).

Agent making unwanted file changes Switch to Manual mode in the chat toolbar — the agent will ask for approval before writing any file.

Model routing not taking effect Make sure the pattern matches exactly: *.ts (not .ts or ts). Run OllamaDev: Show Active Model Routing Rules to confirm your rules are loaded.

OllamaDev

Prem Jampuram

OllamaDev

Requirements

Quick Start

Features

Agentic Chat

Tools (38 built-in)

Persistent Memory

RAG-Powered Inline Completions

Natural Language Navigation

Per-File-Type Model Routing

Multi-Model Comparison

Semantic Search (RAG)

Inline Completions

Ctrl+K Inline Edit

Code Actions (Right-click Menu)

Code Lens

Thinking Mode

PR Review

Run Tests (AI-assisted)

Model Manager

Conversation Checkpoints

Context Attachment

Recommended Models

Settings Reference

Keyboard Shortcuts

Troubleshooting