OllamaPilot

A fully local, offline AI coding assistant for VS Code — powered by Ollama

No cloud. No subscriptions. No telemetry. Just you, your code, and a local AI.
✨ What is OllamaPilot?

OllamaPilot is a free, open-source VS Code extension that brings a Cursor-like AI coding assistant experience directly into VS Code — running entirely on your machine using Ollama.
📦 Prerequisites

Before installing the extension you need Ollama running on your machine:

1. Install Ollama — download it from ollama.com
2. Pull a model — e.g. `ollama pull qwen2.5-coder:7b`
3. Start Ollama — `ollama serve`
🚀 Installation

Option A — Install from the VS Code Marketplace (recommended)

One-click install: ➡ Install OllamaPilot on the VS Code Marketplace

Or via Quick Open:

Option B — Install from a `.vsix` file
🤖 Agent Tools

| Tool | Description | Confirmation Required |
|---|---|---|
| `workspace_summary` | Full project tree, type detection, key files, recently modified | — |
| `read_file` | Read any file in the workspace | — |
| `list_files` | List directory contents | — |
| `search_files` | Search for text across all workspace files | — |
| `create_file` | Create a new file with content | — |
| `edit_file` | Targeted patch edit (old → new string) with VS Code diff preview | ✅ |
| `write_file` | Overwrite a file entirely | ✅ |
| `append_to_file` | Append text to an existing file | — |
| `rename_file` | Rename or move a file | ✅ |
| `delete_file` | Delete a file | ✅ |
| `run_command` | Execute shell commands with live output streaming | ✅ |
| `memory_list` | Recall all saved project notes for this workspace | — |
| `memory_write` | Save a persistent note (fact, decision, convention) about the project | — |
| `memory_delete` | Delete a saved note by id | — |
Safety: All file modifications and command executions require explicit confirmation via a VS Code dialog. Paths are validated to stay within the workspace root. Dangerous command patterns (`rm -rf /`, `mkfs`, etc.) are blocked before the confirmation dialog even appears.
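The dangerous-pattern screening and workspace-root path validation can be sketched as follows; the pattern list and function names here are illustrative, not the extension's actual code:

```typescript
import * as path from "node:path";

// Illustrative dangerous-command patterns; the extension's real list may differ.
const DANGEROUS_PATTERNS: RegExp[] = [
  /\brm\s+-rf\s+\//, // recursive delete starting at the filesystem root
  /\bmkfs\b/,        // filesystem format
  /\bdd\s+if=/,      // raw disk writes
];

function isDangerousCommand(cmd: string): boolean {
  return DANGEROUS_PATTERNS.some((p) => p.test(cmd));
}

// Reject any target path that escapes the workspace root after resolution.
function isInsideWorkspace(root: string, target: string): boolean {
  const rel = path.relative(root, path.resolve(root, target));
  return rel === "" || (!rel.startsWith("..") && !path.isAbsolute(rel));
}
```

A command like `rm -rf /tmp/build` would still reach the confirmation dialog; only the patterns above are rejected outright.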
Example agent workflow

```
User: "@src/api/user.ts Refactor fetchUser to handle errors properly"

Agent: → memory_list()                                  (recall any prior project notes)
       → read_file("src/api/user.ts")                   (read the attached file)
       → edit_file(...)                                 (show diff → user approves)
       → memory_write("fetchUser now uses Result<T>")   (save the decision for future sessions)
       ← "Done. I updated fetchUser to use try/catch and return a Result type..."
```
📎 @File Mentions
@mentions let you attach any file from your workspace directly in the chat input — similar to Cursor's @ feature.
How to use:
- Type `@` anywhere in the message input
- A fuzzy-search dropdown appears with matching files
- Type more characters to filter; use `↑`/`↓` to navigate
- Press `Enter`, `Tab`, or click to attach the file
- The file appears as a pill in the context bar
- Remove it with `×` at any time before sending
What happens when you send:
- The file content is read on the extension side (not the webview)
- It is attached as a structured `<mention>` block after your message
- Large files are automatically capped at 100 KB
- If the same file is already auto-attached (via the file toggle), it won't be duplicated
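The attachment step can be sketched as follows; the `<mention>` serialization shown here is illustrative, and the extension's real wire format may differ:

```typescript
// Cap mirrors the 100 KB limit described above (measured in characters here
// for simplicity; the real implementation may count bytes).
const MAX_MENTION_CHARS = 100 * 1024;

function buildMentionBlock(relPath: string, content: string): string {
  const capped =
    content.length > MAX_MENTION_CHARS
      ? content.slice(0, MAX_MENTION_CHARS) + "\n... [truncated]"
      : content;
  return `<mention path="${relPath}">\n${capped}\n</mention>`;
}
```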
🧠 Project Memory
Project memory lets the AI save and recall notes about your workspace — persisted across all sessions, scoped per workspace folder.
The AI uses three tools to manage memory automatically:
| Tool | What it does |
|---|---|
| `memory_list` | Reads all saved notes at the start of a conversation |
| `memory_write` | Saves a note with an optional tag (e.g. `architecture`, `bug`, `decision`) |
| `memory_delete` | Removes a stale or incorrect note by its `id` |
Example use cases:
"Remember that we use Prisma ORM, not raw SQL"
→ Agent saves: tag=architecture "this project uses Prisma ORM, not raw SQL"
"What do you know about this project?"
→ Agent calls memory_list() and summarises saved notes
"Forget the note about the old API endpoint"
→ Agent calls memory_delete(id)
Notes are stored in VS Code's workspaceState — automatically scoped to the current workspace folder, never committed to git, and never shared between workspaces.
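As an illustration of how a `workspaceState`-backed store can work, here is a minimal sketch. `MementoLike` mirrors the `get`/`update` shape of VS Code's `Memento`; the extension's `projectMemory.ts` may differ:

```typescript
interface MemoryNote { id: string; text: string; tag?: string; }

// Minimal stand-in for vscode.ExtensionContext.workspaceState.
interface MementoLike {
  get<T>(key: string, defaultValue: T): T;
  update(key: string, value: unknown): void;
}

const MEMORY_KEY = "ollamaAgent.projectMemory"; // illustrative storage key

function listNotes(state: MementoLike): MemoryNote[] {
  return state.get<MemoryNote[]>(MEMORY_KEY, []);
}

function writeNote(state: MementoLike, text: string, tag?: string): MemoryNote {
  const notes = listNotes(state);
  const note: MemoryNote = { id: String(notes.length + 1), text, tag };
  state.update(MEMORY_KEY, [...notes, note]);
  return note;
}

function deleteNote(state: MementoLike, id: string): void {
  state.update(MEMORY_KEY, listNotes(state).filter((n) => n.id !== id));
}

// In-memory implementation, handy for testing outside VS Code.
function memoryState(): MementoLike {
  const store = new Map<string, unknown>();
  return {
    get: <T>(k: string, d: T) => (store.has(k) ? (store.get(k) as T) : d),
    update: (k, v) => { store.set(k, v); },
  };
}
```

Because everything goes through the `Memento` interface, swapping the in-memory stand-in for the real `workspaceState` requires no other changes.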
🔀 Git Diff Context
When enabled, OllamaPilot automatically injects a summary of your uncommitted changes into every message — giving the AI awareness of what you're currently working on without you having to explain it.
Enable it:
```jsonc
// .vscode/settings.json (or via the Settings UI)
{
  "ollamaAgent.injectGitDiff": true
}
```
What gets injected:
- `git diff` (unstaged changes) + `git diff --cached` (staged changes)
- A one-line stat summary (e.g. "3 files changed, 42 insertions, 7 deletions")
- Automatically truncated at 8 KB to protect your context window
- Gracefully skipped if the folder is not a git repo or git is unavailable
Example prompt with git diff:
```
User: "Why is the login test failing?"

Agent receives:

Your message +

<git-diff summary="2 files changed, 15 insertions, 3 deletions">
diff --git a/src/auth/login.ts ...
</git-diff>
```
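Collection along the lines of `gitContext.ts` might look like this; it is a sketch of the behavior described above (unstaged + staged, 8 KB cap, silent skip), not the extension's actual code:

```typescript
import { execFileSync } from "node:child_process";

const MAX_DIFF_CHARS = 8 * 1024; // mirrors the 8 KB cap described above

function collectGitDiff(cwd: string): string | null {
  try {
    const unstaged = execFileSync("git", ["diff"], { cwd, encoding: "utf8" });
    const staged = execFileSync("git", ["diff", "--cached"], { cwd, encoding: "utf8" });
    const combined = unstaged + staged;
    if (combined.trim() === "") return null; // nothing uncommitted
    return combined.length > MAX_DIFF_CHARS
      ? combined.slice(0, MAX_DIFF_CHARS) + "\n... [diff truncated]"
      : combined;
  } catch {
    return null; // not a git repo, or git unavailable
  }
}
```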
🕐 Chat History
Every conversation is automatically saved to VS Code's global storage after each assistant response.
| Action | How |
|---|---|
| Open history | Click the 🕐 button in the chat header |
| Load a session | Click any session in the list |
| Delete a session | Hover over a session → click 🗑 |
| Delete all | Open history → click "Delete all" |
| New chat | Click + in the chat header, or use Cmd+Shift+O |
When you load a session, both the visual chat and the conversation context (the model's memory) are fully restored.
Sessions are stored in VS Code's global state (not in your filesystem) so they are not committed to git.
⚙️ Configuration / Settings
Open Settings (Ctrl+, / Cmd+,) and search for "Ollama" to see all options.
| Setting | Default | Description |
|---|---|---|
| `ollamaAgent.baseUrl` | `""` | Full Ollama URL, e.g. `http://localhost:11434`. Overrides host + port when set. Useful for remote Ollama instances. |
| `ollamaAgent.host` | `localhost` | Ollama hostname (used only when `baseUrl` is empty) |
| `ollamaAgent.port` | `11434` | Ollama port (used only when `baseUrl` is empty) |
| `ollamaAgent.model` | `llama2` | Default model at startup. Overridable per session via the dropdown. |
| `ollamaAgent.temperature` | `0.7` | Sampling temperature. 0 = deterministic, 1 = balanced, 2 = creative |
| `ollamaAgent.systemPrompt` | `""` | Custom system prompt. Leave empty to use the built-in coding assistant prompt. |
| `ollamaAgent.autoIncludeFile` | `false` | Auto-attach the active file's full content to every message |
| `ollamaAgent.autoIncludeSelection` | `true` | Auto-attach selected code when a selection exists |
| `ollamaAgent.maxContextFiles` | `5` | Maximum number of workspace files to auto-load as context |
| `ollamaAgent.injectGitDiff` | `false` | Inject uncommitted git diff into every message for change-aware context |
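The precedence between `baseUrl` and `host` + `port` can be sketched like this, assuming a simplified `OllamaConfig`; the extension's real `config.ts` may differ in details:

```typescript
// Simplified view of the connection-related settings above.
interface OllamaConfig { baseUrl: string; host: string; port: number; }

// baseUrl, when non-empty, takes precedence over host + port.
function resolveOllamaUrl(cfg: OllamaConfig): string {
  return cfg.baseUrl !== "" ? cfg.baseUrl : `http://${cfg.host}:${cfg.port}`;
}
```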
Example: Using a remote Ollama instance
```jsonc
// .vscode/settings.json
{
  "ollamaAgent.baseUrl": "http://192.168.1.100:11434",
  "ollamaAgent.model": "qwen2.5-coder:7b"
}
```
Example: Custom system prompt with git diff
```json
{
  "ollamaAgent.systemPrompt": "You are a senior TypeScript engineer. Always write strict types. Prefer functional patterns.",
  "ollamaAgent.injectGitDiff": true
}
```
⌨️ Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| `Cmd+Shift+O` / `Ctrl+Shift+O` | Open the OllamaPilot chat panel |
| `Enter` | Send message |
| `Shift+Enter` | New line in the message input |
| `@` | Trigger file mention autocomplete |
| `↑` / `↓` | Navigate the @mention dropdown |
| `Tab` / `Enter` | Select highlighted @mention |
| `Escape` | Dismiss the @mention dropdown |
🔍 How It Works
```
┌─────────────────────────────────────────────────────────┐
│                   VS Code Extension                     │
│                                                         │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────┐   │
│  │   Webview    │◄──►│   Provider   │◄──►│  Agent   │   │
│  │  (Chat UI)   │    │ (Msg Router) │    │  (Loop)  │   │
│  └──────────────┘    └──────────────┘    └──────────┘   │
│         │                   │                 │         │
│         │                   │           ┌─────▼──────┐  │
│         │                   │           │   Tools    │  │
│         │                   │           │ read_file  │  │
│         │                   │           │ edit_file  │  │
│         │                   │           │ memory_*   │  │
│         │                   │           └────────────┘  │
│         │           ┌───────▼───────┐                   │
│         │           │    Storage    │                   │
│         │           │  globalState  │                   │
│         │           │workspaceState │                   │
│         │           └───────────────┘                   │
└─────────┼───────────────────────────────────────────────┘
          │
          ▼ HTTP (localhost)
   ┌─────────────────────┐
   │    Ollama Server    │
   │  localhost:11434    │
   │                     │
   │  llama3 / qwen /    │
   │  phi / mistral /    │
   │  codellama / ...    │
   └─────────────────────┘
```
Architecture
The extension is built in clean TypeScript modules:
```
src/
├── main.ts            Entry point — activate() registers commands and the sidebar provider
├── provider.ts        WebviewViewProvider — routes messages, manages sessions, resolves @mentions
├── agent.ts           Agent loop — multi-turn tool calling, mode switching, history
├── ollamaClient.ts    HTTP client for the Ollama API (/api/chat, /api/tags)
├── chatStorage.ts     Session persistence via vscode.ExtensionContext.globalState
├── projectMemory.ts   Workspace-scoped project notes via vscode.ExtensionContext.workspaceState
├── mentions.ts        Workspace file indexer and fuzzy search for @mention autocomplete
├── gitContext.ts      Git diff extraction — staged + unstaged changes, auto-truncated
├── config.ts          Configuration reader (maps VS Code settings → typed OllamaConfig)
├── context.ts         Active file / selection extraction from the editor
├── workspace.ts       Project scanner — file tree, project type, key files, recent files
└── logger.ts          Shared OutputChannel logger

webview/
├── webview.html       Chat panel UI — VS Code theme variables, no frameworks
├── webview.js         Frontend logic — streaming, markdown, @mentions, token counter, history
└── vendor/
    └── highlight.bundle.js   Vendored offline highlight.js (30+ languages, no CDN)

scripts/
└── vendor-hljs.js     Build script — generates the highlight.js browser bundle
```
Message flow
- User types a message (optionally with `@file` mentions) and presses Enter
- Webview sends `{ command: 'sendMessage', text, model, includeFile, includeSelection, mentionedFiles }` to the extension
- Provider resolves @mentions (reads file content), builds the context string, and optionally injects the git diff
- `Agent.run()` sends the full conversation to Ollama via the streaming `/api/chat` endpoint
- Each token is forwarded to the webview as `{ type: 'token', text }`
- When the model emits a tool call, the agent executes it (with confirmation if needed) and loops
- On `streamEnd`, the provider saves the complete assistant message to the current session
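The message shapes in this flow can be typed roughly as follows; these interfaces are reconstructed from the field names above and are a sketch, not the extension's actual source:

```typescript
// Webview → extension request (field names as described in the flow above).
interface SendMessage {
  command: "sendMessage";
  text: string;
  model: string;
  includeFile: boolean;
  includeSelection: boolean;
  mentionedFiles: string[];
}

// Extension → webview streaming events.
type ProviderEvent =
  | { type: "token"; text: string } // one streamed token
  | { type: "streamEnd" };          // assistant turn complete

const msg: SendMessage = {
  command: "sendMessage",
  text: "Why is the login test failing?",
  model: "qwen2.5-coder:7b",
  includeFile: false,
  includeSelection: true,
  mentionedFiles: ["src/auth/login.ts"],
};

const firstEvent: ProviderEvent = { type: "token", text: "Looking" };
```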
Tool calling — two modes
| Mode | When used | How |
|---|---|---|
| Native | Models that support Ollama tool calling (llama3-groq-tool-use, qwen2.5-coder, etc.) | Tools passed as JSON schema in the API request |
| Text (fallback) | All other models (llama2, phi, mistral, etc.) | Tool instructions injected into the system prompt; the model emits <tool>{"name":...}</tool> blocks which are parsed client-side |
The switch happens automatically on the first HTTP 400 "does not support tools" error, with no interruption to the user experience.
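The text-mode fallback can be sketched as a small parser for the `<tool>` blocks described above; the extension's real parser is likely more robust (streaming input, partial blocks, etc.):

```typescript
interface ToolCall { name: string; [arg: string]: unknown; }

// Extract every <tool>{...}</tool> block from the model's plain-text output
// and parse its JSON payload, skipping malformed blocks.
function parseTextModeToolCalls(output: string): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const m of output.matchAll(/<tool>([\s\S]*?)<\/tool>/g)) {
    try {
      calls.push(JSON.parse(m[1]) as ToolCall);
    } catch {
      // Ignore blocks whose payload is not valid JSON.
    }
  }
  return calls;
}
```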
📡 Supported Models
Any model available in Ollama works with this extension. Models known to work well:
| Model | Size | Notes |
|---|---|---|
| `qwen2.5-coder:7b` | ~4 GB | ⭐ Recommended — excellent at coding, native tools, large context |
| `qwen2.5-coder:1.5b` | ~1 GB | Fastest, good for quick tasks |
| `llama3.1:8b` | ~5 GB | General purpose, high quality |
| `phi3:mini` | ~2 GB | Very fast, good for simple tasks |
| `codellama:7b` | ~4 GB | Specialized for code generation |
| `mistral:7b` | ~4 GB | Well-rounded, good reasoning |
| `deepseek-coder:6.7b` | ~4 GB | Strong at code tasks |
| `deepseek-r1:8b` | ~5 GB | Reasoning model, excellent for complex refactors |
| `llama2` | ~4 GB | Classic, text-mode fallback activated automatically |
Token limits: The token counter in the footer automatically adapts to the known context window of your selected model. If your model is not in the built-in list, a safe default of 8,192 tokens is assumed.
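The counter uses the common "4 characters ≈ 1 token" heuristic (see the FAQ); this sketch mirrors that idea, while the extension's exact code and context-window table may differ:

```typescript
const DEFAULT_CONTEXT_WINDOW = 8192; // fallback for models not in the built-in list

// Rough token estimate: ~4 characters per token for English text and code.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Fraction of the context window the given text would consume.
function contextUsage(text: string, windowSize = DEFAULT_CONTEXT_WINDOW): number {
  return estimateTokens(text) / windowSize;
}
```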
Pull any model with:
```shell
ollama pull <model-name>
```
🤝 Contributing
Contributions are warmly welcome! Whether it's a bug fix, a new feature, better documentation, or a UX improvement — all pull requests are reviewed.
Ways to contribute
- 🐛 Report bugs — open an Issue with reproduction steps
- 💡 Suggest features — open an Issue with the `enhancement` label
- 🔧 Submit a PR — see Development Setup below
- 📖 Improve docs — typos, clarity, missing info
- 🌍 Share — star the repo, mention it to other developers
Code of Conduct
Be kind, be constructive. We follow the Contributor Covenant.
🛠️ Development Setup
Requirements
- Node.js 18+
- npm 9+
- VS Code 1.80+
- Ollama running locally
Clone and build
```shell
# 1. Clone the repo
git clone https://github.com/kchikech/Ollama_Agent.git
cd Ollama_Agent

# 2. Install dev dependencies
npm install

# 3. Compile TypeScript + generate vendor bundle
npm run build

# 4. Package as .vsix for testing
npx vsce package
```
`npm run build` runs two steps: `npm run vendor` (generates `webview/vendor/highlight.bundle.js`) then `tsc`.
Run in development mode
- Open the `Ollama_Agent` folder in VS Code
- Press `F5` — this opens a new Extension Development Host window with the extension loaded
- Make changes → `npm run build` → reload the Extension Development Host (`Ctrl+R`)
Project structure
| Path | Purpose |
|---|---|
| `src/` | All TypeScript source — compiled to `dist/` |
| `webview/` | Frontend HTML + vanilla JS — inlined into the webview at runtime |
| `webview/vendor/` | Vendored offline highlight.js bundle (generated, gitignored) |
| `scripts/` | Build utilities (`vendor-hljs.js`) |
| `images/` | Extension icons and demo GIF |
| `dist/` | Compiled output (gitignored) |
| `.vscodeignore` | Files excluded from the `.vsix` package |
Adding a new agent tool
- Add the tool definition to `TOOL_DEFINITIONS` in `src/agent.ts`
- Add the execution case in `Agent.executeTool()`
- Add an icon to `TOOL_ICONS` in `webview/webview.js`
- Update `TEXT_MODE_TOOL_INSTRUCTIONS` in `src/agent.ts`
- Update the system prompt in `DEFAULT_SYSTEM_PROMPT` in `src/agent.ts`
- Add the tool to this README's Agent Tools table
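A new `TOOL_DEFINITIONS` entry plausibly follows the JSON-schema shape Ollama uses for native tool calling. Here is a hypothetical `count_lines` tool as a sketch; the extension's actual entry type may differ:

```typescript
// Hypothetical tool definition in the Ollama native tool-calling format.
const countLinesTool = {
  type: "function",
  function: {
    name: "count_lines",
    description: "Count the number of lines in a workspace file",
    parameters: {
      type: "object",
      properties: {
        path: { type: "string", description: "Workspace-relative file path" },
      },
      required: ["path"],
    },
  },
};
```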
Running diagnostics
Use the built-in diagnostic command to verify your Ollama connection:
Ctrl+Shift+P → "Ollama: Run Diagnostics"
This tests HTTP connectivity, lists models, and runs a streaming test. Output appears in Output → Ollama Agent.
🗺️ Roadmap
v0.1.0 — Enhanced Context ✅ current
- [x] `@filename` mention in the prompt to attach specific files
- [x] Token count indicator showing context size before sending
- [x] Offline syntax highlighting (highlight.js, 30+ languages)
- [x] `git diff` context injection (opt-in via setting)
- [x] Persistent project memory / notes (per-workspace)
v0.2.0 — UX Polish
- [ ] Export chat as Markdown
- [ ] Message search within a session
- [ ] Configurable keyboard shortcut
- [ ] Extension icon and Marketplace banner image
v0.3.0 — Code Intelligence
- [ ] Inline diff application directly in the editor
- [ ] Multi-workspace folder support
- [ ] `@symbol` mention to attach a specific function or class
v1.0.0 — Stability
- [ ] Comprehensive test suite
- [ ] Memory UI panel (browse/edit notes without the agent)
- [ ] Export / import project memory
Have a feature idea? Open an issue — community feedback drives the roadmap.
❓ FAQ
Q: Does it work without internet?
A: Yes. Once Ollama is installed and a model is pulled, the extension works completely offline. Syntax highlighting also runs fully offline — no CDN requests. No data is ever sent to any external server.
Q: How do @file mentions work?
A: Type @ in the input box, start typing a filename, and select from the dropdown. The file content is read on the extension side and attached to your message as context. Files are capped at 100 KB to protect your context window.
Q: What is project memory? Is it the same as chat history?
A: No. Chat history saves the full conversation transcript. Project memory is a separate, persistent notes store that the AI can read and write to using tools — it survives even when you start a new chat. Use it for project conventions, known bugs, architecture decisions, etc.
Q: My model doesn't support tools — will agent features still work?
A: Yes. The extension automatically detects when a model doesn't support native tool calling and switches to a text-mode fallback where tool instructions are embedded in the system prompt. You'll see an amber notice in the chat when this happens.
Q: Can I use a remote Ollama instance?
A: Yes. Set ollamaAgent.baseUrl to the remote URL (e.g. http://192.168.1.100:11434). HTTPS is also supported.
Q: Where are my chats stored?
A: In VS Code's global extension storage (globalState). They are not stored in your filesystem or committed to git. Use the "Delete all" button in the history panel to remove them.
Q: Where is project memory stored?
A: In VS Code's workspace-scoped storage (workspaceState) — automatically isolated per workspace folder, not in your filesystem, not committed to git.
Q: The token counter seems off — is it accurate?
A: It's an approximation using the 4 characters ≈ 1 token heuristic, which is standard for English and code. It won't be exact (exact tokenisation requires the model's tokenizer), but it's close enough to warn you before you hit context limits.
Q: Can the AI modify my files without asking?
A: No. Every file write, edit, rename, and delete requires you to click Apply / Write / Delete in a confirmation dialog. Command execution (run_command) also requires explicit confirmation.
Q: The extension shows "Ollama not running". What do I do?
A: Run ollama serve in a terminal. On macOS you can also start it from the menu bar icon.
🔒 Privacy
This extension is designed with privacy as a first-class concern:
- ✅ All processing happens locally on your machine
- ✅ No analytics, telemetry, or usage tracking of any kind
- ✅ No network requests except to your local Ollama instance
- ✅ Chat history and project memory stored only in VS Code's local extension storage
- ✅ No API keys or accounts required
- ✅ Open source — audit the code yourself
📄 License
MIT License — see LICENSE for details.
🙏 Acknowledgements
- Ollama — for making local LLMs accessible to everyone
- highlight.js — for the offline syntax highlighting engine
- VS Code Extension API — for the powerful webview and workspace APIs
- All contributors and early adopters who helped shape this extension
Made with ❤️ by the open-source community
Report a Bug · Request a Feature · Contribute
⭐ Star this repo if you find it useful!
