Ollama Free Coder

A fully-local LLM coding assistant powered by Ollama. Nothing leaves your machine except web-search queries (DuckDuckGo or Google).

This repository ships two sister plugins that share the same ideas:

Editor	Source	Installers	Docs
VS Code	`src/`	`scripts/install-ubuntu.sh`, `scripts/install-macos.sh`, `scripts/install-windows.ps1`	this README, `DOCUMENTATION.md`, `ARCHITECTURE.md`
Vim / Neovim	`vim/`	`vim/scripts/install-ubuntu.sh`, `vim/scripts/install-macos.sh`, `vim/scripts/install-windows.ps1`	`vim/README.md` + `:help ollama-coder`

All six installers share the same environment-variable contract: CHAT_MODEL, COMPLETION_MODEL, EXTRA_MODELS, OLLAMA_HOST, SKIP_OLLAMA, SKIP_PULL. They each ensure the Ollama server is running, pull the configured models (failures are fatal, never silent), and verify them in /api/tags before installing the plugin.
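
The /api/tags verification step can be sketched in TypeScript. This is an illustration only: the installers themselves are shell and PowerShell scripts, and the helper name missingModels is hypothetical. It assumes the response shape Ollama documents for GET /api/tags (a models array whose entries carry a name such as llama3.1:8b) and treats a bare name as name:latest, which is how Ollama resolves untagged models.

```typescript
// The only part of the GET /api/tags response this sketch relies on.
interface TagsResponse {
  models: { name: string }[];
}

// Ollama resolves a bare name like "llama3.2" to "llama3.2:latest".
function normalize(model: string): string {
  return model.indexOf(":") !== -1 ? model : `${model}:latest`;
}

// Hypothetical helper: every required model that /api/tags does not list.
function missingModels(tags: TagsResponse, required: string[]): string[] {
  const installed = new Set(tags.models.map((m) => normalize(m.name)));
  return required.filter((r) => !installed.has(normalize(r)));
}

// Example: an installer following this contract would abort on a non-empty list.
const sample: TagsResponse = {
  models: [{ name: "llama3.1:8b" }, { name: "qwen2.5-coder:1.5b-base" }],
};
const missing = missingModels(sample, ["llama3.1:8b", "qwen2.5-coder:7b-base"]);
if (missing.length > 0) {
  console.error(`Missing models: ${missing.join(", ")}`);
}
```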

If you don’t set CHAT_MODEL / COMPLETION_MODEL, the installer picks sensible defaults based on detected RAM:

Total RAM	Tier	Chat model	Completion model
< 6 GB	tiny	`llama3.2:3b`	`qwen2.5-coder:0.5b-base`
< 12 GB	small	`llama3.1:8b`	`qwen2.5-coder:1.5b-base`
< 20 GB	medium	`qwen2.5:14b`	`qwen2.5-coder:1.5b-base`
< 40 GB	large	`qwen2.5:32b`	`qwen2.5-coder:7b-base`
≥ 40 GB	huge	`llama3.3:70b`	`qwen2.5-coder:7b-base`

Override at any time:

CHAT_MODEL=qwen2.5:14b ./scripts/install-ubuntu.sh

The detection logic lives in scripts/pick-models.sh and scripts/pick-models.ps1 (same tier table) and prints ==> Detected N GB RAM (tier: X) -> ... so the choice is visible.
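
As a cross-check on the tier table, here is a hypothetical TypeScript port of the selection logic. It is not the code in scripts/pick-models.sh or scripts/pick-models.ps1; pickModels and TIERS are illustrative names, RAM is assumed to be passed in GB, and an empty environment variable is treated as unset.

```typescript
type Tier = "tiny" | "small" | "medium" | "large" | "huge";

interface ModelPick {
  tier: Tier;
  chat: string;
  completion: string;
}

// Rows mirror the README table; each upper bound is exclusive ("< N GB").
const TIERS: { maxGb: number; tier: Tier; chat: string; completion: string }[] = [
  { maxGb: 6, tier: "tiny", chat: "llama3.2:3b", completion: "qwen2.5-coder:0.5b-base" },
  { maxGb: 12, tier: "small", chat: "llama3.1:8b", completion: "qwen2.5-coder:1.5b-base" },
  { maxGb: 20, tier: "medium", chat: "qwen2.5:14b", completion: "qwen2.5-coder:1.5b-base" },
  { maxGb: 40, tier: "large", chat: "qwen2.5:32b", completion: "qwen2.5-coder:7b-base" },
  { maxGb: Infinity, tier: "huge", chat: "llama3.3:70b", completion: "qwen2.5-coder:7b-base" },
];

// Explicit CHAT_MODEL / COMPLETION_MODEL values win over the detected defaults.
function pickModels(totalGb: number, env: Record<string, string | undefined> = {}): ModelPick {
  // The last row has no upper bound, so find() always succeeds.
  const row = TIERS.find((t) => totalGb < t.maxGb)!;
  return {
    tier: row.tier,
    chat: env["CHAT_MODEL"] || row.chat,
    completion: env["COMPLETION_MODEL"] || row.completion,
  };
}

console.log(pickModels(16, { CHAT_MODEL: "qwen2.5-coder:7b-instruct" }));
```

Because each bound is exclusive, a machine with exactly 12 GB lands in the medium tier, matching the < comparisons in the table.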

The rest of this README is about the VS Code extension.

Ollama Free Coder for VS Code

Features

Inline ghost-text completion as you type, using a FIM-capable model (defaults to qwen2.5-coder:1.5b-base). Debounced and cancellable.
Command history — every prompt you send is persisted. Press ↑/↓ in the chat input to walk through past commands shell-style, click History for a list view, or hover any past user message in the chat log for one-click resend and edit.
Keyboard-driven model picker — click the Model button (or focus it with Tab) and use ↑/↓ to cycle models, Home/End to jump to first/last, PageUp/PageDown to skip 5, Enter to select, Esc to cancel. Mouse still works. The textarea’s own ↑/↓ history walk is independent — the keys never collide because they live on different focus targets.
Chat sidebar (Activity Bar → robot icon) with streaming responses, a one-click "include current file/selection" toggle, and @mentions:
- @src/foo.ts — attach a workspace file's contents to your question.
- @selection — attach the current editor selection (or the whole active file).
Show-on-screen vs. write-to-file routing — prompts like “show me a C++ Hello World”, “explain Vector class in C++”, “what is a class in Python” always stream into the chat panel. Prompts like “add Vector class implementation to test.cpp” or “write a new file with Python fizzbuzz” are auto-routed through agent mode and create the file (with a confirm dialog). You can override either way by toggling agent mode.
Web search built in — three ways to use it:
1. Type /search QUERY (or /web, /google) in the chat input to run a web search directly and render the top hits inline. No LLM involved.
2. Prompts with a web-search intent ("google the latest TypeScript", "what's new in Rust 1.85", "search the web for free SVG icons") are auto-grounded: results are fetched and prepended as context before the LLM answers.
3. The agent can call web_search itself in agent mode. A small Search: DuckDuckGo (or Google) label in the chat input shows which backend is active.
Apply code blocks — hover any code block in chat for one-click Insert at cursor, Replace selection, Save… (with diff preview if the target file exists), and Copy. If the assistant emits a fence like ```ts src/foo.ts, Save… pre-fills that path.
Agent mode (checkbox in the chat input) lets the model call workspace tools to answer multi-step questions and apply edits. It uses Ollama's native tool calling, which works well with llama3.1:8b, qwen2.5:7b, qwen2.5-coder:7b, and other tool-capable models. Available tools:
- read_file, list_files, search_text — read-only context gathering.
- get_open_editors — see what's open and what's selected.
- web_search — search the public web. Default backend is DuckDuckGo (free, no key, no signup). If you set ollamaCoder.googleApiKey and ollamaCoder.googleCseId, the tool switches to the Google Custom Search JSON API (Google's free tier: 100 queries/day).
- write_file — create / overwrite files (always shows a confirm dialog; "Show diff first" opens a side-by-side preview before applying).
Code actions on selection (right-click → Ollama Free Coder):
- Explain Selection
- Refactor Selection (replaces selection)
- Fix Selection (replaces selection)
- Add Docstrings / Comments (replaces selection)
- Generate Unit Tests (opens new editor)
- Ask About Selection… (free-form question)
Status bar model switcher — click the Ollama: <model> badge to swap between any model you've pulled (ollama pull ...).
Zero runtime npm deps — uses Node's built-in http for the Ollama API.
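
A minimal sketch of how the chat input could be split into direct web searches and prompts carrying @mentions. The names parseChatInput and ChatInput are hypothetical, and the details this README does not specify (case-insensitive commands, ignoring @ inside words such as email addresses, stripping trailing punctuation from mentions) are assumptions.

```typescript
type ChatInput =
  | { kind: "search"; query: string }
  | { kind: "prompt"; text: string; files: string[]; includeSelection: boolean };

const SEARCH_COMMANDS = ["/search", "/web", "/google"];

function parseChatInput(raw: string): ChatInput {
  const text = raw.trim();
  // "/search QUERY", "/web QUERY" and "/google QUERY" go straight to the search backend.
  const space = text.indexOf(" ");
  const head = (space === -1 ? text : text.slice(0, space)).toLowerCase();
  if (SEARCH_COMMANDS.indexOf(head) !== -1) {
    return { kind: "search", query: space === -1 ? "" : text.slice(space + 1).trim() };
  }
  // Everything else is a prompt; only @ tokens preceded by whitespace count as mentions.
  const files: string[] = [];
  let includeSelection = false;
  const mention = /(?:^|\s)@(\S+)/g;
  let m: RegExpExecArray | null;
  while ((m = mention.exec(text)) !== null) {
    const target = (m[1] ?? "").replace(/[.,;:!?)]+$/, ""); // drop trailing punctuation
    if (target === "selection") includeSelection = true;
    else if (target !== "") files.push(target);
  }
  return { kind: "prompt", text, files, includeSelection };
}
```

For example, parseChatInput("/google latest TypeScript") yields a search whose query is "latest TypeScript", with no LLM call involved.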
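
The shell-style history walk can be modeled as a cursor over past prompts plus the unsent draft. PromptHistory is a hypothetical class; skipping blank and consecutive duplicate prompts is an assumption borrowed from common shell behavior, not something this README specifies.

```typescript
class PromptHistory {
  private entries: string[] = [];
  private cursor = 0; // equals entries.length when the user is not browsing
  private draft = ""; // what was in the textarea before the first ↑

  push(prompt: string): void {
    // Skip blanks and immediate repeats, as many shells do.
    if (prompt.trim() !== "" && this.entries[this.entries.length - 1] !== prompt) {
      this.entries.push(prompt);
    }
    this.cursor = this.entries.length;
    this.draft = "";
  }

  // ↑: step to an older entry, saving the draft on the first press.
  up(current: string): string {
    if (this.cursor === this.entries.length) this.draft = current;
    if (this.cursor > 0) this.cursor--;
    return this.entries[this.cursor] ?? current;
  }

  // ↓: step to a newer entry; stepping past the newest restores the draft.
  down(current: string): string {
    if (this.cursor === this.entries.length) return current; // not browsing
    this.cursor++;
    return this.cursor === this.entries.length ? this.draft : this.entries[this.cursor] ?? current;
  }
}
```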
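
The model picker's key map can be written as a pure function from the pressed key and the current index to the new highlighted index. nextIndex is hypothetical; the key strings are standard KeyboardEvent.key values, while wrapping on ↑/↓ (read from "cycle") and clamping on PageUp/PageDown are assumptions. Enter and Esc are left to the caller.

```typescript
type PickerKey = "ArrowUp" | "ArrowDown" | "Home" | "End" | "PageUp" | "PageDown";

const PAGE_SIZE = 5; // PageUp/PageDown skip 5 entries

// Returns the new highlighted index, or -1 when the list is empty.
function nextIndex(key: PickerKey, index: number, count: number): number {
  if (count === 0) return -1;
  switch (key) {
    case "ArrowUp":
      return (index - 1 + count) % count; // from the first entry, wrap to the last
    case "ArrowDown":
      return (index + 1) % count; // from the last entry, wrap to the first
    case "Home":
      return 0;
    case "End":
      return count - 1;
    case "PageUp":
      return Math.max(0, index - PAGE_SIZE);
    case "PageDown":
      return Math.min(count - 1, index + PAGE_SIZE);
    default:
      return index;
  }
}
```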
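
Pre-filling the Save… path amounts to reading the code fence's info string. A hypothetical sketch, assuming the first word is the language and an optional second word is the workspace-relative path (paths containing spaces are not handled):

```typescript
interface FenceInfo {
  lang: string;
  path?: string; // present only when the assistant named a target file
}

// Accepts a bare info string ("ts src/foo.ts") or a full opening fence line.
function parseFenceInfo(line: string): FenceInfo {
  const info = line.trim().replace(/^[`~]{3,}/, ""); // strip the fence marker, if any
  const [lang = "", path] = info.trim().split(/\s+/);
  return path ? { lang, path } : { lang };
}
```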

Default keybindings

Action	Shortcut
Open chat	`Ctrl+Alt+O`
Explain selection	`Ctrl+Alt+E`
Refactor selection	`Ctrl+Alt+R`

Settings

Setting	Default	Description
`ollamaCoder.endpoint`	`http://localhost:11434`	Ollama server URL
`ollamaCoder.chatModel`	`llama3.1:8b`	Model for chat & code actions
`ollamaCoder.completionModel`	`qwen2.5-coder:1.5b-base`	Model for inline completion (FIM-capable recommended)
`ollamaCoder.enableInlineCompletion`	`true`	Toggle ghost-text completion
`ollamaCoder.completionDebounceMs`	`250`	Delay before a completion request
`ollamaCoder.maxCompletionTokens`	`128`	Max tokens per completion
`ollamaCoder.temperature`	`0.2`	Sampling temperature
`ollamaCoder.contextWindowChars`	`4000`	Max chars of file context sent to the model
`ollamaCoder.agentMaxSteps`	`8`	Max tool-calling steps per agent turn
`ollamaCoder.searchBackend`	`duckduckgo`	`duckduckgo` (free, no key) or `google` (requires the two keys below)
`ollamaCoder.googleApiKey`	`""`	Optional Google API key for the Custom Search JSON API (free tier: 100 queries/day)
`ollamaCoder.googleCseId`	`""`	Optional Google Programmable Search Engine ID
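
The completionDebounceMs setting and the "debounced and cancellable" completion behavior can be sketched with a cancellation token like the one VS Code passes to inline completion providers (only its isCancellationRequested flag is modeled here). debouncedComplete and the fetchCompletion callback are hypothetical; a real provider would call the Ollama client at that point.

```typescript
// Only the part of vscode.CancellationToken this sketch needs.
interface CancellationToken {
  readonly isCancellationRequested: boolean;
}

type FetchCompletion = (prefix: string, suffix: string) => Promise<string>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Wait out the debounce window; if the user kept typing, the editor will typically
// have cancelled this token in favor of a newer request, so the model is never called.
async function debouncedComplete(
  prefix: string,
  suffix: string,
  token: CancellationToken,
  debounceMs: number, // ollamaCoder.completionDebounceMs (default 250)
  fetchCompletion: FetchCompletion,
): Promise<string | undefined> {
  await sleep(debounceMs);
  if (token.isCancellationRequested) return undefined;
  const text = await fetchCompletion(prefix, suffix);
  // The request may also have been superseded while the model was generating.
  return token.isCancellationRequested ? undefined : text;
}
```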
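
The agent loop bounded by ollamaCoder.agentMaxSteps can be sketched as below. The message shapes follow Ollama's documented /api/chat tool-calling format (each call carries function.name and an arguments object, and results go back as role "tool" messages). runAgent, ChatFn, and the tool map are hypothetical, and the transport that actually POSTs the messages and a tools array to /api/chat is left out.

```typescript
interface ToolCall {
  function: { name: string; arguments: Record<string, unknown> };
}

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tool_calls?: ToolCall[];
}

// Hypothetical transport: sends the conversation to /api/chat, returns the reply message.
type ChatFn = (messages: ChatMessage[]) => Promise<ChatMessage>;
type ToolImpl = (args: Record<string, unknown>) => Promise<string>;

// Execute tool calls until the model answers in plain text or maxSteps is reached.
async function runAgent(
  chat: ChatFn,
  tools: Record<string, ToolImpl>,
  question: string,
  maxSteps = 8, // ollamaCoder.agentMaxSteps default
): Promise<string> {
  const messages: ChatMessage[] = [{ role: "user", content: question }];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await chat(messages);
    messages.push(reply);
    const calls = reply.tool_calls ?? [];
    if (calls.length === 0) return reply.content; // no tool calls: this is the answer
    for (const call of calls) {
      const impl = tools[call.function.name];
      // Unknown tools are reported back to the model rather than crashing the loop.
      const result = impl ? await impl(call.function.arguments) : `Unknown tool: ${call.function.name}`;
      messages.push({ role: "tool", content: result });
    }
  }
  return "Stopped: agent step limit reached.";
}
```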

Quick install on Ubuntu (24.04 / 26.04)

git clone https://github.com/local/ollama-coder-vscode
cd ollama-coder-vscode
./scripts/install-ubuntu.sh

The script will:

Check for Node.js ≥ 18 (and install it via apt if missing).
Install Ollama via the official installer and run systemctl enable --now ollama.
Pull the chat and completion models (chosen by RAM tier unless CHAT_MODEL / COMPLETION_MODEL are set; the small tier uses llama3.1:8b and qwen2.5-coder:1.5b-base) and verify them in /api/tags.
Run npm install, compile the TypeScript, package a .vsix, and install it with the code CLI.

Useful environment variables:

CODE_BIN=codium ./scripts/install-ubuntu.sh   # install into VSCodium instead of VS Code
SKIP_OLLAMA=1   ./scripts/install-ubuntu.sh   # skip Ollama install
SKIP_PULL=1     ./scripts/install-ubuntu.sh   # skip pulling models
CHAT_MODEL=qwen2.5-coder:7b-instruct ./scripts/install-ubuntu.sh

Manual build

npm install
npm run compile
npx vsce package -o ollama-free-coder.vsix
code --install-extension ./ollama-free-coder.vsix --force

Development

npm install
npm run watch       # incremental compile
# In VS Code: F5 to launch an Extension Development Host.

Source layout:

src/
  extension.ts           # activation, commands, status bar
  ollama.ts              # streaming HTTP client for /api/generate, /api/chat, /api/tags
  completionProvider.ts  # InlineCompletionItemProvider with FIM (prompt + suffix)
  codeActions.ts         # Explain / Refactor / Fix / Docstrings / Tests / Ask
  chatView.ts            # Webview-based chat sidebar
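scripts/
  install-ubuntu.sh      # one-shot build + install for Ubuntu
  install-macos.sh       # one-shot build + install for macOS
  install-windows.ps1    # one-shot build + install for Windows
  pick-models.sh         # RAM-based default model picker (shell)
  pick-models.ps1        # RAM-based default model picker (PowerShell)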

Model recommendations

Use case	Suggested model	`ollama pull`
Inline completion (fast, FIM)	`qwen2.5-coder:1.5b-base`	`ollama pull qwen2.5-coder:1.5b-base`
Inline completion (better, FIM)	`qwen2.5-coder:7b-base`	`ollama pull qwen2.5-coder:7b-base`
Chat / code actions (small)	`llama3.1:8b`	`ollama pull llama3.1:8b`
Chat / code actions (better)	`qwen2.5-coder:7b-instruct`	`ollama pull qwen2.5-coder:7b-instruct`

The inline completion provider sends both a prompt (prefix) and suffix to /api/generate, which Ollama forwards to the model's fill-in-the-middle template. Use a *-base coder model for best results — instruct/chat models tend to wrap output in prose.
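
A sketch of the request body the provider could send, assuming the ollamaCoder.* settings from the table above. The /api/generate fields used (model, prompt, suffix, stream, and options.num_predict / options.temperature) are Ollama's; buildFimRequest and the 3:1 split of contextWindowChars between prefix and suffix are assumptions for illustration.

```typescript
// Settings this sketch reads; names mirror the ollamaCoder.* settings table.
interface CompletionSettings {
  completionModel: string;
  maxCompletionTokens: number;
  temperature: number;
  contextWindowChars: number;
}

// Body for POST /api/generate; Ollama fills the model's FIM template from prompt + suffix.
interface GenerateRequest {
  model: string;
  prompt: string;
  suffix: string;
  stream: boolean;
  options: { num_predict: number; temperature: number };
}

function buildFimRequest(text: string, cursor: number, s: CompletionSettings): GenerateRequest {
  // Assumption: spend 3/4 of the character budget before the cursor, 1/4 after it.
  const before = Math.floor(s.contextWindowChars * 0.75);
  const after = s.contextWindowChars - before;
  return {
    model: s.completionModel,
    prompt: text.slice(Math.max(0, cursor - before), cursor), // keep text nearest the cursor
    suffix: text.slice(cursor, cursor + after),
    stream: true,
    options: { num_predict: s.maxCompletionTokens, temperature: s.temperature },
  };
}
```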

License

Apache-2.0 (see LICENSE).

Den Raskovalov
