# LocalMinds

*Your privacy-first AI coding assistant that works offline.*

LocalMinds brings powerful AI code generation directly into VS Code, with the flexibility to work completely offline using local models or to access hundreds of cloud models through a single API key.
## Why LocalMinds?
- 🔒 **Privacy First** — Your code never leaves your machine when using local models. No telemetry, no tracking, no data collection.
- 💰 **Cost Effective** — Run Gemma 4 locally for free, or pay pennies per request with OpenRouter when you need cloud power.
- ⚡ **Zero Latency** — Local inference means instant responses with no network delays.
- 🌐 **Best of Both Worlds** — Seamlessly switch between local Ollama models and 200+ cloud models (Claude, GPT, Gemini, Llama, Grok, and more) with one unified interface.
## How It Works
- Gemma 4 via Ollama — local code generation (free, private, zero latency)
- OpenRouter — one key, every major cloud model (Claude, GPT, Gemini, Llama, Grok…)
```text
User Request
    ↓
Ollama / Gemma 4    ← local, free, default
    ↓ (if cloud is enabled)
OpenRouter          ← Claude / GPT / Gemini / your pick
    ↓
Result → Editor
```
No separate Anthropic, OpenAI, or Moonshot keys. One OpenRouter key, one bill, every model.
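Under the hood, the cloud half of that flow is plain HTTPS: one OpenRouter key reaches every listed model through the same chat-completions endpoint, and only the `model` field changes. A minimal sketch (you never need to do this yourself; the extension handles it), assuming your key is in `$OPENROUTER_API_KEY`:

```shell
# One key, any provider: swap MODEL for any ID from https://openrouter.ai/models
MODEL="anthropic/claude-sonnet-4"   # or openai/gpt-4o, google/gemini-2.5-pro, ...
PAYLOAD="{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"Say hello\"}]}"

if [ -n "${OPENROUTER_API_KEY:-}" ]; then
  # Same endpoint regardless of which provider's model you pick
  curl -s https://openrouter.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $OPENROUTER_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$PAYLOAD"
else
  echo "Set OPENROUTER_API_KEY to send this request"
fi
```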
## Features
- ✨ **Chat Panel** — Interactive AI chat with streaming responses
- 🎯 **Smart Context** — Automatically includes relevant code from your workspace
- ⚙️ **Customizable Commands** — Generate, refactor, explain, improve code with right-click menus
- ⌨️ **Keyboard Shortcuts** — Quick access to inline edits and code generation
- 🎨 **Agent Profile** — Teach the AI your stack, preferences, and coding style
- 📜 **Chat History** — Review and continue previous conversations
- 🔄 **Model Switching** — Switch between local and cloud models on the fly
- 🚫 **Offline Mode** — Work completely offline with local models
## Setup
Prefer a guided flow? Run `LocalMinds: Open Setup Wizard` from the Command Palette (`Cmd+Shift+P`) once installed.
### 1. Install Ollama
```shell
# macOS
brew install ollama   # or download from https://ollama.com

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download the installer from https://ollama.com/download
```
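Once installed, it is worth confirming the daemon is actually listening before moving on. A minimal check against Ollama's default local endpoint (the same URL the Troubleshooting section uses):

```shell
# Ollama serves a local HTTP API on port 11434 by default
OLLAMA_URL="http://localhost:11434"

if curl -fsS "$OLLAMA_URL/api/tags" >/dev/null 2>&1; then
  STATUS="running"
else
  STATUS="not running"   # start it with: ollama serve
fi
echo "Ollama is $STATUS at $OLLAMA_URL"
```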
### 2. Pull Gemma 4
| Model | Size | VRAM | Best for |
|-------|------|------|----------|
| `gemma4:e2b` | 7.2 GB | ~6 GB | Laptops |
| `gemma4:e4b` | 9.6 GB | ~8 GB | Good balance (default) |
| `gemma4:26b` | 18 GB | ~20 GB | Best for coding |
| `gemma4:31b` | 20 GB | ~24 GB | Maximum capability |
```shell
ollama pull gemma4:e4b
```
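To confirm the pull succeeded, a small guarded check (this uses the default `gemma4:e4b` tag; substitute whichever variant you chose):

```shell
MODEL="gemma4:e4b"

# `ollama list` prints every locally available model tag
if command -v ollama >/dev/null 2>&1 && ollama list 2>/dev/null | grep -q "$MODEL"; then
  echo "$MODEL is ready"
else
  echo "$MODEL not found locally; run: ollama pull $MODEL"
fi
```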
### 3. (Optional) Get an OpenRouter key for cloud models
One key unlocks Claude, GPT, Gemini, and hundreds more.
- Sign up at https://openrouter.ai
- Add a few dollars of credit
- Create a key at https://openrouter.ai/keys
Skip this step for fully offline operation.
### 4. Install and configure the extension

Install the LocalMinds extension from the VS Code Marketplace, then open Settings and search for "LocalMinds" — or add the settings directly to `settings.json`:
```json
{
  "localminds.ollama.model": "gemma4:e4b",
  "localminds.openrouter.apiKey": "sk-or-v1-...",
  "localminds.openrouter.model": "anthropic/claude-sonnet-4",
  "localminds.openrouter.models": [
    "anthropic/claude-sonnet-4",
    "openai/gpt-4o",
    "google/gemini-2.5-pro"
  ]
}
```
Popular model IDs — see the full list at https://openrouter.ai/models:

| Use case | Model |
|----------|-------|
| Best all-round coder | `anthropic/claude-sonnet-4` |
| Heavy reasoning | `anthropic/claude-opus-4` |
| Cheap + fast | `openai/gpt-4o-mini` |
| Long context | `google/gemini-2.5-pro` |
| Open-weights | `meta-llama/llama-3.1-70b-instruct` |
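If you skip OpenRouter entirely, the cloud settings can simply be omitted. A minimal offline-only `settings.json` fragment, using the `localminds.offlineMode` setting described under Usage, might look like:

```json
{
  "localminds.ollama.model": "gemma4:e4b",
  "localminds.offlineMode": true
}
```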
## Usage

### Shortcuts
| Shortcut | Action |
|----------|--------|
| `Cmd+Shift+G` | Generate code from description |
| `Cmd+Shift+E` | Inline edit selected code |
### Right-click → LocalMinds
Generate, Refactor, Explain, Improve, Fix Bug, Inline Edit, Add Loading State, Make Responsive, Convert to Hooks.
### Chat panel
Click the LocalMinds icon in the activity bar. Streams from whichever model is active. Title-bar buttons: New Chat, Chat History, Customise Agent Profile, Setup Wizard.
### Agent Profile
Run `LocalMinds: Customise Agent Profile` to describe your stack, preferences, and style. The profile is appended to every system prompt so answers fit how you work.
### Offline mode
Click the status bar item or set `localminds.offlineMode: true` — cloud calls are skipped entirely.
## Recommended Hardware
| Setup | Model | Experience |
|-------|-------|------------|
| MacBook Air M1 (8 GB) | `gemma4:e2b` | Usable, ~5-10 s |
| MacBook Pro (16 GB) | `gemma4:e4b` | Good, ~3-5 s |
| MacBook Pro (32 GB+) | `gemma4:26b` | Excellent |
| GPU (24 GB+ VRAM) | `gemma4:26b` / `31b` | Best local experience |
Cloud models via OpenRouter have no local hardware requirements.
## Troubleshooting
**Ollama offline** — run `ollama serve`, then check with `curl http://localhost:11434/api/tags`.

**Model not found** — run `ollama list`, then `ollama pull gemma4:e4b`.

**OpenRouter errors**

- `401` — check that `localminds.openrouter.apiKey` matches a key at openrouter.ai/keys
- `402` — top up credit at openrouter.ai/credits
- `429` — rate limited; retry or switch model
- Model not found — IDs must match exactly (e.g. `anthropic/claude-sonnet-4`, not `claude-sonnet-4`)

**Slow responses** — try a smaller Gemma variant, reduce `localminds.contextLines` to 50, or switch to a cloud model.
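The local checks above can be rolled into one quick script. This is a hypothetical helper, not something LocalMinds ships; it assumes the default `gemma4:e4b` model and Ollama's default port:

```shell
MODEL="gemma4:e4b"
PROBLEMS=0

# 1. Is the ollama binary on PATH?
if ! command -v ollama >/dev/null 2>&1; then
  echo "ollama binary not found"; PROBLEMS=$((PROBLEMS + 1))
fi

# 2. Is the daemon answering on its default port?
if ! curl -fsS http://localhost:11434/api/tags >/dev/null 2>&1; then
  echo "Ollama daemon not responding; run: ollama serve"; PROBLEMS=$((PROBLEMS + 1))
fi

# 3. Has the model been pulled?
if ! ollama list 2>/dev/null | grep -q "$MODEL"; then
  echo "$MODEL missing; run: ollama pull $MODEL"; PROBLEMS=$((PROBLEMS + 1))
fi

echo "checks failed: $PROBLEMS"
```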
## License
MIT