LocalMinds

Your privacy-first AI coding assistant that works offline.

LocalMinds brings powerful AI code generation directly into VS Code, with the flexibility to work completely offline using local models or access hundreds of cloud models through a single API key.

Why LocalMinds?

🔒 Privacy First — Your code never leaves your machine when using local models. No telemetry, no tracking, no data collection.

💰 Cost Effective — Run Gemma 4 locally for free, or pay pennies per request with OpenRouter when you need cloud power.

⚡ No Network Latency — Local inference means responses start immediately, with no round-trip to a cloud API.

🌐 Best of Both Worlds — Seamlessly switch between local Ollama models and 200+ cloud models (Claude, GPT, Gemini, Llama, Grok, and more) with one unified interface.

How It Works

  • Gemma 4 via Ollama — local code generation (free, private, no network latency)
  • OpenRouter — one key, every major cloud model (Claude, GPT, Gemini, Llama, Grok…)
User Request
     ↓
Ollama / Gemma 4        ← local, free, default
     ↓  (if cloud is enabled)
OpenRouter              ← Claude / GPT / Gemini / your pick
     ↓
Result → Editor

No separate Anthropic, OpenAI, or Moonshot keys. One OpenRouter key, one bill, every model.
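
Under the hood, the local path is plain HTTP against Ollama's default port. A minimal sketch of that request, using Ollama's public generate API (the exact payload LocalMinds sends is not documented here, so treat the fields as illustrative):

# Default path: local Gemma 4 through Ollama's generate endpoint
curl http://localhost:11434/api/generate -d '{
  "model": "gemma4:e4b",
  "prompt": "Write a Python function that reverses a string.",
  "stream": false
}'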

Features

✨ Chat Panel — Interactive AI chat with streaming responses
🎯 Smart Context — Automatically includes relevant code from your workspace
⚙️ Customizable Commands — Generate, refactor, explain, improve code with right-click menus
⌨️ Keyboard Shortcuts — Quick access to inline edits and code generation
🎨 Agent Profile — Teach the AI your stack, preferences, and coding style
📜 Chat History — Review and continue previous conversations
🔄 Model Switching — Switch between local and cloud models on the fly
🚫 Offline Mode — Work completely offline with local models


Setup

Prefer a guided flow? Run LocalMinds: Open Setup Wizard from the Command Palette (Cmd+Shift+P) once installed.

1. Install Ollama

# macOS
brew install ollama          # or download from https://ollama.com

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows
# https://ollama.com/download
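
Before pulling a model, it's worth confirming the server is up (the tags endpoint is Ollama's standard model listing; start the server with ollama serve if the call fails):

curl http://localhost:11434/api/tags    # returns JSON listing installed models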

2. Pull Gemma 4

Model        Size     VRAM    Best for
gemma4:e2b   7.2 GB   ~6 GB   Laptops
gemma4:e4b   9.6 GB   ~8 GB   Good balance (default)
gemma4:26b   18 GB    ~20 GB  Best for coding
gemma4:31b   20 GB    ~24 GB  Maximum capability

ollama pull gemma4:e4b
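
Once the pull finishes, a one-off prompt from the terminal confirms the model loads and generates (same tag as above):

ollama run gemma4:e4b "Write a one-line hello world in Python."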

3. (Optional) Get an OpenRouter key for cloud models

One key unlocks Claude, GPT, Gemini, and hundreds more.

  1. Sign up at https://openrouter.ai
  2. Add a few dollars of credit
  3. Create a key at https://openrouter.ai/keys

Skip this step for fully offline operation.
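
If you do set up a key, one cheap request against OpenRouter's OpenAI-compatible endpoint verifies it before you wire it into the extension (openai/gpt-4o-mini is just an inexpensive example; see the model table under Configure):

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer sk-or-v1-..." \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "ping"}]}'

A 200 response means key and credit are fine; 401 and 402 map to the error codes in Troubleshooting below.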


Configure

Install the LocalMinds extension from the VS Code marketplace, then open Settings and search LocalMinds — or add directly to settings.json:

{
  "localminds.ollama.model": "gemma4:e4b",
  "localminds.openrouter.apiKey": "sk-or-v1-...",
  "localminds.openrouter.model": "anthropic/claude-sonnet-4",
  "localminds.openrouter.models": [
    "anthropic/claude-sonnet-4",
    "openai/gpt-4o",
    "google/gemini-2.5-pro"
  ]
}

Popular model IDs — see the full list at https://openrouter.ai/models:

Use case               Model
Best all-round coder   anthropic/claude-sonnet-4
Heavy reasoning        anthropic/claude-opus-4
Cheap + fast           openai/gpt-4o-mini
Long context           google/gemini-2.5-pro
Open-weights           meta-llama/llama-3.1-70b-instruct
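
The catalogue changes often, so it can help to query the live list directly (the endpoint needs no key; the jq filter assumes OpenRouter's usual data[].id response shape):

curl -s https://openrouter.ai/api/v1/models | jq -r '.data[].id' | grep claude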

Usage

Shortcuts

Shortcut      Action
Cmd+Shift+G   Generate code from description
Cmd+Shift+E   Inline edit selected code

Right-click → LocalMinds

Generate, Refactor, Explain, Improve, Fix Bug, Inline Edit, Add Loading State, Make Responsive, Convert to Hooks.

Chat panel

Click the LocalMinds icon in the activity bar. Streams from whichever model is active. Title-bar buttons: New Chat, Chat History, Customise Agent Profile, Setup Wizard.

Agent Profile

Run LocalMinds: Customise Agent Profile to describe your stack, preferences, and style. The profile is appended to every system prompt so answers fit how you work.

Offline mode

Click the status bar item or set localminds.offlineMode: true — cloud calls are skipped entirely.
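
For example, pinned in settings.json (both setting names appear earlier on this page):

{
  "localminds.offlineMode": true,
  "localminds.ollama.model": "gemma4:e4b"
}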


Recommended Hardware

Setup                  Model              Experience
MacBook Air M1 (8GB)   gemma4:e2b         Usable, ~5-10s
MacBook Pro (16GB)     gemma4:e4b         Good, ~3-5s
MacBook Pro (32GB+)    gemma4:26b         Excellent
GPU (24GB+ VRAM)       gemma4:26b / 31b   Best local experience

Cloud models via OpenRouter have no local hardware requirements.


Troubleshooting

Ollama offline — run ollama serve, then check with curl http://localhost:11434/api/tags.

Model not found — run ollama list, then ollama pull gemma4:e4b.

OpenRouter errors

  • 401 — check localminds.openrouter.apiKey matches a key at openrouter.ai/keys
  • 402 — top up credit at openrouter.ai/credits
  • 429 — rate limited, retry or switch model
  • Model not found — IDs must match exactly (e.g. anthropic/claude-sonnet-4, not claude-sonnet-4)

Slow responses — try a smaller Gemma variant, reduce localminds.contextLines to 50, or switch to a cloud model.
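
Put together, a lighter configuration using the settings named above might look like:

{
  "localminds.ollama.model": "gemma4:e2b",
  "localminds.contextLines": 50
}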


License

MIT
