# LocalMinds

*Your privacy-first AI coding assistant that works offline.*

LocalMinds brings powerful AI code generation directly into VS Code, with the flexibility to work completely offline using local models or to access hundreds of cloud models through a single API key.
## Why LocalMinds?
- 🔒 **Privacy First** — Your code never leaves your machine when using local models. No telemetry, no tracking, no data collection.
- 💰 **Cost Effective** — Run Gemma 4 locally for free, or pay pennies per request with OpenRouter when you need cloud power.
- ⚡ **Zero Latency** — Local inference means instant responses with no network delays.
- 🌐 **Best of Both Worlds** — Seamlessly switch between local Ollama models and 200+ cloud models (Claude, GPT, Gemini, Llama, Grok, and more) with one unified interface.
## How It Works
- Gemma 4 via Ollama — local code generation (free, private, zero latency)
- OpenRouter — one key, every major cloud model (Claude, GPT, Gemini, Llama, Grok…)
```text
User Request
    ↓
Ollama / Gemma 4    ← local, free, default
    ↓ (if cloud is enabled)
OpenRouter          ← Claude / GPT / Gemini / your pick
    ↓
Result → Editor
```
No separate Anthropic, OpenAI, or Moonshot keys. One OpenRouter key, one bill, every model.
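Under the hood, the cloud half of that flow is plain HTTPS: one OpenRouter key reaches every listed model through the same chat-completions endpoint, and only the `model` field changes. A minimal sketch (you never need to do this yourself; the extension handles it), assuming your key is in `$OPENROUTER_API_KEY`:

```shell
# One key, any provider: swap MODEL for any ID from https://openrouter.ai/models
MODEL="anthropic/claude-sonnet-4"   # or openai/gpt-4o, google/gemini-2.5-pro, ...
PAYLOAD="{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"Say hello\"}]}"

if [ -n "${OPENROUTER_API_KEY:-}" ]; then
  # Same endpoint regardless of which provider's model you pick
  curl -s https://openrouter.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $OPENROUTER_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$PAYLOAD"
else
  echo "Set OPENROUTER_API_KEY to send this request"
fi
```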
## Features
- ✨ **Chat Panel** — Interactive AI chat with streaming responses
- 🎯 **Smart Context** — Automatically includes relevant code from your workspace
- ⚙️ **Customizable Commands** — Generate, refactor, explain, improve code with right-click menus
- ⌨️ **Keyboard Shortcuts** — Quick access to inline edits and code generation
- 🎨 **Agent Profile** — Teach the AI your stack, preferences, and coding style
- 📜 **Chat History** — Review and continue previous conversations
- 🔄 **Model Switching** — Switch between local and cloud models on the fly
- 🚫 **Offline Mode** — Work completely offline with local models
## Setup
Prefer a guided flow? Run `LocalMinds: Open Setup Wizard` from the Command Palette (`Cmd+Shift+P`) once installed.
### 1. Install Ollama
```shell
# macOS
brew install ollama   # or download from https://ollama.com

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download the installer from https://ollama.com/download
```
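Once installed, it is worth confirming the daemon is actually listening before moving on. A minimal check against Ollama's default local endpoint (the same URL the Troubleshooting section uses):

```shell
# Ollama serves a local HTTP API on port 11434 by default
OLLAMA_URL="http://localhost:11434"

if curl -fsS "$OLLAMA_URL/api/tags" >/dev/null 2>&1; then
  STATUS="running"
else
  STATUS="not running"   # start it with: ollama serve
fi
echo "Ollama is $STATUS at $OLLAMA_URL"
```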
### 2. Pull Gemma 4
| Model | Size | VRAM | Best for |
|-------|------|------|----------|
| `gemma4:e2b` | 7.2 GB | ~6 GB | Laptops |
| `gemma4:e4b` | 9.6 GB | ~8 GB | Good balance (default) |
| `gemma4:26b` | 18 GB | ~20 GB | Best for coding |
| `gemma4:31b` | 20 GB | ~24 GB | Maximum capability |
```shell
ollama pull gemma4:e4b
```
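To confirm the pull succeeded, a small guarded check (this uses the default `gemma4:e4b` tag; substitute whichever variant you chose):

```shell
MODEL="gemma4:e4b"

# `ollama list` prints every locally available model tag
if command -v ollama >/dev/null 2>&1 && ollama list 2>/dev/null | grep -q "$MODEL"; then
  echo "$MODEL is ready"
else
  echo "$MODEL not found locally; run: ollama pull $MODEL"
fi
```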
### 3. (Optional) Get an OpenRouter key for cloud models
One key unlocks Claude, GPT, Gemini, and hundreds more.
- Sign up at https://openrouter.ai
- Add a few dollars of credit
- Create a key at https://openrouter.ai/keys
Skip this step for fully offline operation.
### 4. Install and configure the extension

Install the LocalMinds extension from the VS Code Marketplace, then open Settings and search for "LocalMinds" — or add the settings directly to `settings.json`:
```json
{
  "localminds.ollama.model": "gemma4:e4b",
  "localminds.openrouter.apiKey": "sk-or-v1-...",
  "localminds.openrouter.model": "anthropic/claude-sonnet-4",
  "localminds.openrouter.models": [
    "anthropic/claude-sonnet-4",
    "openai/gpt-4o",
    "google/gemini-2.5-pro"
  ]
}
```
Popular model IDs — see the full list at https://openrouter.ai/models:

| Use case | Model |
|----------|-------|
| Best all-round coder | `anthropic/claude-sonnet-4` |
| Heavy reasoning | `anthropic/claude-opus-4` |
| Cheap + fast | `openai/gpt-4o-mini` |
| Long context | `google/gemini-2.5-pro` |
| Open-weights | `meta-llama/llama-3.1-70b-instruct` |
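If you skip OpenRouter entirely, the cloud settings can simply be omitted. A minimal offline-only `settings.json` fragment, using the `localminds.offlineMode` setting described under Usage, might look like:

```json
{
  "localminds.ollama.model": "gemma4:e4b",
  "localminds.offlineMode": true
}
```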
## Usage

### Shortcuts
| Shortcut | Action |
|----------|--------|
| `Cmd+Shift+G` | Generate code from description |
| `Cmd+Shift+E` | Inline edit selected code |
### Right-click → LocalMinds
Generate, Refactor, Explain, Improve, Fix Bug, Inline Edit, Add Loading State, Make Responsive, Convert to Hooks.
### Chat panel
Click the LocalMinds icon in the activity bar. Streams from whichever model is active. Title-bar buttons: New Chat, Chat History, Customise Agent Profile, Setup Wizard.
### Agent Profile
Run `LocalMinds: Customise Agent Profile` to describe your stack, preferences, and style. The profile is appended to every system prompt so answers fit how you work.
### Offline mode
Click the status bar item or set `localminds.offlineMode: true` — cloud calls are skipped entirely.
## Recommended Hardware
| Setup | Model | Experience |
|-------|-------|------------|
| MacBook Air M1 (8 GB) | `gemma4:e2b` | Usable, ~5-10 s |
| MacBook Pro (16 GB) | `gemma4:e4b` | Good, ~3-5 s |
| MacBook Pro (32 GB+) | `gemma4:26b` | Excellent |
| GPU (24 GB+ VRAM) | `gemma4:26b` / `31b` | Best local experience |
Cloud models via OpenRouter have no local hardware requirements.
## Troubleshooting
**Ollama offline** — run `ollama serve`, then check with `curl http://localhost:11434/api/tags`.

**Model not found** — run `ollama list`, then `ollama pull gemma4:e4b`.

**OpenRouter errors**

- `401` — check that `localminds.openrouter.apiKey` matches a key at openrouter.ai/keys
- `402` — top up credit at openrouter.ai/credits
- `429` — rate limited; retry or switch model
- Model not found — IDs must match exactly (e.g. `anthropic/claude-sonnet-4`, not `claude-sonnet-4`)

**Slow responses** — try a smaller Gemma variant, reduce `localminds.contextLines` to 50, or switch to a cloud model.
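The local checks above can be rolled into one quick script. This is a hypothetical helper, not something LocalMinds ships; it assumes the default `gemma4:e4b` model and Ollama's default port:

```shell
MODEL="gemma4:e4b"
PROBLEMS=0

# 1. Is the ollama binary on PATH?
if ! command -v ollama >/dev/null 2>&1; then
  echo "ollama binary not found"; PROBLEMS=$((PROBLEMS + 1))
fi

# 2. Is the daemon answering on its default port?
if ! curl -fsS http://localhost:11434/api/tags >/dev/null 2>&1; then
  echo "Ollama daemon not responding; run: ollama serve"; PROBLEMS=$((PROBLEMS + 1))
fi

# 3. Has the model been pulled?
if ! ollama list 2>/dev/null | grep -q "$MODEL"; then
  echo "$MODEL missing; run: ollama pull $MODEL"; PROBLEMS=$((PROBLEMS + 1))
fi

echo "checks failed: $PROBLEMS"
```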
## License
MIT