Silo — Local & Cloud AI Coding Assistant
Agentic coding assistant for VS Code. Runs fully local on Ollama or routes to major cloud and OpenAI-compatible AI providers. Your keys never leave your machine.
Current release: 1.4.1.

Why Silo
- Private by default: local Ollama backend. Nothing leaves your machine unless you pick a cloud provider.
- Two curated local models: silo-qwen (Qwen 3 14B, full agentic) and silo-phi (Microsoft Phi-4 14B, fast chat).
- Auto model selector: a toolbar toggle picks Qwen for complex tasks (code, search, analysis) and Phi for short conversational replies.
- Live thinking stream: Qwen's reasoning tokens appear in real time in a collapsible Reasoning block. Toggle on/off in the model picker.
- Plan mode (Claude Code style): Think → explore → ask clarifying questions with option buttons → numbered plan → todo checklist → "Switch to Auto" button.
- Slash commands with autocomplete: type / for a floating list of commands with arrow-key navigation.
- Markdown rendering: bold, italic, inline code, headings, lists, blockquotes, code blocks with copy button.
- Agentic: reads, writes, and edits files, runs shell commands, searches the web, performs git operations, and tracks todos.
- Approval workflow: Ask mode shows a diff preview with Apply/Reject before touching files.
- Explicit context: use @file, @folder, @codebase, and @docs mentions in chat.
- Checkpoints & checks: Auto/Edit creates checkpoint patches and runs detected project checks after edits.
- MCP-ready: connect external tools through .silo/mcp.json.
- Multi-provider: pick a cloud model per chat. API keys are stored in VS Code's encrypted SecretStorage.
- Modes: Ask (chat + approval), Plan (research + plan only), Auto (full access).
1.4 Highlights
- Backend accepts loopback clients only by default and CORS is restricted to VS Code/localhost origins.
- Tool outputs redact common API key, token, password, and bearer-token shapes before returning to the chat loop.
- Command, git, search, directory, and MCP outputs are clipped consistently to keep context small.
- Chat history can be emptied completely; the empty state renders once in the center of the history panel.
- Silo tools can access absolute filesystem paths outside the opened workspace when you provide or request those paths.
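As a rough illustration of the redaction and clipping behaviour listed above, the following Python sketch masks credential-looking values and truncates long output. The regular expressions, the [REDACTED] marker, the 4000-character limit, and the function names are assumptions made for illustration, not Silo's actual implementation.

```python
import re

# Hypothetical patterns: Silo's real redaction rules are not documented here.
_SECRET_PATTERNS = [
    # "Bearer <token>" as found in Authorization headers
    re.compile(r"(?i)\b(bearer)(\s+)[A-Za-z0-9._~+/=-]{8,}"),
    # NAME=value or NAME: value where NAME ends in a sensitive word
    re.compile(r"(?i)([\w-]*(?:api[_-]?key|token|password|secret))(\s*[:=]\s*)\S+"),
]


def redact(text: str) -> str:
    """Keep the key name and separator, replace the secret value."""
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub(r"\1\2[REDACTED]", text)
    return text


def clip(text: str, limit: int = 4000) -> str:
    """Truncate long tool output and note how much was dropped."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n[clipped {len(text) - limit} chars]"


def sanitize_tool_output(text: str, limit: int = 4000) -> str:
    """Redact first, then clip, so a secret is never cut in half and leaked."""
    return clip(redact(text), limit)
```

Redacting before clipping is a deliberate choice in this sketch: clipping first could split a token so that the remaining fragment no longer matches any pattern.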
Speed path
Simple prompts use an instant route: no project tree, no open files, no RAG, no diagnostics, no git diff, no tool schemas, and only a short history window. Code/edit/search prompts still get workspace context when it helps. Local responses can show latency metadata: model, fast/workspace route, first-token time, total time, and estimated prompt size.
Silo can also use local llama.cpp servers for no-tool local turns. Configure silo.llamacpp.serverPath and GGUF model paths once in VS Code settings and Silo will start the fast Phi llama-server automatically on startup. Qwen GGUF autostart is disabled by default to avoid VRAM contention; enable silo.llamacpp.qwenAutostart only if you explicitly want a second heavy server. If an accelerator is not configured or fails to start, Silo silently falls back to Ollama. Agentic/tool turns stay on Ollama for compatibility.
Use Silo: Configure llama.cpp / GGUF from the Command Palette to select llama-server.exe and a .gguf model without editing settings manually. Silo also auto-detects Phi and Qwen .gguf files in common folders such as C:\ALL\Models\Silo, workspace models/, and ~/.silo/models.
Requirements
- VS Code 1.90+
- Python 3.10+ (for the local backend)
- Ollama if you want the local model path
- Optional: API keys for cloud providers
Setup
1. Install the extension
Search Silo in the VS Code Extensions panel, or install the .vsix directly.
2. Start the backend
    git clone https://github.com/danielmadridg/silo.git
    cd silo/backend
    python -m venv .venv
    # Windows
    .venv\Scripts\activate
    # macOS / Linux
    source .venv/bin/activate
    pip install fastapi "uvicorn[standard]" httpx pydantic sse-starlette aiofiles
    uvicorn main:app --host 127.0.0.1 --port 8942
On Windows you can double-click start-backend.bat.
3. (Optional) Pull local models
Silo ships with two curated local models:
    # Primary: Qwen 3 14B (fits 16 GB VRAM much better)
    ollama pull qwen3:14b
    ollama create silo-qwen -f backend/Modelfile-qwen
    # Light: Microsoft Phi-4 14B (~40 tok/s, 84.8% benchmarks)
    ollama pull phi4
    ollama create silo-phi -f backend/Modelfile-phi
Both use tuned sampling params and model-specific context budgets for stable local latency.
Edit backend/config.py to change the default model.
4. (Optional) Add a cloud model
Open the Silo sidebar → click the model picker → + Add AI → pick OpenAI / Anthropic / Gemini, paste your API key.
Features
Modes
| Mode | What it does |
|------|--------------|
| Ask | Conversational plus approval previews for file edits. |
| Plan | Read-only tools: read files, grep, list dirs. Great for exploration. |
| Auto | Full agent: read, write, edit, run commands, manage todos. |
In Silo 1.4.1, Ask mode can also prepare file edits as approval previews. The UI shows a unified diff and waits for Apply or Reject. Auto/Edit mode keeps full access and applies changes directly.
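The approval preview described above can be pictured with Python's standard difflib module. This is a minimal sketch, not Silo's code: the function names preview_diff and resolve_edit are hypothetical, and the real extension may build its diff differently.

```python
import difflib


def preview_diff(path: str, old: str, new: str) -> str:
    """Build a unified diff for a proposed edit without touching the file."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


def resolve_edit(old: str, new: str, approved: bool) -> str:
    """Apply keeps the new content; Reject leaves the original unchanged."""
    return new if approved else old
```

In this sketch nothing is written to disk until the caller decides what to do with the value returned by resolve_edit, which mirrors the Apply/Reject step.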
Slash commands
Type / in the input to open an autocomplete picker. Arrow keys to navigate, Enter to confirm.
/clear — start a new chat
/compact — summarize the current chat to save context
/review [ref] — AI review of git diff vs ref (default HEAD~1)
/search <query> — web search via the agent
/export — save the current chat as Markdown
/mode ask|plan|auto — switch mode
/model [id] — switch model (or open picker)
/help — list all commands
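To make the command grammar above concrete, here is a small Python sketch of a parser and a prefix-based autocomplete. The command names and the HEAD~1 default come from the list above; the parsing rules, the error handling, and the names parse_slash and complete are assumptions, and the real picker lives in the extension UI.

```python
from __future__ import annotations

from dataclasses import dataclass

COMMANDS = {"clear", "compact", "review", "search", "export", "mode", "model", "help"}
MODES = {"ask", "plan", "auto"}


@dataclass
class SlashCommand:
    name: str
    arg: str


def parse_slash(line: str) -> SlashCommand | None:
    """Return a SlashCommand, or None when the input is an ordinary message."""
    if not line.startswith("/"):
        return None
    name, _, arg = line[1:].strip().partition(" ")
    name, arg = name.lower(), arg.strip()
    if name not in COMMANDS:
        return None
    if name == "review" and not arg:
        arg = "HEAD~1"  # documented default ref
    if name == "mode" and arg not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if name == "search" and not arg:
        raise ValueError("/search needs a query")
    return SlashCommand(name, arg)


def complete(prefix: str) -> list[str]:
    """Candidates for the picker, e.g. '/mo' -> ['/mode', '/model']."""
    return sorted(f"/{c}" for c in COMMANDS if f"/{c}".startswith(prefix))
```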
Plan mode
Plan mode is a research-first workflow modeled on Claude Code:
- Think: the model opens a <think> block to reason about scope and ambiguities.
- Explore: uses read-only tools (read_file, search_content, git_status, web_search) to understand the codebase.
- Clarify: calls ask_user with a question and 2–4 concrete options. The UI shows clickable option buttons plus a free-text input.
- Plan: produces a numbered plan with files to touch, risks, and open questions.
- Todo: emits a checklist in the side panel.
- Hand off: ends with a "→ Switch to Auto mode to implement" button.
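The Clarify step above amounts to a small structured request: one question, two to four options, and an optional free-text reply. The Python sketch below shows one way such a payload could be validated and resolved. The AskUser class, its field names, and the 1-based option index are hypothetical; the actual ask_user tool schema is not documented here.

```python
from dataclasses import dataclass


@dataclass
class AskUser:
    question: str
    options: list
    allow_free_text: bool = True

    def __post_init__(self) -> None:
        # The workflow above calls for 2 to 4 concrete options.
        if not 2 <= len(self.options) <= 4:
            raise ValueError("ask_user expects 2 to 4 options")

    def resolve(self, reply: str) -> str:
        """Map a clicked option number ('1'..'4') or free text to an answer."""
        reply = reply.strip()
        if reply.isdigit() and 1 <= int(reply) <= len(self.options):
            return self.options[int(reply) - 1]
        if self.allow_free_text and reply:
            return reply
        raise ValueError("reply is neither an option nor free text")
```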
Auto-model selector
The model-routing button in the toolbar picks the right local model per message:
- Short / conversational / simple questions → Phi-4 (~40 tok/s)
- Code, search, analysis, refactor, multi-step tasks → Qwen 3 14B (full tool calling)
Auto-detection uses structural signals instead of language keyword lists: file paths, @ mentions, fenced code, diffs, stack traces, command-shaped lines, and prompt size. Ambiguous short prompts stay on the fast model.
You can disable Auto and pick a model manually from the model picker.
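A structural router of the kind described above might look like the following Python sketch. The specific regular expressions, the list of file extensions, and the 600-character threshold are illustrative assumptions; only the signal categories and the model names silo-qwen and silo-phi come from this document.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built here to keep this example fence-safe

# Hypothetical signal patterns, one per category named above.
_SIGNALS = {
    "fenced_code": re.compile(re.escape(FENCE)),
    "mention": re.compile(r"(?<!\S)@\w+"),
    "file_path": re.compile(
        r"(?:[\w.-]+[/\\])*[\w-]+\.(?:py|ts|js|json|md|toml|rs|go|java|c|cpp|h)\b"
    ),
    "diff": re.compile(r"^(?:\+\+\+ |--- |@@ )", re.M),
    "stack_trace": re.compile(r"Traceback \(most recent call last\)|^\s+at \S+", re.M),
    "command": re.compile(r"^\s*(?:\$ |(?:git|npm|pip|cargo) \S)", re.M),
}


def matched_signals(prompt: str) -> list:
    """Names of the structural signals present in the prompt."""
    return [name for name, rx in _SIGNALS.items() if rx.search(prompt)]


def pick_model(prompt: str, long_prompt_chars: int = 600) -> str:
    """Route to the heavy model on any signal or a long prompt, else stay fast."""
    if len(prompt) >= long_prompt_chars or matched_signals(prompt):
        return "silo-qwen"
    return "silo-phi"
```

Because no keyword lists are involved, a sketch like this behaves the same regardless of the language the prompt is written in, which matches the stated design goal.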
Thinking toggle
The Thinking switch at the bottom of the model picker controls Qwen's reasoning mode:
- ON: Qwen generates a <think> block before responding, streamed live to the Reasoning panel. Best for complex tasks.
- OFF: Qwen replies immediately without reasoning. Best for fast iteration on simple changes.
Phi-4 ignores this toggle (no thinking support).
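For readers curious how a <think> block can be separated from the visible answer, here is a minimal Python sketch. It works on a completed response only; live streaming, as Silo does it, needs incremental parsing that is not shown. The function name split_reasoning is hypothetical.

```python
import re

_THINK = re.compile(r"<think>(.*?)</think>", re.S)


def split_reasoning(raw: str) -> tuple:
    """Return (reasoning, answer) from a completed model response."""
    reasoning = "\n".join(part.strip() for part in _THINK.findall(raw))
    answer = _THINK.sub("", raw).strip()
    return reasoning, answer
```

A response without a <think> block, such as one from Phi-4 or from Qwen with Thinking off, simply yields an empty reasoning string.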
Cloud providers
Bring your own key. Add AI includes compact presets for OpenAI, Anthropic Claude, Google Gemini, DeepSeek, xAI, Groq, Mistral, OpenRouter, Together AI, Fireworks, Perplexity, Cerebras, NVIDIA, Moonshot, Qwen, and any custom OpenAI-compatible endpoint.
Keys are stored in context.secrets (VS Code SecretStorage — OS keychain on Mac/Windows/Linux). Never written to disk as plain text.
Workspace context
Silo includes workspace context only when the prompt needs it. Simple questions stay on the instant route; code/edit/search prompts can include:
- The active file + selected/open tabs
- VS Code diagnostics (problems)
- git diff (unstaged changes)
- SILO.md or CLAUDE.md memory from your workspace root
- BM25 workspace snippets
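BM25 is a standard lexical ranking function. The Python sketch below shows plain Okapi BM25 over a handful of files to illustrate how workspace snippets can be scored against a prompt. The tokenizer, the k1 and b values, and the function names are assumptions; Silo's own index and parameters may differ.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase alphanumeric tokens; underscores split identifiers apart."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, docs: dict, k1: float = 1.5, b: float = 0.75) -> list:
    """Rank documents (name -> text) against a query, best match first."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / max(n, 1)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    query_terms = set(tokenize(query))
    scores = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```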
You can also mention context explicitly:
- @backend/tools.py includes a file
- @backend includes a folder summary
- @codebase asks Silo to lean on workspace retrieval
- @docs tells Silo to use documentation/web tools when available
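As an illustration of how such mentions can be pulled out of a chat message, the following Python sketch separates the two keyword mentions from path-like ones. The regular expression and the extract_mentions name are assumptions; telling a file from a folder would additionally require a filesystem check, which is left out.

```python
import re

# "@" must start a word so that e-mail addresses are not treated as mentions.
_MENTION = re.compile(r"(?<!\S)@([\w./\\-]+)")
_KEYWORDS = {"codebase", "docs"}


def extract_mentions(message: str) -> dict:
    """Group @mentions into keyword mentions and path-like targets."""
    found = {"keywords": [], "paths": []}
    for target in _MENTION.findall(message):
        target = target.rstrip(".")  # drop a sentence-ending period
        kind = "keywords" if target in _KEYWORDS else "paths"
        found[kind].append(target)
    return found
```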
Persistent project instructions can live in SILO.md, CLAUDE.md, .silo/SILO.md, .silo/CLAUDE.md, or .silo/instructions.md.
Silo's automatic context focuses on the current workspace for speed, but file tools also accept absolute paths. If you give Silo a path outside the opened workspace, it can inspect or edit it within your current OS and VS Code permissions.
Checkpoints, checks, and MCP
Before Auto/Edit changes or approved Ask edits, Silo writes a checkpoint patch to .silo/checkpoints/. After Auto/Edit file changes, Silo detects and runs common project checks, then streams the results back into the chat so the agent can fix failures.
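Which checks Silo detects is not spelled out here. One plausible approach, sketched below in Python, is to look for well-known marker files in the workspace root and run the matching command. The marker-to-command table, the 4000-character output tail, and the function names are all hypothetical.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from marker file to check command.
CHECKS = {
    "package.json": ["npm", "test"],
    "pyproject.toml": ["python", "-m", "pytest", "-q"],
    "Cargo.toml": ["cargo", "check"],
    "go.mod": ["go", "vet", "./..."],
}


def detect_checks(root: Path) -> list:
    """Return the commands whose marker file exists in the workspace root."""
    return [cmd for marker, cmd in CHECKS.items() if (root / marker).is_file()]


def run_check(cmd: list, root: Path, timeout: int = 120) -> tuple:
    """Run one check and return (exit code, tail of the combined output)."""
    proc = subprocess.run(
        cmd, cwd=root, capture_output=True, text=True, timeout=timeout
    )
    return proc.returncode, (proc.stdout + proc.stderr)[-4000:]
```

Returning only the tail of the output keeps the most recent failure lines, which are usually the ones an agent needs in order to react.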
MCP servers can be configured in .silo/mcp.json:
    {
      "mcpServers": {
        "example": {
          "command": "node",
          "args": ["path/to/server.js"],
          "env": {}
        }
      }
    }
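To show how a configuration in this shape can be consumed, here is a small Python sketch that reads .silo/mcp.json and validates each entry. The loader name and the validation rules are assumptions; only the file location and the mcpServers, command, args, and env keys come from the example above.

```python
import json
from pathlib import Path


def load_mcp_servers(workspace: Path) -> dict:
    """Read .silo/mcp.json and return validated server entries by name."""
    config_path = workspace / ".silo" / "mcp.json"
    if not config_path.is_file():
        return {}  # MCP is optional, so a missing file is not an error
    data = json.loads(config_path.read_text(encoding="utf-8"))
    servers = {}
    for name, spec in data.get("mcpServers", {}).items():
        if not isinstance(spec.get("command"), str):
            raise ValueError(f"MCP server {name!r} needs a string 'command'")
        servers[name] = {
            "command": spec["command"],
            "args": list(spec.get("args", [])),
            "env": dict(spec.get("env", {})),
        }
    return servers
```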
Multi-chat history
The sidebar keeps your previous chats. Empty chats are discarded. Long chats auto-compact.
Editor commands
- Analyze file — full review of the active file
- Refactor selection — right-click → Silo: Refactor Selection
- Explain selection — right-click → Silo: Explain Selection
- Inline completions — Tab to accept
Configuration
| Setting | Default | Description |
|---------|---------|-------------|
| silo.backendUrl | http://127.0.0.1:8942 | Backend URL (binds to loopback by default) |
| silo.contextFiles | 5 | Open files included in context |
| silo.backendPath | "" | Path to the backend folder. Leave empty for auto-detect |
Security
- Backend binds to 127.0.0.1 by default and rejects non-loopback clients unless SILO_ALLOW_REMOTE=1 is explicitly set.
- CORS is restricted to VS Code webviews and localhost/127.0.0.1 origins.
- API keys live in VS Code SecretStorage (OS keychain).
- Tool outputs redact common API key, token, password, and bearer-token shapes before returning to the chat loop.
- No telemetry. No analytics. No outbound calls except to your configured provider.
- Source on GitHub — audit anything.
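The loopback rule in the first bullet above can be expressed with Python's standard ipaddress module. This is a sketch of the idea only: the function name is hypothetical, and the real backend may perform the check elsewhere, for example in request middleware against the connecting client's host.

```python
from __future__ import annotations

import ipaddress
import os
from collections.abc import Mapping


def is_client_allowed(client_host: str, env: Mapping | None = None) -> bool:
    """Allow loopback clients; allow everyone only if SILO_ALLOW_REMOTE=1."""
    env = os.environ if env is None else env
    if env.get("SILO_ALLOW_REMOTE") == "1":
        return True
    if client_host == "localhost":
        return True
    try:
        # Covers the whole 127.0.0.0/8 range and the IPv6 loopback ::1.
        return ipaddress.ip_address(client_host).is_loopback
    except ValueError:
        return False  # not an IP address at all: reject
```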
Source
github.com/danielmadridg/silo
License
MIT — see LICENSE.