Silo — Local & Cloud AI Coding Assistant

Agentic coding assistant for VS Code. Runs fully local on Ollama or routes to major cloud and OpenAI-compatible AI providers. Your keys never leave your machine.

Current release: 1.4.1.

Why Silo

Private by default — local Ollama backend. Nothing leaves your machine unless you pick a cloud provider.
Two curated local models — silo-qwen (Qwen 3 14B, full agentic) and silo-phi (Microsoft Phi-4 14B, fast chat).
Auto model selector — toggle in toolbar picks Qwen for complex tasks (code, search, analysis) and Phi for short conversational replies.
Live thinking stream — Qwen's reasoning tokens appear in real time in a collapsible Reasoning block. Toggle on/off in the model picker.
Plan mode (Claude Code style) — Think → explore → ask clarifying questions with option buttons → numbered plan → todo checklist → "Switch to Auto" button.
Slash commands with autocomplete — type / for a floating list of commands with arrow-key navigation.
Markdown rendering — bold, italic, inline code, headings, lists, blockquotes, code blocks with copy button.
Agentic — reads, writes, and edits files, runs shell commands, searches the web, performs git operations, and tracks todos.
Approval workflow — Ask mode shows a diff preview with Apply/Reject before touching files.
Explicit context — use @file, @folder, @codebase, and @docs mentions in chat.
Checkpoints & checks — Auto/Edit creates checkpoint patches and runs detected project checks after edits.
MCP-ready — connect external tools through .silo/mcp.json.
Multi-provider — pick a cloud model per chat. API keys are stored in VS Code's encrypted SecretStorage.
Modes — Ask (chat + approval), Plan (research + plan only), Auto (full access).

1.4 Highlights

Backend accepts loopback clients only by default and CORS is restricted to VS Code/localhost origins.
Tool outputs redact common API key, token, password, and bearer-token shapes before returning to the chat loop.
Command, git, search, directory, and MCP outputs are clipped consistently to keep context small.
Chat history can be emptied completely; the empty state renders once in the center of the history panel.
Silo tools can access absolute filesystem paths outside the opened workspace when you provide or request those paths.
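
The redaction and clipping behavior described above could be sketched roughly as follows. This is an illustration only: the patterns, the 4000-character limit, and the function names (`redact`, `clip`, `sanitize_tool_output`) are assumptions, not Silo's actual rules.

```python
import re

# Hypothetical patterns; Silo's real redaction rules are not documented here.
_SECRET_PATTERNS = [
    # "Bearer <token>" authorization values
    (re.compile(r"(?i)\b(bearer)\s+[A-Za-z0-9._~+/=-]{8,}"), r"\1 [REDACTED]"),
    # key=value or key: value pairs whose key name looks secret
    (re.compile(r"(?i)\b(api[_-]?key|token|password|secret)(\s*[=:]\s*)(\S+)"),
     r"\1\2[REDACTED]"),
    # OpenAI-style "sk-..." keys appearing on their own
    (re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"), "[REDACTED]"),
]


def redact(text: str) -> str:
    """Replace secret-looking substrings with a placeholder."""
    for pattern, replacement in _SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def clip(text: str, limit: int = 4000) -> str:
    """Truncate long output and note how much was dropped."""
    if len(text) <= limit:
        return text
    omitted = len(text) - limit
    return text[:limit] + f"\n[... {omitted} characters clipped]"


def sanitize_tool_output(text: str, limit: int = 4000) -> str:
    """Redact first, then clip, so a secret is never cut in half and leaked."""
    return clip(redact(text), limit)
```

Redacting before clipping is a deliberate choice in this sketch: clipping first could split a token so that its remaining prefix no longer matches any pattern.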

Requirements

VS Code 1.90+
Python 3.10+ (for the local backend)
Ollama, if you want the local model path
Optional: API keys for cloud providers

Speed path

Simple prompts use an instant route: no project tree, no open files, no RAG, no diagnostics, no git diff, no tool schemas, and only a short history window. Code/edit/search prompts still get workspace context when it helps. Local responses can show latency metadata: model, fast/workspace route, first-token time, total time, and estimated prompt size.

Silo can also use local llama.cpp servers for no-tool local turns. Configure silo.llamacpp.serverPath and GGUF model paths once in VS Code settings and Silo will start the fast Phi llama-server automatically on startup. Qwen GGUF autostart is disabled by default to avoid VRAM contention; enable silo.llamacpp.qwenAutostart only if you explicitly want a second heavy server. If a llama.cpp server is not configured or fails to start, Silo silently falls back to Ollama. Agentic/tool turns stay on Ollama for compatibility.

Run the "Silo: Configure llama.cpp / GGUF" command from the Command Palette to select llama-server.exe and a .gguf model without editing settings manually. Silo also auto-detects Phi and Qwen .gguf files in common folders such as C:\ALL\Models\Silo, workspace models/, and ~/.silo/models.
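
As a rough illustration of the GGUF auto-detection mentioned in the Speed path section, the sketch below scans a list of folders for .gguf files and groups them by model family. The function name `find_gguf_models` and the rule of matching "phi" or "qwen" in the file name are assumptions; Silo's actual detection logic may differ.

```python
from pathlib import Path


def find_gguf_models(folders):
    """Return {"phi": [...], "qwen": [...]} with .gguf paths found under folders.

    Matching on the lowercase file name is an assumed heuristic.
    """
    found = {"phi": [], "qwen": []}
    for folder in folders:
        root = Path(folder).expanduser()  # supports "~/.silo/models"
        if not root.is_dir():
            continue  # skip folders that do not exist on this machine
        for path in sorted(root.rglob("*.gguf")):
            name = path.name.lower()
            for family in found:
                if family in name:
                    found[family].append(path)
    return found
```

A call such as `find_gguf_models(["~/.silo/models", "models"])` would cover two of the folders named in the text.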

Setup

1. Install the extension

Search for Silo in the VS Code Extensions panel, or install the .vsix directly.

2. Start the backend

git clone https://github.com/danielmadridg/silo.git
cd silo/backend

python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate

pip install fastapi "uvicorn[standard]" httpx pydantic sse-starlette aiofiles
uvicorn main:app --host 127.0.0.1 --port 8942

On Windows you can double-click start-backend.bat.
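
To confirm that the backend is up before opening the sidebar, a plain TCP check against the host and port used above is enough. This sketch deliberately avoids calling any HTTP route, since the backend's endpoints are not documented here; the function name `backend_is_listening` is made up for the example.

```python
import socket


def backend_is_listening(host="127.0.0.1", port=8942, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    print("backend reachable:", backend_is_listening())
```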

3. (Optional) Pull local models

Silo ships with two curated local models. The Modelfile paths below are relative to the repository root, so run the commands from there:

# Primary — Qwen 3 14B (fits in 16 GB of VRAM)
ollama pull qwen3:14b
ollama create silo-qwen -f backend/Modelfile-qwen

# Light — Microsoft Phi-4 14B (~40 tok/s, 84.8% on MMLU)
ollama pull phi4
ollama create silo-phi -f backend/Modelfile-phi

Both use tuned sampling params and model-specific context budgets for stable local latency.
Edit backend/config.py to change the default model.

4. (Optional) Add a cloud model

Open the Silo sidebar → click the model picker → + Add AI → pick a provider (for example OpenAI, Anthropic, or Gemini), then paste your API key.

Features

Modes

Mode	What it does
Ask	Conversational plus approval previews for file edits.
Plan	Read-only tools — read files, grep, list dirs. Great for exploration.
Auto	Full agent — read, write, edit, run commands, manage todos.

In Silo 1.4.1, Ask mode can also prepare file edits as approval previews. The UI shows a unified diff and waits for Apply or Reject. Auto/Edit mode keeps full access and applies changes directly.
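
A unified diff like the one shown in the approval preview can be produced with Python's standard difflib module. The sketch below is only an illustration of the idea; the helper name `diff_preview` and the a/ and b/ path prefixes are assumptions, not Silo's implementation.

```python
import difflib


def diff_preview(path, old_text, new_text):
    """Return a unified diff between the current and proposed file contents."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


if __name__ == "__main__":
    print(diff_preview("app.py", "x = 1\n", "x = 2\n"))
```

An empty result means the proposed edit changes nothing, which a UI could use to skip the Apply/Reject prompt.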

Slash commands

Type / in the input to open an autocomplete picker. Arrow keys to navigate, Enter to confirm.

/clear — start a new chat
/compact — summarize the current chat to save context
/review [ref] — AI review of git diff vs ref (default HEAD~1)
/search <query> — web search via the agent
/export — save the current chat as Markdown
/mode ask|plan|auto — switch mode
/model [id] — switch model (or open picker)
/help — list all commands

Plan mode

Plan mode is a research-first workflow modeled on Claude Code:

Think — model opens a <think> block to reason about scope and ambiguities.
Explore — uses read-only tools (read_file, search_content, git_status, web_search) to understand the codebase.
Clarify — calls ask_user with a question and 2–4 concrete options. UI shows clickable option buttons + a free-text input.
Plan — produces a numbered plan with files to touch, risks, and open questions.
Todo — emits a checklist in the side panel.
Hand off — ends with a "→ Switch to Auto mode to implement" button.

Auto-model selector

The model-routing button in the toolbar picks the right local model per message:

Short / conversational / simple questions → Phi-4 (~40 tok/s)
Code, search, analysis, refactor, multi-step tasks → Qwen 3 14B (full tool calling)

Auto-detection uses structural signals instead of language keyword lists: file paths, @ mentions, fenced code, diffs, stack traces, command-shaped lines, and prompt size. Ambiguous short prompts stay on the fast model.

You can disable Auto and pick a model manually from the model picker.
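
The structural routing described above might look something like the following sketch. The regular expressions, the extension list, the 400-character threshold, and the names `detect_signals` and `pick_model` are all assumptions chosen for illustration; they are not taken from Silo's source.

```python
import re

# Assumed list of extensions that make a token look like a source file.
_CODE_EXTS = r"(?:py|ts|tsx|js|jsx|json|md|toml|yaml|yml|rs|go|java|c|cpp|h|sh)"

_SIGNALS = {
    "mention": re.compile(r"(?<![\w@])@[\w./\\-]+"),
    "file_path": re.compile(
        r"(?<![\w@])(?:[\w.-]+[/\\])*[\w-]+\." + _CODE_EXTS + r"\b"
    ),
    "fenced_code": re.compile(r"`{3}"),
    "diff": re.compile(r"^(?:diff --git |@@ |\+\+\+ |--- )", re.MULTILINE),
    "stack_trace": re.compile(
        r"Traceback \(most recent call last\)|^\s+at \S+ ?\(", re.MULTILINE
    ),
    "command": re.compile(
        r"^\s*(?:\$ |(?:git|npm|pip|python|cargo|docker) )", re.MULTILINE
    ),
}


def detect_signals(prompt):
    """Return the names of the structural signals present in the prompt."""
    return [name for name, pattern in _SIGNALS.items() if pattern.search(prompt)]


def pick_model(prompt, long_prompt_chars=400):
    """Route to the agentic model on any signal or a long prompt, else the fast one."""
    if detect_signals(prompt) or len(prompt) > long_prompt_chars:
        return "silo-qwen"
    return "silo-phi"
```

Because the checks look at shape rather than vocabulary, the same rules apply regardless of the language the prompt is written in, which matches the stated goal of avoiding keyword lists.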

Thinking toggle

The Thinking switch at the bottom of the model picker controls Qwen's reasoning mode:

ON — Qwen generates a <think> block before responding. Streamed live to the Reasoning panel. Best for complex tasks.
OFF — Qwen replies immediately without reasoning. Best for fast iteration on simple changes.

Phi-4 ignores this toggle (no thinking support).
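
Streaming the reasoning separately from the answer means splitting on the think tags even when a tag arrives split across two chunks. The class below is a minimal sketch of that idea; the name `ThinkStreamSplitter` and the "reasoning"/"answer" channel labels are hypothetical and not part of Silo's API.

```python
class ThinkStreamSplitter:
    """Split streamed text into reasoning and answer pieces."""

    OPEN, CLOSE = "<think>", "</think>"

    def __init__(self):
        self._buffer = ""
        self._in_think = False

    def feed(self, chunk):
        """Consume one streamed chunk; return a list of (channel, text) pieces."""
        self._buffer += chunk
        out = []
        while self._buffer:
            tag = self.CLOSE if self._in_think else self.OPEN
            channel = "reasoning" if self._in_think else "answer"
            idx = self._buffer.find(tag)
            if idx != -1:
                # Full tag found: emit what precedes it and switch channel.
                if idx:
                    out.append((channel, self._buffer[:idx]))
                self._buffer = self._buffer[idx + len(tag):]
                self._in_think = not self._in_think
                continue
            # No full tag: hold back a tail that could be the start of the tag.
            keep = 0
            for n in range(min(len(tag) - 1, len(self._buffer)), 0, -1):
                if tag.startswith(self._buffer[-n:]):
                    keep = n
                    break
            cut = len(self._buffer) - keep
            if cut:
                out.append((channel, self._buffer[:cut]))
            self._buffer = self._buffer[cut:]
            break
        return out
```

Each call to `feed` returns only text that is safe to display, so a UI can append "reasoning" pieces to a collapsible panel and "answer" pieces to the chat as they arrive.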

Cloud providers

Bring your own key. Add AI includes compact presets for OpenAI, Anthropic Claude, Google Gemini, DeepSeek, xAI, Groq, Mistral, OpenRouter, Together AI, Fireworks, Perplexity, Cerebras, NVIDIA, Moonshot, Qwen, and any custom OpenAI-compatible endpoint.

Keys are stored in context.secrets (VS Code SecretStorage, backed by the OS keychain on macOS, Windows, and Linux). They are never written to disk as plain text.

Workspace context

Silo includes workspace context only when the prompt needs it. Simple questions stay on the instant route; code/edit/search prompts can include:

The active file + selected/open tabs
VS Code diagnostics (problems)
git diff (unstaged changes)
SILO.md or CLAUDE.md memory from your workspace root
BM25 workspace snippets
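
For readers unfamiliar with BM25, the sketch below ranks text snippets against a query using the standard Okapi BM25 formula. The tokenizer, the parameter values k1=1.5 and b=0.75, and the function names are common defaults picked for illustration; nothing here is taken from Silo's retrieval code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics, so read_file -> read, file."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return (order, scores): doc indices by descending score, and raw scores."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: in how many docs each term appears
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order, scores
```

Snippets that share no terms with the query score exactly zero, so a caller can drop them instead of spending context on them.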

You can also mention context explicitly:

@backend/tools.py includes a file
@backend includes a folder summary
@codebase asks Silo to lean on workspace retrieval
@docs tells Silo to use documentation/web tools when available
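
Extracting these mentions from a prompt could be done along the following lines. This is a sketch under assumptions: the regular expression, the `parse_mentions` name, and the rule that a dot in the last path component means "file" are all illustrative, and a real implementation would more likely check the filesystem.

```python
import re

# "@" must not follow a word character, so e-mail addresses are ignored.
_MENTION_RE = re.compile(r"(?<![\w@])@([\w./\\-]+)")
_SPECIAL = {"codebase", "docs"}


def parse_mentions(prompt):
    """Return a list of (kind, target) tuples for each @mention in the prompt."""
    mentions = []
    for match in _MENTION_RE.finditer(prompt):
        target = match.group(1).rstrip(".")  # drop sentence-ending periods
        if target in _SPECIAL:
            mentions.append((target, None))
        elif "." in target.rsplit("/", 1)[-1]:
            # Assumed heuristic: an extension in the last component means a file
            mentions.append(("file", target))
        else:
            mentions.append(("folder", target))
    return mentions
```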

Persistent project instructions can live in SILO.md, CLAUDE.md, .silo/SILO.md, .silo/CLAUDE.md, or .silo/instructions.md.

Silo's automatic context focuses on the current workspace for speed, but file tools also accept absolute paths. If you give Silo a path outside the opened workspace, it can inspect or edit it within the current Windows/VS Code permissions.

Checkpoints, checks, and MCP

Before Auto/Edit changes or approved Ask edits, Silo writes a checkpoint patch to .silo/checkpoints/. After Auto/Edit file changes, Silo detects common project checks and streams the result back into the chat so it can fix failures.

MCP servers can be configured in .silo/mcp.json:

{
  "mcpServers": {
    "example": {
      "command": "node",
      "args": ["path/to/server.js"],
      "env": {}
    }
  }
}
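
To sanity-check a configuration file of this shape before relying on it, a small loader such as the one below can help. It is a sketch only: the function name `load_mcp_servers` and the validation rules (a non-empty string command, optional args and env) are assumptions based on the example above, not a description of how Silo parses the file.

```python
import json
from pathlib import Path


def load_mcp_servers(workspace):
    """Read <workspace>/.silo/mcp.json and return a normalized server mapping."""
    config_path = Path(workspace) / ".silo" / "mcp.json"
    if not config_path.is_file():
        return {}  # no MCP configuration in this workspace
    data = json.loads(config_path.read_text(encoding="utf-8"))
    servers = {}
    for name, spec in data.get("mcpServers", {}).items():
        command = spec.get("command")
        if not isinstance(command, str) or not command:
            raise ValueError(f"MCP server {name!r} needs a non-empty 'command'")
        servers[name] = {
            "command": command,
            "args": [str(a) for a in spec.get("args", [])],
            "env": {str(k): str(v) for k, v in spec.get("env", {}).items()},
        }
    return servers
```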

Multi-chat history

The sidebar keeps your previous chats. Empty chats are discarded. Long chats auto-compact.

Inline tools

Analyze file — full review of the active file
Refactor selection — right-click → Silo: Refactor Selection
Explain selection — right-click → Silo: Explain Selection
Inline completions — Tab to accept

Configuration

Setting	Default	Description
`silo.backendUrl`	`http://127.0.0.1:8942`	Backend URL (binds to loopback by default)
`silo.contextFiles`	`5`	Open files included in context
`silo.backendPath`	`""`	Path to the backend folder. Leave empty for auto-detect

Security

Backend binds to 127.0.0.1 by default and rejects non-loopback clients unless SILO_ALLOW_REMOTE=1 is explicitly set.
CORS is restricted to VS Code webviews and localhost/127.0.0.1 origins.
API keys live in VS Code SecretStorage (OS keychain).
Tool outputs redact common API key, token, password, and bearer-token shapes before returning to the chat loop.
No telemetry. No analytics. No outbound calls except to your configured provider.
Source on GitHub — audit anything.
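
The loopback-only rule can be expressed with Python's standard ipaddress module. The sketch below shows one way such a check might work; the function names are hypothetical, and only the SILO_ALLOW_REMOTE=1 switch comes from the text above.

```python
import ipaddress
import os


def is_loopback(host):
    """Return True for localhost, 127.0.0.0/8, and ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # not a valid IP literal, so treat it as remote


def client_allowed(client_host, env=None):
    """Allow loopback clients always; allow others only with SILO_ALLOW_REMOTE=1."""
    env = os.environ if env is None else env
    if env.get("SILO_ALLOW_REMOTE") == "1":
        return True
    return is_loopback(client_host)
```

Passing `env` explicitly makes the check easy to test without touching the real process environment.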

Source

github.com/danielmadridg/silo

License

MIT — see LICENSE.

Daniel Madrid Garrabe