Shimmy VS Code Extension
The official VS Code extension for Shimmy: The 5MB Alternative to Ollama.
Features
- One-click server management: Start, stop, and restart Shimmy from the status bar
- Instant code generation: Right-click any text selection to generate code completions
- Auto-discovery: Works with any GGUF model, with zero-friction LoRA serving
- Zero configuration: Smart defaults that just work
- Status monitoring: Real-time server status in the status bar
Quick Start
1. Install the Shimmy binary: cargo install shimmy --features llama
2. Install this extension
3. Set your model path in settings (or use the SHIMMY_BASE_GGUF environment variable, as sketched below)
4. Click the status bar item to start the server
5. Right-click selected code → "Generate from Selection"
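A minimal setup sketch for steps 1 and 3, assuming a Unix-like shell; the model path below is only a placeholder for your own GGUF file:

```bash
# Install the Shimmy binary with llama.cpp support
cargo install shimmy --features llama

# Confirm the binary is on your PATH so the extension can find it
which shimmy

# Point Shimmy at a model (placeholder path; substitute your own GGUF file)
export SHIMMY_BASE_GGUF="$HOME/models/your-model.gguf"
```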
Commands
Shimmy: Start Server - Start the Shimmy inference server
Shimmy: Stop Server - Stop the running server
Shimmy: Restart Server - Restart the server
Shimmy: Generate Code - Open prompt dialog for code generation
Generate from Selection - Generate code based on selected text (context menu)
Configuration
| Setting | Description | Default |
|---------|-------------|---------|
| shimmy.serverUrl | Shimmy server URL | http://localhost:11435 |
| shimmy.binaryPath | Path to shimmy binary | shimmy |
| shimmy.modelPath | Path to GGUF model file | (uses SHIMMY_BASE_GGUF env var) |
| shimmy.autoStart | Auto-start server on activation | false |
Shimmy provides OpenAI-compatible endpoints, so it works with:
- GitHub Copilot: Configure server URL in settings
- Cursor: Point to http://localhost:11435
- Continue.dev: Add as custom model provider
- Any OpenAI-compatible tool: Just change the base URL (see the example below)
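For example, any OpenAI-compatible client can call Shimmy directly. This is a sketch that assumes the standard OpenAI-style chat completions route and uses a placeholder model name:

```bash
# Sketch: chat completion request against a local Shimmy server
# (standard OpenAI-style route assumed; replace "your-model" with a loaded model)
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model",
    "messages": [{"role": "user", "content": "Write a hello world function in Rust"}]
  }'
```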
Troubleshooting
Server won't start?
- Check that the shimmy binary is in your PATH
- Verify model path is correct
- Ensure port 11435 is available (see the quick checks below)
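These can be verified quickly from a Unix-like shell (standard tools, nothing Shimmy-specific):

```bash
# Is the shimmy binary on your PATH?
which shimmy

# Is a model path set (if you rely on the environment variable rather than settings)?
echo "$SHIMMY_BASE_GGUF"

# Is something already listening on port 11435?
lsof -i :11435
```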
No code generation?
- Check the server status in the status bar (it should show a green checkmark)
- Verify the model is loaded properly (the request below queries the server directly)
- Check the VS Code developer console for errors
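One way to confirm the server sees your model is to query it directly; this assumes the standard OpenAI-style model listing route:

```bash
# List the models the running Shimmy server has loaded (OpenAI-style route assumed)
curl http://localhost:11435/v1/models
```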
Performance tips:
- Use smaller models (1-3B parameters) for faster responses
- Place models on SSD for faster loading
- Consider LoRA adapters for specialized tasks
About Shimmy
Shimmy is the 5MB alternative to Ollama - a single-binary local inference server that:
- Starts in <100ms vs Ollama's 5-10 seconds
- Uses <50MB RAM vs Ollama's 200MB+ overhead
- 100% OpenAI API compatible
- First-class LoRA adapter support
- Zero configuration required
Privacy-first, cost-free, blazing fast local AI.
License
MIT License - see Shimmy repository for details.