Overview Version History Q & A Rating & Review
DeskAI — Intelligent Local Coding Assistant
100% offline AI coding assistant powered by local LLMs running entirely on your machine. No API keys. No data sent to the cloud. Full project generation, agentic file editing, and terminal control — all on your hardware.
Features
🤖 Agent Mode — Full Project Generation
Let the AI plan and build entire projects autonomously:
Creates files and folders directly on disk
Runs terminal commands (npm install, build, etc.)
Reads, edits, and verifies files
Recovers from errors automatically
Persistent task memory (scratchpad) survives context resets
"Continue task" button resumes interrupted sessions
💬 Ask Mode — Instant Code Help
Code explanation, refactoring, debugging
Ask about selected code via right-click → DeskAI: Ask About Selection
Attach files as context with the 📎 button
📎 File Attachment
Attach single files or entire folders as context
File content processed server-side — never floods the chat UI
Large files handled gracefully with truncation and read_file fallback
⚡ Local LLM Powered
Uses llama-server under the hood (bundled)
Supports GGUF models — import any compatible model
Recommended models auto-downloaded on first launch
Recommended Models
Model
RAM Required
Best For
Qwen3-27B Q4_K_M ⭐
20 GB+
Full project generation, best quality
DeepSeek-Coder-V2-Lite 16B
12 GB+
Great for 15GB RAM machines
Qwen2.5-Coder-14B Q4_K_M
12 GB+
Solid coding baseline
Models are downloaded automatically on first use. You can also Import any .gguf file via the model picker.
Getting Started
Install the extension
Open the DeskAI panel in the secondary sidebar (right side)
Wait for the model to load (first launch downloads the model ~8–16 GB)
Switch to Agent mode for full project generation
Type your task and press Enter
Showcase / Demo Mode
For full project generation without confirmation dialogs:
Enable deskai.autoApproveWrites: true — files written directly without popups
Enable deskai.hiddenTerminal: true — commands run silently in background
Settings
| Setting | Default | Description |
|---------|---------|-------------|
| deskai.autoApproveWrites | false | Write files directly without confirmation dialogs |
| deskai.hiddenTerminal | false | Run shell commands silently in background |
| deskai.maxAgentIterations | 40 | Max tool-call iterations before pausing |
| deskai.contextSize | 32768 | Model context window in tokens |
| deskai.temperature | 0.3 | Generation temperature (0=deterministic) |
| deskai.modelPath | | Custom path to a `.gguf` model file | | `deskai.modelsSearchPaths` | `[]` | Extra directories to scan for models | | `deskai.llamaServerPath` | | Custom path to llama-server executable |
Requirements
Windows 10/11 (primary), macOS, Linux
RAM : 12 GB minimum (16 GB+ recommended for best models)
Disk : 10–20 GB free for model storage
VS Code 1.85.0+
How It Works
On first open, DeskAI downloads llama-server and a GGUF model
llama-server runs locally on port 11434 with a 32k context window
The chat panel communicates with the local server via HTTP
In Agent mode, the model calls tools (write_file, run_terminal_command, etc.) to implement tasks
All data stays on your machine — nothing is sent to the internet during inference
Commands
Command
Description
DeskAI: Open Chat
Open the chat panel
DeskAI: Ask About Selection
Ask about selected code (right-click menu)
DeskAI: Refresh Project Context
Re-scan workspace for context
DeskAI: Import Custom Model (.gguf)
Import a local model file
DeskAI: Stop Server
Stop the llama-server process
DeskAI: Open Debug Log
View the extension debug log
Privacy
All inference is local — your code never leaves your machine
The only network requests are:
Initial download of llama-server binary (from GitHub Releases)
Initial download of the selected GGUF model (from Hugging Face)
After first setup, the extension works completely offline
License
MIT — see LICENSE