DeskAI — Intelligent Local Coding Assistant

A fully local AI coding assistant powered by LLMs that run entirely on your machine. No API keys. No data sent to the cloud. After the one-time download of llama-server and a model, it works 100% offline: full project generation, agentic file editing, and terminal control, all on your hardware.

Features

🤖 Agent Mode — Full Project Generation

Let the AI plan and build entire projects autonomously:

Creates files and folders directly on disk
Runs terminal commands (npm install, build, etc.)
Reads, edits, and verifies files
Recovers from errors automatically
Persistent task memory (scratchpad) survives context resets
"Continue task" button resumes interrupted sessions

💬 Ask Mode — Instant Code Help

Code explanation, refactoring, debugging
Ask about selected code via right-click → DeskAI: Ask About Selection
Attach files as context with the 📎 button

📎 File Attachment

Attach single files or entire folders as context
File content processed server-side — never floods the chat UI
Large files handled gracefully with truncation and read_file fallback
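
How the truncation works is not described here. One plausible approach, sketched below, keeps the head of a large file and appends a note telling the model to fetch the rest with `read_file`. The `prepareAttachment` helper and the character budget are assumptions made for illustration, not DeskAI internals.

```typescript
interface Attachment {
  path: string;
  content: string;
  truncated: boolean;
}

// maxChars is an illustrative budget, not a documented DeskAI setting.
function prepareAttachment(path: string, content: string, maxChars = 8000): Attachment {
  if (content.length <= maxChars) {
    return { path, content, truncated: false };
  }
  // Prefer cutting at the last newline inside the budget so no line is split.
  const lastNewline = content.lastIndexOf("\n", maxChars - 1);
  const cut = lastNewline > 0 ? lastNewline : maxChars;
  const omitted = content.length - cut;
  const note = `\n[... ${omitted} characters omitted; call read_file on ${path} for the rest]`;
  return { path, content: content.slice(0, cut) + note, truncated: true };
}
```

Small files pass through unchanged, so the model only needs the `read_file` fallback when the note is present.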

🔋 Model Power Toggle

Unload the model from RAM with one click when you're done (frees 8–20 GB of memory)
Reload instantly without restarting VS Code
Power button in the chat header — green = loaded, dim = unloaded

🕘 Persistent Chat History

Every conversation is automatically saved to local storage
Reopen VS Code and restore any previous session with one click
History panel shows session titles and timestamps
Delete individual sessions or start fresh with Clear

🔍 Workspace RAG (Retrieval-Augmented Generation)

Automatically injects relevant workspace context into every query — no manual file attaching
Keyword-based retrieval scans up to 300 files across your project
Works entirely offline: no embeddings API, no cloud
Configurable depth (deskai.ragTopK); can be toggled off with deskai.enableRag
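
The scoring method is not specified in this README. As a rough illustration of keyword-based retrieval without embeddings, the sketch below tokenizes the query, counts keyword hits per file, and keeps the top `k` files. The function names and the scoring rule are assumptions; the 300-file cap and the default `k` of 4 mirror the figures quoted in this document.

```typescript
interface WorkspaceFile {
  path: string;
  text: string;
}

// Lowercase the input and split it into identifier-like keywords of 3+ characters.
function tokenize(input: string): string[] {
  const words = input.toLowerCase().match(/[a-z0-9_]+/g) || [];
  return words.filter((w) => w.length >= 3);
}

// Score = total number of occurrences of any query keyword in the file.
function scoreFile(file: WorkspaceFile, keywords: string[]): number {
  // A prototype-free object avoids clashes with names such as "constructor".
  const counts: Record<string, number> = Object.create(null);
  for (const word of tokenize(file.text)) {
    counts[word] = (counts[word] || 0) + 1;
  }
  let score = 0;
  for (const keyword of keywords) {
    score += counts[keyword] || 0;
  }
  return score;
}

function retrieve(query: string, files: WorkspaceFile[], k = 4, maxFiles = 300): WorkspaceFile[] {
  const all = tokenize(query);
  const keywords = all.filter((w, i) => all.indexOf(w) === i); // de-duplicate
  return files
    .slice(0, maxFiles) // scan cap
    .map((file) => ({ file, score: scoreFile(file, keywords) }))
    .filter((entry) => entry.score > 0) // drop files with no keyword hit
    .sort((a, b) => b.score - a.score) // best match first
    .slice(0, k)
    .map((entry) => entry.file);
}
```

A real implementation would likely work on chunks rather than whole files and weight rare keywords more heavily; the structure of the ranking stays the same.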

⚡ Local LLM Powered

Uses llama-server under the hood (bundled)
Supports GGUF models — import any compatible model
Recommended models auto-downloaded on first launch
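
Because llama-server speaks HTTP, any local client can talk to it. The sketch below builds a chat request for the port quoted under How It Works (11434). It assumes the OpenAI-compatible `/v1/chat/completions` route that llama.cpp's llama-server exposes; whether DeskAI itself uses this route, the loopback address, and the `buildChatRequest` helper are all assumptions. The temperature default mirrors `deskai.temperature`.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the request only; sending it is left to the caller's HTTP client.
function buildChatRequest(messages: ChatMessage[], temperature = 0.3, port = 11434): HttpRequest {
  return {
    // 127.0.0.1 (assumed) keeps the traffic on the loopback interface.
    url: `http://127.0.0.1:${port}/v1/chat/completions`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages, temperature, stream: false }),
  };
}
```

In a runtime that provides `fetch`, the result can be sent with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.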

Recommended Models

Model	RAM Required	Best For
Qwen2.5-Coder-14B Q4_K_M ⭐	12 GB+	Default — purpose-built for coding, fast and accurate
Qwen3-14B Q4_K_M	12 GB+	Strongest reasoning + coding, hybrid thinking mode
DeepSeek-Coder-V2-Lite 16B	12 GB+	MoE coder, great throughput on 12–15 GB machines

Models are downloaded automatically on first use. You can also import any .gguf file via the model picker.

Getting Started

Install the extension
Open the DeskAI panel in the secondary sidebar (right side)
Wait for the model to load (the first launch downloads the model, roughly 8–16 GB)
Switch to Agent mode for full project generation
Type your task and press Enter

Showcase / Demo Mode

For full project generation without confirmation dialogs:

Set deskai.autoApproveWrites to true: files are written directly, without confirmation popups
Set deskai.hiddenTerminal to true: commands run silently in the background

Settings

| Setting | Default | Description |
|---------|---------|-------------|
| `deskai.autoApproveWrites` | `false` | Write files directly without confirmation dialogs |
| `deskai.hiddenTerminal` | `false` | Run shell commands silently in the background |
| `deskai.maxAgentIterations` | `40` | Max tool-call iterations before pausing |
| `deskai.contextSize` | `32768` | Model context window in tokens |
| `deskai.temperature` | `0.3` | Generation temperature (0 = deterministic) |
| `deskai.modelPath` | | Custom path to a `.gguf` model file |
| `deskai.modelsSearchPaths` | `[]` | Extra directories to scan for models |
| `deskai.llamaServerPath` | | Custom path to the llama-server executable |
| `deskai.enableRag` | `true` | Inject relevant workspace files into every query automatically |
| `deskai.ragTopK` | `4` | Number of workspace chunks injected per query |
| `deskai.loraAdapterPath` | | Path to a LoRA adapter file (.gguf/.bin) to load with the model |
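
For reference, the defaults above can be written as a typed object. This is only a sketch: the `DeskAiSettings` interface, `resolveSettings`, and `DEMO_MODE` are hypothetical names, and empty strings stand in for the path settings that have no default value.

```typescript
interface DeskAiSettings {
  autoApproveWrites: boolean;
  hiddenTerminal: boolean;
  maxAgentIterations: number;
  contextSize: number;
  temperature: number;
  modelPath: string;
  modelsSearchPaths: string[];
  llamaServerPath: string;
  enableRag: boolean;
  ragTopK: number;
  loraAdapterPath: string;
}

// Values copied from the Settings table.
const DEFAULTS: DeskAiSettings = {
  autoApproveWrites: false,
  hiddenTerminal: false,
  maxAgentIterations: 40,
  contextSize: 32768,
  temperature: 0.3,
  modelPath: "",
  modelsSearchPaths: [],
  llamaServerPath: "",
  enableRag: true,
  ragTopK: 4,
  loraAdapterPath: "",
};

// Apply user overrides on top of the defaults without mutating either object.
function resolveSettings(overrides: Partial<DeskAiSettings>): DeskAiSettings {
  return { ...DEFAULTS, ...overrides };
}

// The two Showcase / Demo Mode switches, expressed as overrides.
const DEMO_MODE: Partial<DeskAiSettings> = {
  autoApproveWrites: true,
  hiddenTerminal: true,
};
```

In VS Code's settings.json the same keys carry the `deskai.` prefix, for example `"deskai.ragTopK": 4`.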

Requirements

Windows 10/11 (primary), macOS, Linux
RAM: 12 GB minimum (16 GB+ recommended for best models)
Disk: 10–20 GB free for model storage
VS Code 1.85.0+

How It Works

On first open, DeskAI downloads llama-server and a GGUF model
llama-server runs locally on port 11434 with a 32k context window
The chat panel communicates with the local server via HTTP
In Agent mode, the model calls tools (write_file, run_terminal_command, etc.) to implement tasks
All data stays on your machine — nothing is sent to the internet during inference
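
The tool-calling step can be pictured as a bounded loop. The sketch below is a minimal illustration, assuming a hypothetical `nextStep` callback that stands in for the model and returns either a tool call or a final answer. Only the tool names (`write_file`, `run_terminal_command`), the scratchpad idea, and the iteration cap (`deskai.maxAgentIterations`, default 40) come from this README; the types and control flow are assumptions, not DeskAI's actual implementation.

```typescript
// Hypothetical types: DeskAI's real internals are not documented here.
type ToolCall = { tool: string; args: Record<string, string> };
type ModelReply = { kind: "tool"; call: ToolCall } | { kind: "done"; summary: string };

// A tool takes string arguments and returns a textual result the model can read.
type Tool = (args: Record<string, string>) => string;

interface AgentResult {
  status: "done" | "paused";
  iterations: number;
  scratchpad: string[]; // running task notes kept outside the model context
}

function runAgent(
  nextStep: (scratchpad: string[]) => ModelReply,
  tools: Record<string, Tool>,
  maxIterations = 40, // mirrors the deskai.maxAgentIterations default
): AgentResult {
  const scratchpad: string[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    const reply = nextStep(scratchpad);
    if (reply.kind === "done") {
      scratchpad.push(`done: ${reply.summary}`);
      return { status: "done", iterations: i, scratchpad };
    }
    const name = reply.call.tool;
    if (!Object.prototype.hasOwnProperty.call(tools, name)) {
      // Unknown tool: record the problem so the next step can correct it.
      scratchpad.push(`error: unknown tool ${name}`);
      continue;
    }
    try {
      scratchpad.push(`${name}: ${tools[name](reply.call.args)}`);
    } catch (err) {
      // A failing tool is reported back instead of aborting the whole task.
      scratchpad.push(`error: ${name} failed: ${String(err)}`);
    }
  }
  // Iteration cap reached: pause so the task can be resumed later.
  return { status: "paused", iterations: maxIterations, scratchpad };
}
```

Feeding errors back into the scratchpad is what lets such a loop recover from a failed command, and returning `paused` at the cap leaves enough state to continue the task later.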

Commands

Command	Description
`DeskAI: Open Chat`	Open the chat panel
`DeskAI: Ask About Selection`	Ask about selected code (right-click menu)
`DeskAI: Refresh Project Context`	Re-scan workspace for context
`DeskAI: Import Custom Model (.gguf)`	Import a local model file
`DeskAI: Stop Server`	Stop the llama-server process
`DeskAI: Open Debug Log`	View the extension debug log

Privacy

All inference is local — your code never leaves your machine
The only network requests are:
- Initial download of llama-server binary (from GitHub Releases)
- Initial download of the selected GGUF model (from Hugging Face)
After first setup, the extension works completely offline

License

MIT — see LICENSE
