LLM Local Assistant - VS Code Extension
A VS Code extension that integrates with your local LLM (Ollama, LM Studio, vLLM) to provide intelligent code assistance, autonomous file operations, and chat capabilities directly in your editor.
📚 Contributing: See CONTRIBUTING.md for development guide.
✨ Features
- 🤖 Local LLM Chat - Chat with your local LLM without sending data to external servers
- 🔄 Agent Mode Commands - Autonomous file operations:
  - /read <path> - Read files from your workspace
  - /write <path> [prompt] - Generate content and write to files via LLM
  - /suggestwrite <path> [prompt] - LLM suggests changes, you approve before writing
- ⚙️ Fully Configurable - Customize endpoint, model, temperature, max tokens, timeout
- 💬 Conversation Context - Maintains chat history for coherent multi-turn conversations
- 🚀 Quick Access - Open chat with a single click from the status bar
- 🔒 100% Private - All processing stays on your machine
- ⚡ Streaming Support - Real-time token streaming for responsive UX
- ✅ Production-Ready - Comprehensive error handling, type safety, test coverage
📸 Screenshots
Chat Interface with Git Integration Commands

Chat window showing /git-commit-msg and /git-review commands in action. The interface displays:
- Interactive chat messages with streaming responses
- Git integration commands for autonomous commit message generation and code review
- Light gray text styling for optimal readability in dark themes
- Real-time command execution with status feedback
📊 Project Status
v1.0.0 - First Stable Release
- ✅ 23 commits - Clean, atomic git history showing full development progression
- ✅ 92 tests - 100% passing (36 extension + 33 llmClient + 23 gitClient)
- ✅ TypeScript strict mode - 0 type errors, full type safety
- ✅ 4 core modules - extension, llmClient, gitClient, webviewContent
- ✅ Published to VS Code Marketplace - v1.0.0 stable release
- ✅ Production-Ready - Comprehensive error handling and documentation
Features included:
- Chat interface with streaming support
- File operations (/read, /write, /suggestwrite)
- Git integration (/git-commit-msg, /git-review)
- Performance optimizations (token buffering, DOM batching)
- Monochrome UI with WCAG AA accessibility
- Comprehensive error handling
Ready for:
- Portfolio showcase - professional-grade code
- Production use - tested and optimized
- Extension by others - clear architecture and test coverage
- Interview discussion - full git history and talking points
📋 Prerequisites
Local LLM Server (Required)
You need one of:
Ollama (Recommended)
ollama run mistral
# Server at: http://localhost:11434
LM Studio - Start its built-in local server and point llm-assistant.endpoint at it (typically http://localhost:1234)
vLLM
python -m vllm.entrypoints.openai.api_server \
--model mistral-7b-instruct-v0.2 \
--port 11434
🚀 Getting Started
Quick Install (One Command)
From VS Code Marketplace (Easiest):
code --install-extension odanree.llm-local-assistant
Or search for "LLM Local Assistant" in VS Code Extensions marketplace: https://marketplace.visualstudio.com/items?itemName=odanree.llm-local-assistant
See docs/INSTALL.md for detailed platform-specific setup, troubleshooting, and development instructions.
Option A: Install from VS Code Marketplace (Recommended)
- Open VS Code Extensions (Ctrl+Shift+X)
- Search for "LLM Local Assistant"
- Click "Install"
- Reload VS Code
Option B: Install from VSIX
- Download llm-local-assistant-1.0.0.vsix from Latest Release
- In VS Code, run:
code --install-extension llm-local-assistant-1.0.0.vsix
- Or open the Command Palette (Ctrl+Shift+P) → "Extensions: Install from VSIX"
- Reload VS Code
Option C: Build from Source (Development)
- Install & Compile
npm install
npm run compile
# Or development watch mode:
npm run watch
- Launch in Debug Mode
- Press F5 in VS Code to open a debug window with the extension loaded
Configure the Extension
Open VS Code Settings (Ctrl+,) and set:
{
"llm-assistant.endpoint": "http://localhost:11434",
"llm-assistant.model": "mistral",
"llm-assistant.temperature": 0.7,
"llm-assistant.maxTokens": 2048,
"llm-assistant.timeout": 30000
}
For custom ports:
{
"llm-assistant.endpoint": "http://127.0.0.1:9000"
}
Test Connection
Click LLM Assistant in status bar → Run "Test Connection" command
💡 Usage
Chat
Simply type messages and press Enter to chat with your LLM.
Available Commands
File Operations
/read <path> - Read and display file contents
/read src/main.ts
/write <path> [prompt] - Generate file content via LLM and write to disk
/write src/greeting.ts write a TypeScript function that greets users
If no prompt is provided, the default is: "Generate appropriate content for this file based on its name."
/suggestwrite <path> [prompt] - LLM suggests changes, you review and approve before writing
/suggestwrite src/config.ts add validation for the API endpoint
Git Integration
/git-commit-msg - Generate commit message from staged changes
/git-commit-msg
Reads all staged diffs, analyzes changes, and generates a conventional commit message following the pattern: <type>(<scope>): <description>
/git-review - AI-powered code review of staged changes
/git-review
Reviews all staged changes, identifies potential issues, suggests improvements, and provides specific feedback.
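Under the hood, commands like these need the staged diff as LLM context. A minimal sketch of how that could be fetched is shown below; the extension's actual gitClient may differ, and getStagedDiff is an illustrative name.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Sketch: collect the staged diff so it can be sent to the LLM as context.
// Illustrative only - the extension's gitClient may be implemented differently.
async function getStagedDiff(repoRoot: string): Promise<string> {
  const { stdout } = await execFileAsync('git', ['diff', '--cached'], { cwd: repoRoot });
  if (!stdout.trim()) {
    throw new Error('No staged changes found - stage files with `git add` first.');
  }
  return stdout;
}
```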
System
/help - Show available commands
/help
🏗️ Architecture & Design Decisions
Why This Architecture?
The extension uses a deliberately simple, regex-based command parser instead of a formal CLI framework. Here's why:
- User-Centric: Commands work anywhere in messages - /read file.ts can appear mid-conversation
- Low Overhead: No dependency on heavyweight CLI libraries, keeping bundle size small
- Maintainability: Regex patterns are explicit and easy to audit in code review
- Extensibility: Easy to add new commands (e.g., /analyze, /refactor) without architecture changes
Trade-off: Less strict argument validation than formal parsers, but gained flexibility for natural interaction patterns.
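A minimal sketch of the regex-based approach (names like parseCommand and the exact pattern are illustrative, not the extension's actual internals):

```typescript
// Illustrative sketch of regex-based command parsing.
interface ParsedCommand {
  name: 'read' | 'write' | 'suggestwrite';
  path: string;
  prompt?: string;
}

// Matches "/read <path>", "/write <path> [prompt]", "/suggestwrite <path> [prompt]"
// anywhere in the message, so commands can appear mid-conversation.
const COMMAND_PATTERN = /\/(read|write|suggestwrite)\s+(\S+)(?:\s+([\s\S]+))?/;

function parseCommand(message: string): ParsedCommand | undefined {
  const match = COMMAND_PATTERN.exec(message);
  if (!match) {
    return undefined; // No command present: treat as a plain chat message
  }
  const [, name, path, prompt] = match;
  return { name: name as ParsedCommand['name'], path, prompt: prompt?.trim() };
}
```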
Streaming vs Non-Streaming
The extension supports both streaming and non-streaming responses:
- Streaming (primary): Token-by-token display for real-time feedback
- Non-Streaming (fallback): For servers with streaming limitations (e.g., Ollama on non-standard ports)
Why this matters: Users get responsive, interactive feedback while typing long responses. The UI updates continuously instead of waiting for the full response.
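As a rough sketch, the streaming path can be consumed like this, assuming an Ollama-style NDJSON response from /api/chat (the endpoint shape and field names are assumptions, not taken from the extension's code):

```typescript
// Sketch: stream tokens from an Ollama-style /api/chat endpoint and hand
// each chunk to the UI as it arrives. Endpoint and field names are assumptions.
async function streamChat(
  endpoint: string,
  model: string,
  messages: Array<{ role: string; content: string }>,
  onToken: (token: string) => void
): Promise<void> {
  const response = await fetch(`${endpoint}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, messages, stream: true }),
  });
  if (!response.ok || !response.body) {
    throw new Error(`LLM server returned ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffered = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });
    // Each complete line is one JSON object: { message: { content }, done }
    const lines = buffered.split('\n');
    buffered = lines.pop() ?? '';
    for (const line of lines) {
      if (!line.trim()) continue;
      const chunk = JSON.parse(line);
      if (chunk.message?.content) onToken(chunk.message.content);
    }
  }
}
```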
In-Memory Conversation History
The LLMClient maintains conversation history per-session, not persisted:
private conversationHistory: Array<{ role: string; content: string }> = [];
Why:
- Simpler state management without database/file I/O
- Clear semantics: closing the chat panel resets history (expected behavior)
- Reduces complexity for MVP
- Future enhancement: optional persistence to disk/localStorage
Trade-off: Restarting VS Code or closing the chat panel loses context. This is intentional for simplicity; persistent history is a Phase 2 feature.
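A minimal sketch of this per-session memory (class and method names are illustrative):

```typescript
// Illustrative sketch: history lives only in memory, so disposing the chat
// panel or reloading VS Code starts a fresh conversation.
class ConversationMemory {
  private history: Array<{ role: 'user' | 'assistant'; content: string }> = [];

  record(role: 'user' | 'assistant', content: string): void {
    this.history.push({ role, content });
  }

  // Included with every request so the LLM sees prior turns.
  snapshot(): ReadonlyArray<{ role: string; content: string }> {
    return this.history;
  }

  // Called when the chat panel is disposed.
  reset(): void {
    this.history = [];
  }
}
```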
Async/Await + Try-Catch Error Handling
All user-triggered operations follow this pattern:
try {
const result = await llmClient.sendMessage(userInput);
// Display result
} catch (error) {
// Send a user-friendly error message to the chat
const message = error instanceof Error ? error.message : String(error);
showError(`Error: ${message}`);
}
Why: Consistent error propagation, easy to debug, and all errors surface in the chat UI for users to see.
File I/O via VS Code Workspace API
All file operations use VS Code's URI-based workspace.fs API:
const uri = vscode.Uri.joinPath(workspaceFolder, relativePath);
await vscode.workspace.fs.writeFile(uri, encodedContent);
Why:
- Cross-platform path handling (Windows \ vs Unix /)
- Respects workspace folder boundaries
- Works with remote development (SSH, Codespaces)
- Triggers VS Code's file watching automatically
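Expanding on the snippet above, a /write-style handler might put the pieces together roughly like this (handleWriteCommand is an illustrative name; the vscode calls are the standard workspace API):

```typescript
import * as vscode from 'vscode';

// Sketch: resolve a workspace-relative path and write LLM-generated content.
// Illustrative only - not the extension's actual handler.
async function handleWriteCommand(relativePath: string, content: string): Promise<void> {
  const workspaceFolder = vscode.workspace.workspaceFolders?.[0];
  if (!workspaceFolder) {
    throw new Error('Open a workspace folder before using /write.');
  }

  // Uri.joinPath handles Windows vs Unix separators and remote schemes (SSH, Codespaces).
  const uri = vscode.Uri.joinPath(workspaceFolder.uri, relativePath);
  await vscode.workspace.fs.writeFile(uri, new TextEncoder().encode(content));

  // Opening the document gives immediate feedback and exercises VS Code's file watching.
  const document = await vscode.workspace.openTextDocument(uri);
  await vscode.window.showTextDocument(document);
}
```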
Production-Ready Features
Type Safety
- TypeScript strict mode enabled (strict: true in tsconfig.json)
- All code passes type checking: 0 errors, 0 warnings
- Explicit types on public APIs
Error Handling
- Specific error detection for HTTP status codes (404 → model not found, 503 → server busy)
- Helpful error messages guide users to settings or configuration
- Timeout handling with AbortController for clean cancellation
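A sketch of this timeout and status-code handling (requestCompletion and the /api/chat path are illustrative assumptions):

```typescript
// Sketch: request with an AbortController-based timeout and
// status-specific error messages. Not the extension's exact code.
async function requestCompletion(endpoint: string, body: unknown, timeoutMs: number): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(`${endpoint}/api/chat`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
      signal: controller.signal,
    });
    if (response.status === 404) {
      throw new Error('Model not found - check the llm-assistant.model setting.');
    }
    if (response.status === 503) {
      throw new Error('LLM server is busy - try again in a moment.');
    }
    if (!response.ok) {
      throw new Error(`LLM server returned ${response.status}`);
    }
    return response;
  } catch (error) {
    if (error instanceof Error && error.name === 'AbortError') {
      throw new Error(`Request timed out after ${timeoutMs} ms - consider raising llm-assistant.timeout.`);
    }
    throw error;
  } finally {
    clearTimeout(timer);
  }
}
```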
Test Coverage
- 52 unit tests covering:
- LLMClient initialization, configuration, API contracts
- Command parsing (regex patterns for /read, /write, /suggestwrite)
- Error scenarios (connection failures, timeouts, invalid endpoints)
- File path validation and resolution
- Message formatting
- Run with npm test (100% pass rate)
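To give a flavor of these tests, a command-parsing case might look like this (assuming a Vitest setup, which the test:ui script suggests; parseCommand and its module path refer to the hypothetical parser sketched in the architecture section):

```typescript
import { describe, expect, it } from 'vitest';
// Hypothetical module path for the parser sketched earlier.
import { parseCommand } from '../src/commandParser';

describe('command parsing', () => {
  it('extracts a /read command that appears mid-message', () => {
    const parsed = parseCommand('could you /read src/main.ts please');
    expect(parsed?.name).toBe('read');
    expect(parsed?.path).toBe('src/main.ts');
  });

  it('ignores plain chat messages', () => {
    expect(parseCommand('hello there')).toBeUndefined();
  });
});
```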
Extensibility
Three clear extension points for Phase 2:
- New LLM Commands: Add a regex pattern + handler in extension.ts
- LLM Client Enhancements: Extend the LLMClient class with new capabilities
- Webview Features: Enhance the UI in webviewContent.ts
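For extension point 1, adding a hypothetical /analyze command might look roughly like this (the handler registry shown here is illustrative, not the extension's actual structure):

```typescript
// Sketch of extension point 1: a new command is just a regex plus an async handler.
// The /analyze command and this registry are hypothetical.
type CommandHandler = (arg: string) => Promise<string>;

const handlers = new Map<RegExp, CommandHandler>();

// Registering one more command does not touch the rest of the architecture.
handlers.set(/\/analyze\s+(\S+)/, async (path) => {
  return `TODO: send ${path} to the LLM with an analysis prompt`;
});

async function dispatch(message: string): Promise<string | undefined> {
  for (const [pattern, handler] of handlers) {
    const match = pattern.exec(message);
    if (match) {
      return handler(match[1]);
    }
  }
  return undefined; // No command found: treat as plain chat.
}
```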
See ROADMAP.md for planned enhancements.
📦 Configuration Reference
| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| llm-assistant.endpoint | string | http://localhost:11434 | LLM server endpoint |
| llm-assistant.model | string | mistral | Model name |
| llm-assistant.temperature | number | 0.7 | Response randomness (0-1; higher = more creative) |
| llm-assistant.maxTokens | number | 2048 | Max response length in tokens |
| llm-assistant.timeout | number | 30000 | Request timeout in milliseconds |
🔧 Development
Build
npm run compile # Single build
npm run watch # Auto-rebuild on changes
npm run package # Production bundle
Testing
npm test # Run all tests
npm run test:coverage # Coverage report
npm run test:ui # Interactive test UI
Linting
npm run lint # ESLint validation
Debug
Press F5 in VS Code to launch extension in debug mode with breakpoints.
🗺️ Roadmap
See ROADMAP.md for planned features including:
- GitHub Copilot Agent Mode integration
- Persistent conversation history
- Custom system prompts
- Code-aware context injection
📚 Documentation
For advanced topics, see /docs/ folder.
🐛 Troubleshooting
"Cannot connect to endpoint"
- Verify LLM server is running and accessible
- Check endpoint URL in settings
- Test manually: curl http://localhost:11434/api/tags
"Model not found"
- Verify the model exists: ollama list
- Download it if needed: ollama pull mistral
- Update the llm-assistant.model setting
"Request timeout"
- Increase llm-assistant.timeout (default 30000 ms)
- Try shorter prompts or smaller models
- Check server logs for errors
Slow responses?
- Reduce maxTokens for shorter responses
- Try a smaller/faster model
- Ensure server has adequate resources
🔒 Privacy & Security
✅ 100% Local & Private
- Zero external API calls or cloud dependencies
- Your code and conversations never leave your machine
- Works completely offline after model is downloaded
- No telemetry or tracking
📄 License
MIT License - See LICENSE file for details
Local • Private • Offline-First AI Assistant for VS Code 🚀