# Vllama - Local LLM Integration for VS Code

Integrate your locally hosted LLM models directly into VS Code's native Chat interface. Chat with your local models using the `@vllama` participant, just as you would with GitHub Copilot or other AI assistants.
## Features
- 🤖 Local LLM Integration: Connect to your locally hosted LLM server (e.g., Qwen Coder, LLaMA)
- 💬 Native Chat Interface: Use `@vllama` in VS Code's Chat view for seamless interaction
- 🔌 Simple Setup: Just point to your local server running on `localhost:2513`
- 📝 Conversation History: Maintains context across multiple turns for better responses
- ⚡ Real-time Responses: Streams responses from your local model
- 🛡️ Error Handling: Helpful error messages when connection issues occur
## Prerequisites
Before using this extension, you need:
- VS Code version 1.106.1 or higher
- A local LLM server running on `http://localhost:2513` with a `/chat` endpoint
### Local LLM Server Requirements
Your local server must:
- Accept POST requests to `/chat`
- Expect a JSON payload: `{"message": "user prompt"}`
- Return a JSON response: `{"response": "llm response"}`
Example using Python Flask:
```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/chat', methods=['POST'])
def chat():
    data = request.json
    user_message = data.get('message', '')

    # Your LLM inference code here
    llm_response = your_llm_model.generate(user_message)

    return jsonify({"response": llm_response})

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=2513)
```
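Once the server is running, you can sanity-check the endpoint from a terminal. The prompt below is just an example payload matching the contract above:

```bash
curl -X POST http://localhost:2513/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Say hello"}'
# Expected shape of the reply: {"response": "..."}
```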
## Installation

### From Source
- Clone this repository
- Open in VS Code
- Install dependencies: `npm install`
- Compile the extension: `npm run compile`
- Press `F5` to launch the Extension Development Host
### From VS Code Marketplace
- Open VS Code
- Go to Extensions (`Ctrl+Shift+X` or `Cmd+Shift+X`)
- Search for "Vllama"
- Click Install
### From VSIX File

Download the latest `.vsix` file from the releases page and install it:

```bash
code --install-extension vllama-0.0.2.vsix
```
## Usage

### Step 1: Start Your Local LLM Server

Ensure your local LLM server is running on `http://localhost:2513`:

```bash
# Example: Start your LLM server
python your_llm_server.py
```
### Step 2: Open VS Code Chat

- Press `Ctrl+Shift+I` (Windows/Linux) or `Cmd+Shift+I` (Mac)
- Or click the Chat icon in the Activity Bar
- Or use the Command Palette: `Chat: Open Chat`
### Step 3: Chat with @vllama

Type `@vllama` followed by your question:

```
@vllama How do I create a Python function to calculate Fibonacci numbers?
```
### Example Conversations

Code Generation:

```
@vllama Write a TypeScript function to sort an array of objects by a specific property
```

Code Explanation:

```
@vllama Explain what this regex does: /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/
```

Debugging Help:

```
@vllama Why am I getting a "Cannot read property of undefined" error in JavaScript?
```
## Configuration

Currently, the extension is configured to connect to `http://localhost:2513`. Future versions will allow customization through VS Code settings.
## Architecture

The extension consists of three main components:

### Language Model Chat Provider (`localLLMProvider.ts`)
- Registers your local LLM as a language model in VS Code
- Handles communication with the local server
- Provides token counting and model information
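As a rough sketch of what that communication amounts to (illustrative only; the constant and function names below are not the actual implementation in `localLLMProvider.ts`), the provider POSTs the prompt to the `/chat` endpoint and reads back the `response` field, assuming a runtime with a global `fetch` (Node 18+ / recent VS Code):

```typescript
// Illustrative sketch -- see localLLMProvider.ts for the real implementation.
const SERVER_URL = 'http://localhost:2513';

async function callLocalLLM(message: string): Promise<string> {
    // Send the prompt using the documented {"message": ...} payload.
    const res = await fetch(`${SERVER_URL}/chat`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message }),
    });
    if (!res.ok) {
        throw new Error(`Server returned error: ${res.status}`);
    }
    // The server replies with {"response": ...}.
    const data = (await res.json()) as { response: string };
    return data.response;
}
```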
### Chat Participant (`chatParticipant.ts`)

- Implements the `@vllama` chat interface
- Manages conversation history
- Provides follow-up suggestions
### Extension Entry Point (`extension.ts`)
- Activates the extension
- Registers both the provider and participant
- Handles lifecycle management
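To show how these pieces fit together, here is a stripped-down sketch of the wiring (the participant id, handler body, and inline fetch are illustrative; the real `extension.ts` and `chatParticipant.ts` also handle conversation history, follow-ups, and error reporting):

```typescript
import * as vscode from 'vscode';

export function activate(context: vscode.ExtensionContext) {
    // Handler invoked whenever the user addresses @vllama in the Chat view.
    const handler: vscode.ChatRequestHandler = async (request, _chatContext, stream, _token) => {
        // Forward the user's prompt to the local server using the documented JSON contract.
        const res = await fetch('http://localhost:2513/chat', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ message: request.prompt }),
        });
        const data = (await res.json()) as { response: string };

        // Render the model's reply in the Chat view.
        stream.markdown(data.response);
        return {};
    };

    // Register the chat participant; the id must match the one declared in package.json.
    const participant = vscode.chat.createChatParticipant('vllama', handler);
    context.subscriptions.push(participant);
}
```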
## Troubleshooting

### "Cannot connect to local LLM server"
Problem: The extension can't reach your local server.
Solutions:
- Verify your LLM server is running: `curl -X POST http://localhost:2513/chat -H "Content-Type: application/json" -d '{"message": "ping"}'`
- Check the server is listening on port 2513
- Ensure no firewall is blocking the connection
- Check server logs for errors
"Request timed out"
Problem: The server is taking too long to respond.
Solutions:
- Your model might be processing a complex request
- Try a simpler prompt first
- Check if your server has sufficient resources (CPU/GPU/RAM)
- Consider using a smaller model for faster responses
"Server returned error: 500"
Problem: Your server encountered an internal error.
Solutions:
- Check your server logs for detailed error messages
- Verify your model is loaded correctly
- Ensure the request format matches what your server expects
## Development

### Project Structure
```
vllama/
├── src/
│   ├── extension.ts          # Main extension entry point
│   ├── localLLMProvider.ts   # Language Model Provider implementation
│   └── chatParticipant.ts    # Chat Participant implementation
├── package.json              # Extension manifest
├── tsconfig.json             # TypeScript configuration
└── README.md                 # This file
```
### Building

```bash
# Install dependencies
npm install

# Compile TypeScript
npm run compile

# Watch mode (auto-compile on changes)
npm run watch

# Run linter
npm run lint
```
### Testing

Press `F5` in VS Code to launch the Extension Development Host with your extension loaded.
## Future Enhancements
- [ ] Configurable host and port through VS Code settings
- [ ] Support for multiple local models
- [ ] Model selection UI
- [ ] Streaming responses for real-time feedback
- [ ] Agentic tools integration
- [ ] Code context awareness
- [ ] File and workspace integration
- [ ] Custom slash commands
- [ ] Conversation export/import
## Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
## License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Copyright 2025 Vllama Contributors
## Acknowledgments
Built with the VS Code Extension API and inspired by the need for privacy-focused, local AI development tools.
Enjoy chatting with your local LLM! 🚀