# Vllama - Local LLM Integration for VS Code

Integrate your locally hosted LLM models directly into VS Code's native Chat interface. Chat with your local models using the `@vllama` participant, just as you would with GitHub Copilot or other AI assistants.
## Features
- 🤖 Local LLM Integration: Connect to your locally hosted LLM server (e.g., Qwen Coder, LLaMA)
- 💬 Native Chat Interface: Use `@vllama` in VS Code's Chat view for seamless interaction
- 🔌 Simple Setup: Just point to your local server running on `localhost:2513`
- 📝 Conversation History: Maintains context across multiple turns for better responses
- ⚡ Real-time Responses: Streams responses from your local model
- 🛡️ Error Handling: Helpful error messages when connection issues occur
## Prerequisites
Before using this extension, you need:
- VS Code version 1.106.1 or higher
- A local LLM server running on `http://localhost:2513` with a `/chat` endpoint
### Local LLM Server Requirements
Your local server must:
- Accept POST requests to `/chat`
- Expect a JSON payload: `{"message": "user prompt"}`
- Return a JSON response: `{"response": "llm response"}`
Example using Python Flask:
```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/chat', methods=['POST'])
def chat():
    data = request.json
    user_message = data.get('message', '')

    # Your LLM inference code here
    llm_response = your_llm_model.generate(user_message)

    return jsonify({"response": llm_response})

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=2513)
```
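Once the server is running, you can sanity-check the endpoint from a terminal. The prompt below is just an example payload matching the contract above:

```bash
curl -X POST http://localhost:2513/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Say hello"}'
# Expected shape of the reply: {"response": "..."}
```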
## Installation

### From Source
- Clone this repository
- Open in VS Code
- Install dependencies: `npm install`
- Compile the extension: `npm run compile`
- Press `F5` to launch the Extension Development Host
### From VS Code Marketplace
- Open VS Code
- Go to Extensions (`Ctrl+Shift+X` or `Cmd+Shift+X`)
- Search for "Vllama"
- Click Install
### From VSIX File

Download the latest `.vsix` file from the releases page and install it:

```bash
code --install-extension vllama-0.0.2.vsix
```
## Usage

### Step 1: Start Your Local LLM Server

Ensure your local LLM server is running on `http://localhost:2513`:

```bash
# Example: Start your LLM server
python your_llm_server.py
```
### Step 2: Open VS Code Chat

- Press `Ctrl+Shift+I` (Windows/Linux) or `Cmd+Shift+I` (Mac)
- Or click the Chat icon in the Activity Bar
- Or use the Command Palette: `Chat: Open Chat`
### Step 3: Chat with @vllama

Type `@vllama` followed by your question:

```
@vllama How do I create a Python function to calculate Fibonacci numbers?
```
### Example Conversations

Code Generation:

```
@vllama Write a TypeScript function to sort an array of objects by a specific property
```

Code Explanation:

```
@vllama Explain what this regex does: /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/
```

Debugging Help:

```
@vllama Why am I getting a "Cannot read property of undefined" error in JavaScript?
```
## Configuration

Currently, the extension is configured to connect to `http://localhost:2513`. Future versions will allow customization through VS Code settings.
## Architecture

The extension consists of three main components:

### Language Model Chat Provider (`localLLMProvider.ts`)
- Registers your local LLM as a language model in VS Code
- Handles communication with the local server
- Provides token counting and model information
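As a rough sketch of what that communication amounts to (illustrative only; the constant and function names below are not the actual implementation in `localLLMProvider.ts`), the provider POSTs the prompt to the `/chat` endpoint and reads back the `response` field, assuming a runtime with a global `fetch` (Node 18+ / recent VS Code):

```typescript
// Illustrative sketch -- see localLLMProvider.ts for the real implementation.
const SERVER_URL = 'http://localhost:2513';

async function callLocalLLM(message: string): Promise<string> {
    // Send the prompt using the documented {"message": ...} payload.
    const res = await fetch(`${SERVER_URL}/chat`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message }),
    });
    if (!res.ok) {
        throw new Error(`Server returned error: ${res.status}`);
    }
    // The server replies with {"response": ...}.
    const data = (await res.json()) as { response: string };
    return data.response;
}
```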
### Chat Participant (`chatParticipant.ts`)

- Implements the `@vllama` chat interface
- Manages conversation history
- Provides follow-up suggestions
### Extension Entry Point (`extension.ts`)
- Activates the extension
- Registers both the provider and participant
- Handles lifecycle management
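To show how these pieces fit together, here is a stripped-down sketch of the wiring (the participant id, handler body, and inline fetch are illustrative; the real `extension.ts` and `chatParticipant.ts` also handle conversation history, follow-ups, and error reporting):

```typescript
import * as vscode from 'vscode';

export function activate(context: vscode.ExtensionContext) {
    // Handler invoked whenever the user addresses @vllama in the Chat view.
    const handler: vscode.ChatRequestHandler = async (request, _chatContext, stream, _token) => {
        // Forward the user's prompt to the local server using the documented JSON contract.
        const res = await fetch('http://localhost:2513/chat', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ message: request.prompt }),
        });
        const data = (await res.json()) as { response: string };

        // Render the model's reply in the Chat view.
        stream.markdown(data.response);
        return {};
    };

    // Register the chat participant; the id must match the one declared in package.json.
    const participant = vscode.chat.createChatParticipant('vllama', handler);
    context.subscriptions.push(participant);
}
```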
## Troubleshooting

### "Cannot connect to local LLM server"
Problem: The extension can't reach your local server.
Solutions:
- Verify your LLM server is running: `curl -X POST http://localhost:2513/chat -H "Content-Type: application/json" -d '{"message": "ping"}'`
- Check the server is listening on port 2513
- Ensure no firewall is blocking the connection
- Check server logs for errors
"Request timed out"
Problem: The server is taking too long to respond.
Solutions:
- Your model might be processing a complex request
- Try a simpler prompt first
- Check if your server has sufficient resources (CPU/GPU/RAM)
- Consider using a smaller model for faster responses
"Server returned error: 500"
Problem: Your server encountered an internal error.
Solutions:
- Check your server logs for detailed error messages
- Verify your model is loaded correctly
- Ensure the request format matches what your server expects
## Development

### Project Structure
```
vllama/
├── src/
│   ├── extension.ts          # Main extension entry point
│   ├── localLLMProvider.ts   # Language Model Provider implementation
│   └── chatParticipant.ts    # Chat Participant implementation
├── package.json              # Extension manifest
├── tsconfig.json             # TypeScript configuration
└── README.md                 # This file
```
### Building

```bash
# Install dependencies
npm install

# Compile TypeScript
npm run compile

# Watch mode (auto-compile on changes)
npm run watch

# Run linter
npm run lint
```
### Testing

Press `F5` in VS Code to launch the Extension Development Host with your extension loaded.
## Future Enhancements
- [ ] Configurable host and port through VS Code settings
- [ ] Support for multiple local models
- [ ] Model selection UI
- [ ] Streaming responses for real-time feedback
- [ ] Agentic tools integration
- [ ] Code context awareness
- [ ] File and workspace integration
- [ ] Custom slash commands
- [ ] Conversation export/import
## Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.
## License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Copyright 2025 Vllama Contributors
## Acknowledgments
Built with the VS Code Extension API and inspired by the need for privacy-focused, local AI development tools.
Enjoy chatting with your local LLM! 🚀