
Vllama - Local LLM Integration for VS Code

Integrate your locally-hosted LLM models directly into VS Code's native Chat interface. Chat with your local models using the @vllama participant, just like you would with GitHub Copilot or other AI assistants.

Features

  • 🤖 Local LLM Integration: Connect to your locally-hosted LLM server (e.g., Qwen Coder, LLaMA, etc.)
  • 💬 Native Chat Interface: Use @vllama in VS Code's Chat view for seamless interaction
  • 🔌 Simple Setup: Just point to your local server running on localhost:2513
  • 📝 Conversation History: Maintains context across multiple turns for better responses
  • ⚡ Real-time Responses: Streams responses from your local model
  • 🛡️ Error Handling: Helpful error messages when connection issues occur

Prerequisites

Before using this extension, you need:

  1. VS Code version 1.106.1 or higher
  2. A local LLM server running on http://localhost:2513 with a /chat endpoint

Local LLM Server Requirements

Your local server must:

  • Accept POST requests to /chat
  • Expect JSON payload: {"message": "user prompt"}
  • Return JSON response: {"response": "llm response"}

Example using Python Flask:

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/chat', methods=['POST'])
def chat():
    # Parse the {"message": "..."} payload sent by the extension
    data = request.get_json(silent=True) or {}
    user_message = data.get('message', '')

    # Your LLM inference code here; `your_llm_model` is a placeholder for
    # whatever backend you use (llama.cpp, vLLM, Transformers, ...)
    llm_response = your_llm_model.generate(user_message)

    return jsonify({"response": llm_response})

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=2513)
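
On the client side, the round trip against this endpoint looks roughly like the TypeScript sketch below. This is an illustration, not the extension's actual code (which lives in localLLMProvider.ts); askLocalLLM is an illustrative helper name, and the snippet assumes a runtime with the global fetch API (Node 18+ or a recent VS Code extension host).

// Sketch of the /chat round trip; askLocalLLM is an illustrative name.
async function askLocalLLM(message: string): Promise<string> {
  const res = await fetch("http://localhost:2513/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }), // {"message": "user prompt"}
  });

  if (!res.ok) {
    throw new Error(`Server returned error: ${res.status}`);
  }

  const data = (await res.json()) as { response: string };
  return data.response; // {"response": "llm response"}
}

// Example usage:
// askLocalLLM("Hello!").then(console.log).catch(console.error);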

Installation

From Source

  1. Clone this repository
  2. Open in VS Code
  3. Install dependencies:
    npm install
    
  4. Compile the extension:
    npm run compile
    
  5. Press F5 to launch the Extension Development Host

From VS Code Marketplace

  1. Open VS Code
  2. Go to Extensions (Ctrl+Shift+X or Cmd+Shift+X)
  3. Search for "Vllama"
  4. Click Install

From VSIX File

Download the latest .vsix file from the releases page and install it:

code --install-extension vllama-0.0.2.vsix

Usage

Step 1: Start Your Local LLM Server

Ensure your local LLM server is running on http://localhost:2513:

# Example: Start your LLM server
python your_llm_server.py

Step 2: Open VS Code Chat

  • Press Ctrl+Shift+I (Windows/Linux) or Cmd+Shift+I (Mac)
  • Or click the Chat icon in the Activity Bar
  • Or use Command Palette: Chat: Open Chat

Step 3: Chat with @vllama

Type @vllama followed by your question:

@vllama How do I create a Python function to calculate fibonacci numbers?

Example Conversations

Code Generation:

@vllama Write a TypeScript function to sort an array of objects by a specific property

Code Explanation:

@vllama Explain what this regex does: /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/

Debugging Help:

@vllama Why am I getting a "Cannot read property of undefined" error in JavaScript?

Configuration

Currently, the extension is configured to connect to http://localhost:2513. Future versions will allow customization through VS Code settings.
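
If that setting is added, the extension could read it through VS Code's standard configuration API, roughly as sketched below. The setting name vllama.serverUrl is hypothetical and does not exist in the current release.

import * as vscode from 'vscode';

// Hypothetical: "vllama.serverUrl" is not a real setting yet; the current
// release hard-codes http://localhost:2513.
function getServerUrl(): string {
  return vscode.workspace
    .getConfiguration('vllama')
    .get<string>('serverUrl', 'http://localhost:2513');
}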

Architecture

The extension consists of three main components:

  1. Language Model Chat Provider (localLLMProvider.ts)

    • Registers your local LLM as a language model in VS Code
    • Handles communication with the local server
    • Provides token counting and model information
  2. Chat Participant (chatParticipant.ts)

    • Implements the @vllama chat interface
    • Manages conversation history
    • Provides follow-up suggestions
  3. Extension Entry Point (extension.ts)

    • Activates the extension
    • Registers both the provider and participant
    • Handles lifecycle management
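
To make the wiring concrete, here is a simplified sketch of how a chat participant can be registered in an extension's activate function. It illustrates the VS Code chat API rather than the actual chatParticipant.ts (which also manages history and follow-up suggestions); the participant id 'vllama.chat' is assumed for the example, and the global fetch API (Node 18+) is assumed to be available.

import * as vscode from 'vscode';

export function activate(context: vscode.ExtensionContext) {
  // Handler invoked whenever the user addresses @vllama in the Chat view.
  const handler: vscode.ChatRequestHandler = async (request, _context, stream, _token) => {
    // Forward the prompt to the local server (same /chat contract as above).
    const res = await fetch('http://localhost:2513/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: request.prompt }),
    });
    const data = (await res.json()) as { response: string };

    // Write the reply back into the chat response.
    stream.markdown(data.response);
  };

  // 'vllama.chat' is an illustrative id; the real id is declared in package.json.
  const participant = vscode.chat.createChatParticipant('vllama.chat', handler);
  context.subscriptions.push(participant);
}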

Troubleshooting

"Cannot connect to local LLM server"

Problem: The extension can't reach your local server.

Solutions:

  • Verify your LLM server is running and accepts the expected payload, e.g. curl -X POST http://localhost:2513/chat -H "Content-Type: application/json" -d '{"message": "ping"}'
  • Check the server is listening on port 2513
  • Ensure no firewall is blocking the connection
  • Check server logs for errors

"Request timed out"

Problem: The server is taking too long to respond.

Solutions:

  • The model may simply need more time for a complex request; wait and retry
  • Try a simpler prompt first
  • Check if your server has sufficient resources (CPU/GPU/RAM)
  • Consider using a smaller model for faster responses

"Server returned error: 500"

Problem: Your server encountered an internal error.

Solutions:

  • Check your server logs for detailed error messages
  • Verify your model is loaded correctly
  • Ensure the request format matches what your server expects

Development

Project Structure

vllama/
├── src/
│   ├── extension.ts          # Main extension entry point
│   ├── localLLMProvider.ts   # Language Model Provider implementation
│   └── chatParticipant.ts    # Chat Participant implementation
├── package.json              # Extension manifest
├── tsconfig.json            # TypeScript configuration
└── README.md                # This file

Building

# Install dependencies
npm install

# Compile TypeScript
npm run compile

# Watch mode (auto-compile on changes)
npm run watch

# Run linter
npm run lint

Testing

Press F5 in VS Code to launch the Extension Development Host with your extension loaded.

Future Enhancements

  • [ ] Configurable host and port through VS Code settings
  • [ ] Support for multiple local models
  • [ ] Model selection UI
  • [ ] Streaming responses for real-time feedback
  • [ ] Agentic tools integration
  • [ ] Code context awareness
  • [ ] File and workspace integration
  • [ ] Custom slash commands
  • [ ] Conversation export/import

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Copyright 2025 Vllama Contributors

Acknowledgments

Built with the VS Code Extension API and inspired by the need for privacy-focused, local AI development tools.


Enjoy chatting with your local LLM! 🚀
