VibeCoding
ViveCode Labs | Free

Local AI coding assistant for VS Code with Ollama, Hugging Face, and OpenAI-compatible backends.
VibeCoding – Local AI Coding Assistant for VS Code

Free. No token limits. Your AI, your choice.

VibeCoding brings Claude-like AI code assistance to VS Code without paying for tokens or hitting subscription limits. Choose your AI backend—Ollama, LM Studio, or Hugging Face Inference—and start coding with intelligence.

🎯 Features

  • Zero Cost: Local AI models, no API tokens, no per-request billing
  • Flexible Backends: Switch between Ollama, LM Studio, or Hugging Face Inference
  • Code Generation: AI writes functions, refactors, debugs, and explains code
  • File System Tools: AI can create folders, write files, and run terminal commands
  • Privacy-First: All conversations stay on your machine
  • Multi-Model Support: Use any open-source LLM (Llama 3.2, Mistral, Phi-3, TinyLlama, etc.)

⚡ Quick Start (5 Minutes)

1. Install One Backend

Choose ONE of these (all free, all local):

Ollama (Easiest for Mac/Linux)

  • Download: ollama.com
  • Run: ollama run llama3.2
  • That's it! VibeCoding auto-detects it.
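Before opening VS Code, you can sanity-check the Ollama API from a terminal. Port 11434 is Ollama's documented default API port (an assumption here; the quick-start above doesn't state it):

```shell
# Confirm the local Ollama API that VibeCoding auto-detects is up.
# /api/tags lists installed models as JSON on Ollama's default port.
if curl -sf http://localhost:11434/api/tags >/dev/null 2>&1; then
  echo "ollama: reachable"
else
  echo "ollama: not reachable (start it with 'ollama serve')"
fi
```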

LM Studio (Best for Windows/Mac)

  • Download: lmstudio.ai
  • Download a model, click "Start Server"
  • VibeCoding auto-connects to localhost:1234
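Once "Start Server" is clicked, you can verify the endpoint from a terminal. `/v1/models` is the standard model-listing route on LM Studio's OpenAI-compatible server:

```shell
# List the models LM Studio is serving on its default port.
# Prints a fallback message if the server hasn't been started yet.
curl -sf http://localhost:1234/v1/models || echo "LM Studio server not running"
```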

Hugging Face Inference (Advanced)

  • See INSTALLATION.md for setup

2. Install VibeCoding

  1. Open VS Code
  2. Go to Extensions (Cmd+Shift+X / Ctrl+Shift+X)
  3. Search for VibeCoding
  4. Click Install

3. Start Chatting

  1. Open the VibeCoding chat panel (search "Open VibeCoding Chat" in command palette)
  2. Pick your backend in settings (if not auto-detected)
  3. Ask it to code!

Example prompts:

  • "Write a function that sorts an array by date"
  • "Explain this code to me"
  • "Write unit tests for this function"
  • "Create a new folder structure for a React app"
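The same prompts can be sent straight to a backend from the terminal to confirm it responds before using the chat panel. This sketch targets LM Studio's OpenAI-compatible endpoint; the model name is a placeholder for whichever model you actually loaded:

```shell
# Hypothetical smoke test against LM Studio's OpenAI-compatible API.
# Replace "your-loaded-model" with the model name shown in the app.
curl -s http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "your-loaded-model",
        "messages": [
          {"role": "user", "content": "Write a function that sorts an array by date"}
        ]
      }' || echo "backend not reachable"
```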

🔧 Configuration

All settings are in Settings → VibeCoding:

| Setting | Default | Options |
| --- | --- | --- |
| Provider | Ollama | ollama, lmStudio, hfAgent |
| Ollama Model | llama3.2 | Any model you've pulled |
| LM Studio URL | http://localhost:1234/v1 | Your LM Studio server |
| HF Agent URL | http://localhost:8000 | Your HF server |
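In settings.json form, a typical setup might look like the sketch below. The setting IDs are assumptions inferred from the table above; check the extension's settings UI for the real keys:

```json
{
  // Hypothetical setting IDs; confirm against VibeCoding's settings UI.
  "vibecoding.provider": "ollama",
  "vibecoding.ollamaModel": "llama3.2",
  "vibecoding.lmStudioUrl": "http://localhost:1234/v1",
  "vibecoding.hfAgentUrl": "http://localhost:8000"
}
```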

📚 Documentation

  • QUICKSTART.md — 30-second setup for impatient developers
  • INSTALLATION.md — Detailed backend setup, system requirements, troubleshooting
  • ONBOARDING.md — First-time user guide, best practices, examples
  • ARCHITECTURE.md — How VibeCoding works, extension structure, developer guide
  • PUBLISHING_GUIDE.md — How to publish similar extensions to VS Code Marketplace

💡 Why VibeCoding?

Problem: AI code assistance tools charge per token or limit free usage.

Solution: Run AI models locally on your hardware. No limits, no billing, full control.

| Feature | VibeCoding | GitHub Copilot | Cursor |
| --- | --- | --- | --- |
| Cost | Free | $10–20/mo | $20/mo |
| Token Limits | Unlimited | 50 req/mo (free) | Generous free tier |
| Model Choice | Any OSS model | GPT-4o | Claude |
| Privacy | Local (on your machine) | Sent to GitHub | Some local caching |
| Works Offline | ✅ Yes | ❌ No | ❌ No |

🚀 Supported Models

VibeCoding works with any open-source LLM. Popular choices:

| Model | Size | Speed | Quality | Download |
| --- | --- | --- | --- | --- |
| Llama 3.2 | 3B–70B | ⚡⚡⚡ | ⭐⭐⭐ | Ollama: ollama run llama3.2 |
| Mistral | 7B–72B | ⚡⚡ | ⭐⭐⭐⭐ | LM Studio / Ollama |
| Phi-3 | 3.8B–14B | ⚡⚡⚡ | ⭐⭐⭐ | LM Studio / Ollama |
| TinyLlama | 1.1B | ⚡⚡⚡⚡ | ⭐⭐ | Ollama: ollama run tinyllama |

Recommendation for First Time:

  • Mac M1/M2/M3: Start with ollama run llama3.2 (fast, accurate, and Ollama uses the Apple-silicon GPU automatically)
  • Windows/Linux GPU: LM Studio + Mistral (best balance of speed and quality)
  • Limited RAM (<8GB): TinyLlama (small, still surprisingly smart)

🐛 Troubleshooting

"Provider not available" error

  • Is your backend running? Check:
    • Ollama: Is it started? (ollama serve in terminal)
    • LM Studio: Did you click "Start Server"?
    • HF Agent: Did you start the Python server? (python scripts/hf_server.py)
  • Is it on the right port?
    • Ollama: http://localhost:11434
    • LM Studio: http://localhost:1234/v1
    • HF Agent: http://localhost:8000
  • Check VS Code settings: VibeCoding → Provider → select correct backend
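All three default endpoints can be probed in one pass from a terminal (the paths are illustrative health-check routes, not documented VibeCoding behavior):

```shell
# Probe each backend's default port; "down" means that backend needs
# starting (or can be ignored if you use a different one).
for url in http://localhost:11434 \
           http://localhost:1234/v1/models \
           http://localhost:8000; do
  if curl -sf --max-time 2 "$url" >/dev/null 2>&1; then
    echo "up:   $url"
  else
    echo "down: $url"
  fi
done
```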

"No model loaded" error

  • Ollama: Pull a model first: ollama run llama3.2
  • LM Studio: Download a model inside the app, click "Start Server"
  • HF Agent: Specify a model: python scripts/hf_server.py --model TinyLlama/TinyLlama-1.1B-Chat-v1.0

AI responses are slow

  • Try a smaller model (TinyLlama instead of 70B Llama)
  • Check if CPU is maxed out (GPU not being used)
  • Lower max_tokens in chat settings; a high cap lets the model generate long responses that take proportionally longer
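For the Ollama backend, `ollama ps` reports whether the loaded model landed on GPU or CPU (the PROCESSOR column). This is standard Ollama CLI, independent of VibeCoding:

```shell
# Show loaded models and whether they run on GPU or CPU.
# Guarded so the snippet degrades gracefully if Ollama isn't installed.
if command -v ollama >/dev/null 2>&1; then
  ollama ps
else
  echo "ollama not installed"
fi
```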

Terminal commands fail

  • The AI has safety restrictions to prevent dangerous commands
  • Blocked: rm -rf /, sudo, format, etc.
  • Ask it to explain the command instead, then run it yourself
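The guard might work along these lines. This is a hypothetical sketch of the idea, not VibeCoding's actual implementation; the pattern list comes from the blocked examples above:

```shell
# Hypothetical sketch of a dangerous-command guard (not VibeCoding's
# real code). Returns 0 (blocked) for patterns like those listed above.
is_blocked() {
  case "$1" in
    *"rm -rf /"* | sudo\ * | sudo | *format*) return 0 ;;
    *) return 1 ;;
  esac
}

is_blocked "sudo apt upgrade" && echo "blocked" || echo "allowed"  # prints "blocked"
is_blocked "ls -la"           && echo "blocked" || echo "allowed"  # prints "allowed"
```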

📖 Resources

  • VS Code Extension API: https://code.visualstudio.com/api
  • Ollama Docs: https://ollama.ai
  • LM Studio: https://lmstudio.ai
  • Hugging Face Models: https://huggingface.co/models?pipeline_tag=text-generation
  • Official Website: https://vivecode.dev

📝 License

MIT License – Use freely, modify, distribute.

🤝 Contributing

Found a bug? Want to add a feature? Open an issue or PR on GitHub.


Ready to code with local AI?

  1. Install a backend (Ollama recommended for Mac)
  2. Install VibeCoding from the VS Code Marketplace
  3. Open chat and start building

No tokens. No limits. Just AI.
