Vajra — AI Coding Assistant
The free Cursor alternative built for VS Code
Streaming chat · Ghost-text autocomplete · @ context · Git AI · Code actions · 40+ models
What is Vajra?
Vajra is a full-featured AI coding assistant that runs inside VS Code. It gives you a streaming chat panel, inline ghost-text completions (Tab to accept), one-click code fixes, AI-powered git tools, and deep context awareness. It works with 40+ models, including Claude 4, GPT-4.1, Gemini 2.5, DeepSeek R1, and local Ollama models that keep your code on your machine.
Features
💬 Streaming AI Chat
Real-time, token-by-token responses in a persistent chat panel. Ask questions, paste code, or share screenshots. Vajra streams the answer as it is generated, so you are not left waiting for a wall of text.
- Multi-turn conversation with full history context
- Paste images directly into chat (vision models)
- Syntax-highlighted code blocks with one-click copy
- Markdown rendering with tables, lists, and headers
⚡ Ghost-Text Autocomplete
Start typing and Vajra suggests what comes next — right inside your editor, just like Copilot. Press Tab to accept, keep typing to ignore.
- Triggers automatically after a configurable delay (default 650ms)
- 80-entry LRU cache to avoid redundant API calls
- Skips comments and short lines to stay out of your way
- Toggle on/off instantly from the status bar (⚡ Vajra)
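For readers curious how an 80-entry LRU cache could behave, here is a minimal TypeScript sketch built on `Map`. The class name `CompletionLruCache`, its methods, and the idea of keying on the text before the cursor are illustrative assumptions, not Vajra's actual internals.

```typescript
// Hypothetical sketch: an LRU cache for completion results.
// Keys are assumed to be the text before the cursor; values are the suggestion.
class CompletionLruCache {
  private readonly entries = new Map<string, string>();
  private readonly capacity: number;

  constructor(capacity: number = 80) {
    this.capacity = capacity;
  }

  get(key: string): string | undefined {
    const value = this.entries.get(key);
    if (value === undefined) {
      return undefined;
    }
    // Re-insert so this key becomes the most recently used entry.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: string): void {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.capacity) {
      // A Map iterates in insertion order, so its first key is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) {
        this.entries.delete(oldest);
      }
    }
    this.entries.set(key, value);
  }

  get size(): number {
    return this.entries.size;
  }
}

// Demo with a tiny capacity so the eviction is easy to follow.
const demoCache = new CompletionLruCache(2);
demoCache.set("const a =", " 1;");
demoCache.set("const b =", " 2;");
demoCache.get("const a =");        // touch "a", so "b" is now the oldest entry
demoCache.set("const c =", " 3;"); // over capacity: "const b =" is evicted
```

A cache hit returns the stored suggestion without calling the model again, which is the saving the bullet above describes.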
@ Context Mentions
Type @ in chat to inject live context from your project:
| Mention | What it includes |
| --- | --- |
| @currentfile | The full file you have open |
| @selection | Whatever text you have selected |
| @file:name | Any file in your workspace by name |
| @codebase:query | Semantic search across your entire project |
| @gitdiff | Current staged git diff |
| @web:query | Live web search results via DuckDuckGo |
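To make the mention syntax concrete, the sketch below pulls mentions out of a chat message. It is only an illustration: the function `parseMentions`, the `ParsedMention` shape, and the rule that an argument runs up to the next whitespace are assumptions, and Vajra's real parser may handle multi-word queries or trailing punctuation differently.

```typescript
// Hypothetical sketch of extracting @ mentions from a chat message.
type MentionKind = "currentfile" | "selection" | "gitdiff" | "file" | "codebase" | "web";

interface ParsedMention {
  kind: MentionKind;
  argument?: string; // only present for @file:, @codebase:, and @web:
}

function parseMentions(message: string): ParsedMention[] {
  // First alternative: mentions without an argument.
  // Second alternative: mentions that take ":argument" (assumed to end at whitespace).
  const pattern = /@(currentfile|selection|gitdiff)\b|@(file|codebase|web):(\S+)/g;
  const mentions: ParsedMention[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(message)) !== null) {
    if (match[1] !== undefined) {
      mentions.push({ kind: match[1] as MentionKind });
    } else if (match[2] !== undefined && match[3] !== undefined) {
      mentions.push({ kind: match[2] as MentionKind, argument: match[3] });
    }
  }
  return mentions;
}

const demoMentions = parseMentions(
  "Compare @currentfile with @file:utils.ts and search @codebase:auth"
);
```

Here `demoMentions` holds three entries in message order: the current file, the file `utils.ts`, and a codebase search for `auth`.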
Code Actions
See a red squiggle? Click the lightbulb (or press Ctrl+.) and get:
- ⚡ Fix with Vajra — streams a fix directly into your editor with Apply / Show Diff / Cancel controls
- 💡 Explain with Vajra — opens chat with a full explanation
- 🔧 Refactor with Vajra — suggests cleaner alternatives
✏️ Inline Edit (Cmd+K / Ctrl+K)
Select any code, press Ctrl+K (or Cmd+K on Mac), type your instruction, and Vajra streams the rewritten version. You see a diff and choose Apply or Cancel before anything changes.
Git AI
Generate Commit Message (Ctrl+Shift+G / Cmd+Shift+G):
Reads your staged diff and writes a conventional commit message — imperative mood, ≤72 char subject line — copied to clipboard and ready to paste.
Generate PR Description:
Reads your commit log and diff, writes a full markdown PR description, and opens it in a new editor tab.
Both appear as buttons in the Source Control panel (SCM sidebar).
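If you want to sanity-check a generated subject line before pasting it, a small checker along these lines could help. This is a sketch under stated assumptions: `checkCommitSubject` is a hypothetical helper, and the list of commit types mixes the two types named by the Conventional Commits spec (`feat`, `fix`) with others from the common Angular convention. It checks only the format and the 72-character limit, not imperative mood.

```typescript
// Assumed type list; adjust it to match your project's convention.
const COMMIT_TYPES = [
  "feat", "fix", "docs", "style", "refactor", "perf", "test", "build", "ci", "chore", "revert",
];

// Matches "type: description" or "type(scope): description", with an optional "!".
const SUBJECT_PATTERN = new RegExp(
  "^(" + COMMIT_TYPES.join("|") + ")(\\([\\w.-]+\\))?!?: \\S.*$"
);
const MAX_SUBJECT_LENGTH = 72;

// Returns a list of problems; an empty list means the subject passed both checks.
function checkCommitSubject(subject: string): string[] {
  const problems: string[] = [];
  if (subject.length > MAX_SUBJECT_LENGTH) {
    problems.push(
      "subject is " + subject.length + " characters; the limit is " + MAX_SUBJECT_LENGTH
    );
  }
  if (!SUBJECT_PATTERN.test(subject)) {
    problems.push("subject does not match type(scope): description");
  }
  return problems;
}

const goodSubjectProblems = checkCommitSubject("fix(parser): handle empty input");
const badSubjectProblems = checkCommitSubject("Fixed stuff");
```

`goodSubjectProblems` is empty, while `badSubjectProblems` reports the format mismatch.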
📋 Slash Commands
Type / in chat to access:
| Command | Action |
| --- | --- |
| /explain | Explain selected code |
| /refactor | Suggest refactoring |
| /tests | Generate unit tests |
| /debug | Debug and find bugs |
| /optimize | Performance improvements |
| /comments | Add code comments |
| /council | Multi-agent debate mode |
| /terminal | Paste and analyze terminal output |
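As a rough picture of how a chat input could be split into a command and the text that follows it, consider the sketch below. The command names come from the table above; the function `parseSlashCommand`, the `ParsedChatInput` shape, and the choice to treat unknown commands as plain chat text are assumptions for illustration only.

```typescript
// Command names taken from the table above.
const SLASH_COMMANDS: readonly string[] = [
  "explain", "refactor", "tests", "debug", "optimize", "comments", "council", "terminal",
];

interface ParsedChatInput {
  command: string | null; // one of SLASH_COMMANDS, or null for plain chat text
  text: string;           // the remaining text after the command
}

function parseSlashCommand(input: string): ParsedChatInput {
  const trimmed = input.trim();
  // A leading "/", a lowercase word, then optional text after whitespace.
  const match = /^\/([a-z]+)(?:\s+([\s\S]*))?$/.exec(trimmed);
  if (match === null) {
    return { command: null, text: trimmed };
  }
  const candidate = match[1] ?? "";
  if (SLASH_COMMANDS.indexOf(candidate) === -1) {
    // Assumption: unknown commands fall through as ordinary chat text.
    return { command: null, text: trimmed };
  }
  return { command: candidate, text: match[2] ?? "" };
}

const demoCommand = parseSlashCommand("/tests for the parser module");
```

In this demo, `demoCommand.command` is `"tests"` and `demoCommand.text` is `"for the parser module"`.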
🤖 40+ Models, 10+ Providers
| Provider | Flagship Models | Notes |
| --- | --- | --- |
| Anthropic | Claude Opus 4, Sonnet 4, Haiku 4.5 | Best reasoning |
| OpenAI | GPT-4.1, o3, o4-mini | Best ecosystem |
| Google | Gemini 2.5 Pro, Flash | Best multimodal |
| DeepSeek | R1, V3 | Best open-source |
| Groq | Llama 4, Qwen2.5-Coder | Fastest inference |
| Mistral | Devstral, Codestral | Best for code |
| Ollama | Any local model | 100% private, free |
| OpenRouter | 200+ models, one key | Maximum choice |
| Qwen | Qwen2.5-Coder 32B/14B/7B | Best local coding |
| HuggingFace | Open-source models | Research models |
🏠 .vajrarules — Project AI Instructions
Create a .vajrarules file (or .cursorrules) in your project root. Vajra reads it automatically and uses it as the system prompt for every request in that project. A green dot appears in the chat header when rules are active.
Use it to set coding style, tech stack, naming conventions, or anything else you want the AI to always know about your project.
Quick Start
Option A — Ollama (Free, 100% Private)
# 1. Install Ollama from https://ollama.ai/download
# 2. Pull a coding model
ollama pull qwen2.5-coder:7b # 4.1GB — best for most machines
# 3. Open VS Code — Vajra auto-detects Ollama and configures itself
That's it. No API key and no cost, and once the model has been pulled, no internet connection is needed.
Option B — Cloud Provider
- Press Ctrl+Shift+P → Vajra: Select AI Provider
- Pick your provider (OpenAI, Anthropic, Gemini, etc.)
- Enter your API key when prompted
- Start chatting
Keyboard Shortcuts
| Shortcut | Action |
| --- | --- |
| Ctrl+Shift+V / Cmd+Shift+V | Open Vajra Chat |
| Ctrl+K / Cmd+K | Inline Edit (with selection) |
| Ctrl+Shift+G / Cmd+Shift+G | Generate Commit Message |
| Tab | Accept autocomplete suggestion |
| Click ⚡ Vajra status bar | Toggle autocomplete on/off |
Select any code in your editor and right-click to access:
- Explain Code
- Refactor Code
- Debug Code
- Optimize Code
- Add Comments
- Generate Unit Tests
- Inline Edit
Configuration
Open Settings (Ctrl+,) and search vajra.
| Setting | Default | Description |
| --- | --- | --- |
| vajra.defaultProvider | ollama | Active AI provider |
| vajra.defaultModel | qwen2.5-coder:7b | Active model |
| vajra.enableAutocomplete | true | Ghost-text completions |
| vajra.autocompleteDelay | 650 | Ms before autocomplete fires |
| vajra.temperature | 0.7 | Creativity (0 = precise, 2 = creative) |
| vajra.maxTokens | 4096 | Max response length |
| vajra.enableMultiModalInput | true | Image paste in chat |
Tip: API keys are stored in User Settings (Global) — never committed to your repo.
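If you prefer editing settings.json directly, the settings above could look like the fragment below. The keys come from the table; the values are illustrative only (a larger local model, a shorter delay, a lower temperature), and the value types are inferred from the defaults column rather than confirmed against the extension's schema.

```json
{
  "vajra.defaultProvider": "ollama",
  "vajra.defaultModel": "qwen2.5-coder:14b",
  "vajra.enableAutocomplete": true,
  "vajra.autocompleteDelay": 400,
  "vajra.temperature": 0.2,
  "vajra.maxTokens": 4096,
  "vajra.enableMultiModalInput": true
}
```

A lower temperature tends to give more predictable output, which many people prefer for code, while a shorter delay makes suggestions appear sooner at the cost of more requests.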
Hardware Guide (Ollama)
| RAM | Recommended Model | Notes |
| --- | --- | --- |
| 4 GB | qwen2.5-coder:1.5b | Lightweight, fast |
| 8 GB | qwen2.5-coder:7b ⭐ | Best all-around |
| 16 GB | qwen2.5-coder:14b | Near-cloud quality |
| 32 GB+ | qwen2.5-coder:32b | Maximum performance |
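The table can also be read as a simple lookup. The sketch below encodes it as a function, assuming each RAM figure is a minimum threshold (so a 12 GB machine falls into the 8 GB row). `recommendOllamaModel` is a hypothetical helper, not part of Vajra.

```typescript
// Hypothetical helper that mirrors the hardware table above.
// Assumption: each row's RAM value is treated as a minimum threshold.
function recommendOllamaModel(ramGb: number): string | null {
  if (ramGb >= 32) {
    return "qwen2.5-coder:32b";
  }
  if (ramGb >= 16) {
    return "qwen2.5-coder:14b";
  }
  if (ramGb >= 8) {
    return "qwen2.5-coder:7b";
  }
  if (ramGb >= 4) {
    return "qwen2.5-coder:1.5b";
  }
  // Below the smallest row in the table: no recommendation.
  return null;
}

const modelFor12Gb = recommendOllamaModel(12);
```

Under that threshold assumption, `modelFor12Gb` is `"qwen2.5-coder:7b"`; pass the result to `ollama pull` to download it.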
Troubleshooting
Autocomplete not showing?
Check the ⚡ Vajra pill in the status bar — click it to toggle on. Also check vajra.enableAutocomplete in settings.
Ollama not detected?
Run ollama serve in a terminal, then reload VS Code.
Wrong model/provider error?
Press Ctrl+Shift+P → Vajra: Select AI Provider to re-sync the provider and model.
Slow responses?
Switch to Groq for cloud (fastest inference) or use a smaller Ollama model locally.
Support