Vajra - AI Coding Assistant

Ashish Sharda

210 installs | 0 ratings | Free
Free Cursor alternative — streaming AI chat, ghost-text autocomplete, @ context, git tools, multi-agent council. Claude 4, GPT-4.1, Gemini 2.5, Llama 4, DeepSeek R1 & 40+ models.

Vajra — AI Coding Assistant

The free Cursor alternative built for VS Code

Streaming chat · Ghost-text autocomplete · @ context · Git AI · Code actions · 40+ models


What is Vajra?

Vajra is a full-featured AI coding assistant that runs inside VS Code. It gives you a streaming chat panel, inline ghost-text completions (Tab to accept), one-click code fixes, AI-powered git tools, and deep context awareness, all backed by 40+ models including Claude 4, GPT-4.1, Gemini 2.5, DeepSeek R1, and local Ollama models that keep your code on your machine.


Features

💬 Streaming AI Chat

Real-time token-by-token responses in a persistent chat panel. Ask questions, paste code, share screenshots — Vajra streams the answer as it's generated, no waiting for a wall of text.

  • Multi-turn conversation with full history context
  • Paste images directly into chat (vision models)
  • Syntax-highlighted code blocks with one-click copy
  • Markdown rendering with tables, lists, and headers

⚡ Ghost-Text Autocomplete

Start typing and Vajra suggests what comes next — right inside your editor, just like Copilot. Press Tab to accept, keep typing to ignore.

  • Triggers automatically after a configurable delay (default 650ms)
  • 80-entry LRU cache to avoid redundant API calls
  • Skips comments and short lines to stay out of your way
  • Toggle on/off instantly from the status bar (⚡ Vajra)
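The caching and skip behavior described above can be sketched as follows. This is a minimal illustration, not Vajra's actual implementation: the names `LruCache` and `shouldSuggest`, the comment prefixes checked, and the 3-character minimum line length are all assumptions. Only the 80-entry capacity comes from the feature list.

```typescript
// Hypothetical sketch of an LRU completion cache; not Vajra's real code.
class LruCache<V> {
  // A Map iterates in insertion order, so the first key is the oldest.
  private readonly entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number = 80) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert the entry so it becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Assumed heuristic: skip very short lines and lines that start a comment.
function shouldSuggest(line: string, minLength: number = 3): boolean {
  const trimmed = line.trim();
  if (trimmed.length < minLength) return false;
  return !(trimmed.startsWith("//") || trimmed.startsWith("#"));
}
```

A completion provider could key the cache on the text before the cursor and only call the model when `shouldSuggest` returns true and the cache misses.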

@ Context Mentions

Type @ in chat to inject live context from your project:

| Mention | What it includes |
| --- | --- |
| @currentfile | The full file you have open |
| @selection | Whatever text you have selected |
| @file:name | Any file in your workspace by name |
| @codebase:query | Semantic search across your entire project |
| @gitdiff | Current staged git diff |
| @web:query | Live web search results via DuckDuckGo |
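To show how mentions like these could be picked out of a chat message, here is a small sketch. It is an assumption about one possible approach, not Vajra's parser: `parseMentions` and the `Mention` type are hypothetical, and the sketch treats an argument as everything up to the next whitespace, which may differ from how Vajra handles multi-word queries.

```typescript
// Hypothetical mention parser; names and matching rules are assumptions.
type Mention = { kind: string; argument?: string };

// The mention kinds listed in the table above.
const KNOWN_KINDS = new Set([
  "currentfile",
  "selection",
  "file",
  "codebase",
  "gitdiff",
  "web",
]);

function parseMentions(message: string): Mention[] {
  const mentions: Mention[] = [];
  // "@kind" optionally followed by ":argument" (argument ends at whitespace).
  const pattern = /@([a-z]+)(?::(\S+))?/g;
  for (const match of message.matchAll(pattern)) {
    const kind = match[1];
    const argument = match[2];
    if (kind === undefined || !KNOWN_KINDS.has(kind)) continue;
    mentions.push(argument === undefined ? { kind } : { kind, argument });
  }
  return mentions;
}
```

Unknown mentions are ignored in this sketch, so ordinary text containing an @ sign passes through untouched.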

🔧 Code Actions (Lightbulb Menu)

See a red squiggle? Click the lightbulb (or press Ctrl+.) and get:

  • ⚡ Fix with Vajra — streams a fix directly into your editor with Apply / Show Diff / Cancel controls
  • 💡 Explain with Vajra — opens chat with a full explanation
  • 🔧 Refactor with Vajra — suggests cleaner alternatives

✏️ Inline Edit (Cmd+K / Ctrl+K)

Select any code, press Ctrl+K (or Cmd+K on Mac), type your instruction, and Vajra streams the rewritten version. You see a diff and choose Apply or Cancel before anything changes.

🔀 Git AI Tools

Generate Commit Message (Ctrl+Shift+G / Cmd+Shift+G): Reads your staged diff and writes a conventional commit message — imperative mood, ≤72 char subject line — copied to clipboard and ready to paste.

Generate PR Description: Reads your commit log and diff, writes a full markdown PR description, and opens it in a new editor tab.

Both appear as buttons in the Source Control panel (SCM sidebar).
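If you want to double-check a generated message before committing, the two mechanical rules above (conventional format, subject of 72 characters or fewer) can be tested with a short helper. This is a sketch under assumptions: `checkCommitSubject` is a hypothetical name, and the list of commit types follows the widely used Angular-style set rather than anything Vajra documents. Imperative mood is not checked because it cannot be verified reliably with a pattern.

```typescript
// Hypothetical validator for a commit subject line; not part of Vajra.
const MAX_SUBJECT_LENGTH = 72;

// type, optional (scope), optional "!", then ": description".
const CONVENTIONAL_SUBJECT =
  /^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([\w.-]+\))?!?: \S.*$/;

function checkCommitSubject(message: string): string[] {
  // Only the first line of the message is the subject.
  const subject = message.split("\n")[0] ?? "";
  const problems: string[] = [];
  if (subject.length > MAX_SUBJECT_LENGTH) {
    problems.push(
      `subject is ${subject.length} characters; limit is ${MAX_SUBJECT_LENGTH}`,
    );
  }
  if (!CONVENTIONAL_SUBJECT.test(subject)) {
    problems.push("subject does not match type(scope): description");
  }
  return problems;
}
```

An empty result means the subject passes both checks.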

📋 Slash Commands

Type / in chat to access:

| Command | Action |
| --- | --- |
| /explain | Explain selected code |
| /refactor | Suggest refactoring |
| /tests | Generate unit tests |
| /debug | Debug and find bugs |
| /optimize | Performance improvements |
| /comments | Add code comments |
| /council | Multi-agent debate mode |
| /terminal | Paste and analyze terminal output |

🤖 40+ Models, 10+ Providers

| Provider | Flagship Models | Notes |
| --- | --- | --- |
| Anthropic | Claude Opus 4, Sonnet 4, Haiku 4.5 | Best reasoning |
| OpenAI | GPT-4.1, o3, o4-mini | Best ecosystem |
| Google | Gemini 2.5 Pro, Flash | Best multimodal |
| DeepSeek | R1, V3 | Best open-source |
| Groq | Llama 4, Qwen2.5-Coder | Fastest inference |
| Mistral | Devstral, Codestral | Best for code |
| Ollama | Any local model | 100% private, free |
| OpenRouter | 200+ models, one key | Maximum choice |
| Qwen | Qwen2.5-Coder 32B/14B/7B | Best local coding |
| HuggingFace | Open-source models | Research models |

🏠 .vajrarules — Project AI Instructions

Create a .vajrarules file (or .cursorrules) in your project root. Vajra reads it automatically and uses it as the system prompt for every request in that project. A green dot appears in the chat header when rules are active.

Use it to set coding style, tech stack, naming conventions, or anything else you want the AI to always know about your project.
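The lookup described above might work roughly like this. The sketch is an assumption, not Vajra's code: `findProjectRules` is a hypothetical name, and checking .vajrarules before .cursorrules is a guess at the precedence, which the description does not state. The file reader is passed in so the logic stays independent of any particular file system API.

```typescript
// Hypothetical rules-file lookup; precedence order is an assumption.
type ReadFile = (path: string) => string | undefined;

interface ProjectRules {
  source: string; // which file the rules came from
  content: string; // text that would be used as the system prompt
}

const RULE_FILES = [".vajrarules", ".cursorrules"];

function findProjectRules(root: string, read: ReadFile): ProjectRules | undefined {
  for (const name of RULE_FILES) {
    const content = read(`${root}/${name}`);
    // Treat a missing or blank file as "no rules here" and keep looking.
    if (content !== undefined && content.trim() !== "") {
      return { source: name, content };
    }
  }
  return undefined;
}
```

In a Node-based extension, `read` could wrap a synchronous file read that returns `undefined` when the file does not exist.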


Quick Start

Option A — Ollama (Free, 100% Private)

# 1. Install Ollama from https://ollama.ai/download

# 2. Pull a coding model
ollama pull qwen2.5-coder:7b   # 4.1GB — best for most machines

# 3. Open VS Code — Vajra auto-detects Ollama and configures itself

That's it. No API key and no cost. After the model is downloaded, everything runs offline.

Option B — Cloud Provider

  1. Press Ctrl+Shift+P → Vajra: Select AI Provider
  2. Pick your provider (OpenAI, Anthropic, Gemini, etc.)
  3. Enter your API key when prompted
  4. Start chatting

Keyboard Shortcuts

| Shortcut | Action |
| --- | --- |
| Ctrl+Shift+V / Cmd+Shift+V | Open Vajra Chat |
| Ctrl+K / Cmd+K | Inline Edit (with selection) |
| Ctrl+Shift+G / Cmd+Shift+G | Generate Commit Message |
| Tab | Accept autocomplete suggestion |
| Click ⚡ Vajra status bar | Toggle autocomplete on/off |

Right-Click Menu

Select any code in your editor and right-click to access:

  • Explain Code
  • Refactor Code
  • Debug Code
  • Optimize Code
  • Add Comments
  • Generate Unit Tests
  • Inline Edit

Configuration

Open Settings (Ctrl+,) and search for vajra.

| Setting | Default | Description |
| --- | --- | --- |
| vajra.defaultProvider | ollama | Active AI provider |
| vajra.defaultModel | qwen2.5-coder:7b | Active model |
| vajra.enableAutocomplete | true | Ghost-text completions |
| vajra.autocompleteDelay | 650 | Delay in milliseconds before autocomplete fires |
| vajra.temperature | 0.7 | Creativity (0 = precise, 2 = creative) |
| vajra.maxTokens | 4096 | Max response length |
| vajra.enableMultiModalInput | true | Image paste in chat |
Tip: API keys are stored in User Settings (Global) — never committed to your repo.
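As a way to read the settings table, the sketch below models the documented defaults as a typed object and merges user overrides on top. The `VajraSettings` interface and `resolveSettings` function are hypothetical, and clamping the temperature to the 0 to 2 range is an assumption based on the description, not confirmed behavior.

```typescript
// Hypothetical model of the settings table; not Vajra's actual code.
interface VajraSettings {
  defaultProvider: string;
  defaultModel: string;
  enableAutocomplete: boolean;
  autocompleteDelay: number; // milliseconds
  temperature: number; // 0 = precise, 2 = creative
  maxTokens: number;
  enableMultiModalInput: boolean;
}

// Defaults copied from the table above.
const DEFAULTS: VajraSettings = {
  defaultProvider: "ollama",
  defaultModel: "qwen2.5-coder:7b",
  enableAutocomplete: true,
  autocompleteDelay: 650,
  temperature: 0.7,
  maxTokens: 4096,
  enableMultiModalInput: true,
};

function resolveSettings(overrides: Partial<VajraSettings>): VajraSettings {
  const merged: VajraSettings = { ...DEFAULTS, ...overrides };
  // Assumed guard: keep temperature inside the documented 0..2 range.
  merged.temperature = Math.min(2, Math.max(0, merged.temperature));
  return merged;
}
```

For example, `resolveSettings({ autocompleteDelay: 300 })` keeps every other default and only shortens the autocomplete delay.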


Hardware Guide (Ollama)

| RAM | Recommended Model | Notes |
| --- | --- | --- |
| 4 GB | qwen2.5-coder:1.5b | Lightweight, fast |
| 8 GB | qwen2.5-coder:7b ⭐ | Best all-around |
| 16 GB | qwen2.5-coder:14b | Near-cloud quality |
| 32 GB+ | qwen2.5-coder:32b | Maximum performance |
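The table can be read as a simple lookup from available RAM to model tag. The sketch below assumes each row is a lower bound (for example, 12 GB falls back to the 8 GB recommendation); that interpretation and the name `recommendModel` are assumptions, not something Vajra documents.

```typescript
// Hypothetical helper that mirrors the hardware table above.
function recommendModel(ramGb: number): string | undefined {
  if (ramGb >= 32) return "qwen2.5-coder:32b";
  if (ramGb >= 16) return "qwen2.5-coder:14b";
  if (ramGb >= 8) return "qwen2.5-coder:7b";
  if (ramGb >= 4) return "qwen2.5-coder:1.5b";
  // Below 4 GB the table gives no recommendation.
  return undefined;
}
```

The returned tag can be passed straight to `ollama pull`.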

Troubleshooting

Autocomplete not showing? Check the ⚡ Vajra pill in the status bar — click it to toggle on. Also check vajra.enableAutocomplete in settings.

Ollama not detected? Run ollama serve in a terminal, then reload VS Code.
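To confirm that Ollama is actually reachable before reloading, you can probe its local HTTP API. The sketch below assumes Ollama's default address (http://localhost:11434) and its model-listing endpoint (/api/tags); whether Vajra uses the same check for auto-detection is not stated, and `isOllamaRunning` is a hypothetical name. The fetch function is passed in so the check can be exercised without a network.

```typescript
// Hypothetical reachability check; Vajra's own detection may differ.
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

async function isOllamaRunning(
  fetchFn: FetchLike,
  baseUrl: string = "http://localhost:11434",
): Promise<boolean> {
  try {
    // GET /api/tags lists locally installed models when the server is up.
    const response = await fetchFn(`${baseUrl}/api/tags`);
    return response.ok;
  } catch {
    // A connection error means nothing is listening on that address.
    return false;
  }
}

// Usage in an environment with a global fetch (for example Node 18+):
//   isOllamaRunning((url) => fetch(url)).then((up) => console.log(up));
```

If the check reports false, starting the server with ollama serve and trying again is the first thing to attempt.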

Wrong model/provider error? Press Ctrl+Shift+P → Vajra: Select AI Provider to re-sync the provider and model.

Slow responses? Switch to Groq for cloud (fastest inference) or use a smaller Ollama model locally.


Support

  • 📧 Email: ashishjsharda@gmail.com
  • ⭐ Leave a review on the Marketplace — it helps more developers find Vajra

Made with ❤️ for developers who ship

VS Code Marketplace · Email Support
