RAGKnight

StrangeSp10
Self-contained semantic search and RAG for your codebase — no external dependencies.

RAGKnight brings local Retrieval-Augmented Generation (RAG) directly into VS Code. Index your workspace, search with semantic, hybrid, or keyword matching, and ask questions about your code — all powered by local models that run on your machine.

Features

🔍 Three Search Modes

  • Semantic — Vector similarity search using sentence-transformers
  • Hybrid — Combined vector + BM25 keyword search (default)
  • BM25 — Pure keyword search, no embeddings needed
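How the extension fuses the two rankings in hybrid mode is not documented here; a common approach, shown as a minimal sketch, is reciprocal rank fusion over the vector and BM25 result lists (the chunk IDs below are purely illustrative):

```python
def reciprocal_rank_fusion(vector_ranking, bm25_ranking, k=60):
    """Merge two ranked lists of chunk IDs into one hybrid ranking.

    Each input list is ordered best-first; k dampens the weight of top ranks.
    """
    scores = {}
    for ranking in (vector_ranking, bm25_ranking):
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hits from each mode, best match first (made-up IDs).
vector_hits = ["auth.py:12", "db.py:40", "util.py:3"]
bm25_hits = ["auth.py:12", "main.py:1", "db.py:40"]
print(reciprocal_rank_fusion(vector_hits, bm25_hits))
# -> ['auth.py:12', 'db.py:40', 'main.py:1', 'util.py:3']
```

Chunks that rank well in both lists bubble to the top, which is why hybrid is a sensible default.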

💬 Chat Participant (@rag)

Use @rag in Copilot Chat with built-in commands:

| Command | Description |
| --- | --- |
| `/search` | Search across your indexed codebase |
| `/ask` | Ask a question — retrieves context and generates an answer |
| `/index` | Index the current workspace |
| `/index-common` | Index a directory into the shared common knowledge base |
| `/status` | Show index status |
| `/learn` | Record a learning in the cumulative knowledge base |
| `/knowledge` | View all cumulative knowledge entries |
| `/clear` | Clear the workspace index |

🧠 Agentic Query Planning

Complex questions are automatically decomposed into sub-queries, executed in parallel, and merged into a single comprehensive answer. Planning uses Copilot when available, with a heuristic fallback.
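The decompose/run-in-parallel/merge pattern can be sketched as follows; the planner and search functions here are stand-ins, not RAGKnight's actual internals:

```python
import asyncio

def plan_subqueries(question):
    """Toy heuristic planner: split a compound question on ' and '.

    (Stand-in for the Copilot-based planner described above.)
    """
    parts = [p.strip() for p in question.split(" and ") if p.strip()]
    return parts or [question]

async def search(subquery):
    # Placeholder for a real index lookup; returns (subquery, hits).
    return (subquery, [f"result for {subquery!r}"])

async def answer(question):
    subqueries = plan_subqueries(question)
    # Execute all sub-queries concurrently, then merge the hits.
    results = await asyncio.gather(*(search(q) for q in subqueries))
    return [hit for _, hits in results for hit in hits]

merged = asyncio.run(answer("where is auth handled and how are tokens refreshed"))
print(merged)  # two merged hit lists, one per sub-query
```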

🔌 Pluggable Backends

  • Generation: Copilot LM API or local Ollama (auto-fallback)
  • Embeddings: Local sentence-transformers (all-MiniLM-L6-v2) or VS Code LM API
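The auto-fallback behaviour can be illustrated with a small resolver; this is an assumption about the selection logic (Copilot preferred, Ollama as fallback), not code from the extension:

```python
def pick_generation_backend(setting, copilot_available, ollama_available):
    """Resolve the configured backend, falling back when one is unavailable.

    Illustrative sketch of the documented auto-fallback behaviour.
    """
    if setting == "copilot" and copilot_available:
        return "copilot"
    if setting == "ollama" and ollama_available:
        return "ollama"
    if setting == "auto":
        if copilot_available:
            return "copilot"
        if ollama_available:
            return "ollama"
    raise RuntimeError("no generation backend available")

# Offline machine with Ollama installed:
print(pick_generation_backend("auto", copilot_available=False, ollama_available=True))
# -> ollama
```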

📦 Self-Contained

Everything installs automatically on first launch:

  • Python backend with sentence-transformers
  • Ollama for local LLM generation
  • No API keys required — works fully offline

📚 Cumulative Knowledge

Record learnings that persist across sessions and are automatically injected into every RAG prompt — the system gets smarter the more you use it.
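A minimal sketch of persist-and-inject, assuming a simple JSON file as the store (the file name and prompt layout are hypothetical, not RAGKnight's actual format):

```python
import json
from pathlib import Path

KNOWLEDGE_FILE = Path("knowledge.json")  # hypothetical storage location

def record_learning(note):
    """Append a learning so it persists across sessions."""
    entries = json.loads(KNOWLEDGE_FILE.read_text()) if KNOWLEDGE_FILE.exists() else []
    entries.append(note)
    KNOWLEDGE_FILE.write_text(json.dumps(entries))

def build_prompt(question, retrieved_chunks):
    """Inject all persisted learnings ahead of the retrieved context."""
    entries = json.loads(KNOWLEDGE_FILE.read_text()) if KNOWLEDGE_FILE.exists() else []
    knowledge = "\n".join(f"- {e}" for e in entries)
    context = "\n\n".join(retrieved_chunks)
    return f"Known facts:\n{knowledge}\n\nContext:\n{context}\n\nQuestion: {question}"

record_learning("the API client retries 3 times")
print(build_prompt("how are retries handled?", ["def fetch(): ..."]))
```

Because injection happens at prompt-build time, every recorded learning shapes every subsequent answer.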

Getting Started

  1. Install the extension
  2. Open a workspace and run RAGKnight: Setup from the Command Palette (or accept the automatic setup prompt)
  3. Run RAGKnight: Index Workspace to index your code
  4. Use @rag /search your query or @rag /ask your question in Copilot Chat

Commands

| Command | Description |
| --- | --- |
| RAGKnight: Setup | One-time setup — installs Python backend and Ollama |
| RAGKnight: Search Codebase | Search your indexed code |
| RAGKnight: Ask Question | Ask a question with RAG-powered answers |
| RAGKnight: Index Workspace | Index the current workspace |
| RAGKnight: Index Directory to Common | Add a directory to shared knowledge |
| RAGKnight: Show Status | View index statistics |
| RAGKnight: Pull Ollama Model | Download an Ollama model |
| RAGKnight: Change LLM Model | Switch the Ollama generation model |
| RAGKnight: Change Embedding Model | Switch the sentence-transformers model |
| RAGKnight: Select Copilot Model | Pick from available Copilot models |
| RAGKnight: Record Learning | Add to cumulative knowledge |
| RAGKnight: Show Knowledge | View cumulative knowledge |
| RAGKnight: Clear Workspace Index | Clear the workspace index |
| RAGKnight: Clear Common Index | Clear the shared common index |

Settings

| Setting | Default | Description |
| --- | --- | --- |
| `ragknight.searchMode` | `hybrid` | Search mode: `semantic`, `hybrid`, or `bm25` |
| `ragknight.scope` | `all` | Search scope: `all`, `workspace`, or `common` |
| `ragknight.topK` | `10` | Number of search results |
| `ragknight.generationBackend` | `ollama` | LLM backend: `auto`, `copilot`, or `ollama` |
| `ragknight.embeddingBackend` | `local` | Embedding backend: `auto`, `copilot`, or `local` |
| `ragknight.ollamaModel` | `llama3.2` | Ollama model for generation |
| `ragknight.embeddingModel` | `all-MiniLM-L6-v2` | Sentence-transformers model |
| `ragknight.agenticMode` | `true` | Enable agentic query decomposition |
| `ragknight.chunkSize` | `512` | Characters per text chunk when indexing |
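These are standard VS Code settings, so they can be set per user or per workspace. A workspace `.vscode/settings.json` overriding a few defaults might look like this (the values here are illustrative, not recommendations):

```json
{
  "ragknight.searchMode": "semantic",
  "ragknight.topK": 5,
  "ragknight.generationBackend": "auto",
  "ragknight.ollamaModel": "llama3.2",
  "ragknight.chunkSize": 256
}
```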

Requirements

  • VS Code 1.93+
  • Python 3.10+ must be available on your system PATH
  • GitHub Copilot (optional) — enables Copilot LM API for generation and embeddings

Architecture

RAGKnight uses a local Python backend with:

  • LanceDB for vector storage (embedded, zero-config)
  • sentence-transformers for embeddings (all-MiniLM-L6-v2, ~80MB, downloads on first use)
  • Ollama for local LLM generation (bundled, auto-managed)
  • BM25 for keyword search (pure Python, no dependencies)

Per-workspace indexes ensure each project has its own search space, while a shared common index lets you add reference material accessible from anywhere.
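The pure-Python BM25 ranking mentioned above can be sketched in a few lines; this is standard Okapi BM25 with conventional constants (k1, b), not values taken from RAGKnight:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document (a list of tokens) against the query tokens."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    df = Counter()                         # document frequency per term
    for d in docs:
        for term in set(d):
            df[term] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(score)
    return scores

docs = [["vector", "search", "index"], ["keyword", "search"], ["parse", "config"]]
print(bm25_scores(["keyword", "search"], docs))  # doc 1 scores highest
```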

License

MIT
