Knowledge RAG VS Code Extension

Knowledge RAG turns any folder of documents into a searchable knowledge base directly inside VS Code. It ships with an embedded sentence-transformers model (all-MiniLM-L6-v2), ingests multi-format files, builds embeddings on your machine, and lets you query the corpus through a rich results panel or GitHub Copilot Chat without sending content to external services.

Highlights

Local RAG pipeline - Analyze and index Markdown, text, Office, PDF, JSON/YAML/XML, and image files (via OCR) into <your-folder>/.knowledge-rag.
Hands-free Python runtime - The extension bootstraps a virtual environment inside extension/.venv, installs the requirements listed in python/requirements.txt, and exposes a knowledgeRag.pythonPath setting for custom interpreters.
GitHub Copilot integration - Send the top search hits to Copilot Chat, run the Knowledge RAG: Query Knowledge Base with GitHub Copilot command, or mention @knowledge-rag directly inside Copilot conversations.
Comfortable UX - Dedicated output channel, Explorer context-menu entry, Command Palette commands, and keybindings (Ctrl+K Q / Ctrl+K Shift+Q) keep the workflow discoverable.
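The hands-free Python runtime comes down to two steps: create a virtual environment, then pip-install the requirements file into it. Below is a minimal sketch of that idea that uses only the standard library `venv` and `subprocess` modules. The function names (`venv_python`, `install_command`, `bootstrap`) are hypothetical, and this is not the extension's actual bootstrap code.

```python
from __future__ import annotations

import subprocess
import sys
import venv
from pathlib import Path


def venv_python(venv_dir: Path, windows: bool) -> Path:
    """Return the interpreter inside a virtual environment.

    venv puts executables in Scripts/ on Windows and bin/ elsewhere.
    """
    if windows:
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def install_command(python: Path, requirements: Path) -> list[str]:
    """Build the pip command that installs every package in the requirements file."""
    return [str(python), "-m", "pip", "install", "-r", str(requirements)]


def bootstrap(venv_dir: Path, requirements: Path) -> Path:
    """Create the venv if it does not exist yet, then install requirements into it."""
    python = venv_python(venv_dir, windows=sys.platform == "win32")
    if not python.exists():
        # with_pip=True bootstraps pip (via ensurepip) inside the new environment.
        venv.EnvBuilder(with_pip=True).create(venv_dir)
    subprocess.run(install_command(python, requirements), check=True)
    return python
```

For example, `bootstrap(Path("extension/.venv"), Path("python/requirements.txt"))` would follow the layout named above, although the exact relative paths are an assumption. `venv` builds the environment from the interpreter that runs the script. This is why the base interpreter, which knowledgeRag.pythonPath can override, determines the Python version inside the environment.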

Requirements

Visual Studio Code 1.88.0 or newer.
Python 3.8+ available on your PATH (3.10+ recommended for faster inference). Configure knowledgeRag.pythonPath if you prefer a specific interpreter or virtual environment.
About 1.5 GB of free disk space for the embedded model, Python environment, and generated embeddings.
GitHub Copilot Chat extension (optional, only needed for Copilot features).

Installation

Install the packaged extension

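Build the project (npm install && npm run compile) and package it as a .vsix (for example with npx @vscode/vsce package), or use the pre-built knowledge-rag-<version>.vsix located in the extension folder.
In VS Code, open the Extensions view, click the ... (Views and More Actions) menu, choose Install from VSIX..., and pick the .vsix file.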
Reload VS Code when prompted. The Knowledge RAG output channel appears once the extension activates.

Run from source (for development)

cd extension && npm install
Use npm run watch for incremental builds.
From VS Code, run Run > Start Debugging (F5) to launch an Extension Development Host with Knowledge RAG preloaded.

Step-by-step usage

Follow these steps the first time you set up a workspace and any time your knowledge base changes.

Open the folder that contains your knowledge base. This is the root directory you want to index. Any .knowledge-rag folder created by previous runs can stay; re-runs are incremental.
Tell Knowledge RAG which folder to use.
- Command Palette: Ctrl+Shift+P -> Knowledge RAG: Select Knowledge Base Folder.
- Explorer: right-click a folder -> Knowledge RAG > Select Knowledge Base Folder.
- The path is stored per-workspace (knowledgeRag.knowledgeBasePath), so you only have to do this once per project.
Let the extension prepare Python dependencies (first run only).
- On activation, the extension looks for Python, creates extension/.venv, and installs everything in python/requirements.txt.
- Watch the Knowledge RAG output channel for progress. Set knowledgeRag.pythonPath if you need a non-default interpreter. You can re-trigger the installation at any time by running Knowledge RAG: Start Knowledge Base Analysis. That command validates the dependencies before it starts.
Analyze the knowledge base.
- Command Palette -> Knowledge RAG: Start Knowledge Base Analysis.
- A progress notification tracks ingestion. Supported file types are: txt, md, markdown, pdf, ppt, pptx, doc, docx, json, yaml, yml, xml, png, jpg, jpeg, gif, bmp.
- Outputs land in <your-folder>/.knowledge-rag/ (embeddings.json, processing_tracker.json, analysis.log). The tracker automatically skips unchanged files; delete it to force a full rebuild.
Query the knowledge base inside VS Code.
- Command Palette -> Knowledge RAG: Query Knowledge Base or press Ctrl+K Q (Cmd+K Q on macOS).
- Enter a natural-language question. Results open in a webview with similarity scores, previews, Open File buttons that jump to the chunk location, and quick copy actions.
Use GitHub Copilot for richer answers (optional).
- Command Palette -> Knowledge RAG: Query Knowledge Base with GitHub Copilot (Ctrl+K Shift+Q). The command bundles the top matches into the prompt and streams the model's answer with clickable citations.
- Inside Copilot Chat, you can also mention @knowledge-rag in any conversation. The participant runs the same retrieval pipeline and feeds the retrieved context back to Copilot.
Re-run analysis whenever source files change. The processing tracker ensures only modified files are re-embedded, so it is safe to trigger the analysis frequently.
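The incremental behaviour described in these steps works like this. Only supported file types are considered, and a file is re-embedded only when its content hash differs from the one stored in processing_tracker.json. The Python sketch below assumes a hypothetical tracker layout that maps each relative path to `{"sha256": ..., "mtime": ...}`. The README only says the tracker holds hashes and timestamps, so the real schema and function names (`load_tracker`, `files_to_process`) may differ.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path

# File types the README lists as supported by the analysis step.
SUPPORTED = {
    ".txt", ".md", ".markdown", ".pdf", ".ppt", ".pptx", ".doc", ".docx",
    ".json", ".yaml", ".yml", ".xml", ".png", ".jpg", ".jpeg", ".gif", ".bmp",
}


def load_tracker(path: Path) -> dict[str, dict]:
    """Load the tracker; a missing file (e.g. deleted on purpose) means a full rebuild."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def file_sha256(path: Path) -> str:
    """Hash file contents in 64 KiB blocks so large files are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def files_to_process(root: Path, tracker: dict[str, dict]) -> list[Path]:
    """Return supported files that are new or whose content changed.

    Updates `tracker` in place; a real pipeline would only commit an entry
    after the file was embedded successfully.
    """
    changed = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Never index the tool's own output folder (it contains .json files too).
        if ".knowledge-rag" in rel.parts or not path.is_file():
            continue
        if path.suffix.lower() not in SUPPORTED:
            continue
        key = rel.as_posix()
        entry = tracker.get(key)
        mtime = path.stat().st_mtime
        # Cheap check first: an unchanged timestamp means no rehash is needed.
        if entry and entry.get("mtime") == mtime:
            continue
        sha = file_sha256(path)
        if entry and entry.get("sha256") == sha:
            entry["mtime"] = mtime  # touched but identical: just refresh the timestamp
            continue
        tracker[key] = {"sha256": sha, "mtime": mtime}
        changed.append(path)
    return changed
```

The timestamp check avoids rehashing files that were not touched. The hash check handles files whose timestamp changed but whose content did not. Deleting the tracker yields an empty dictionary, so every supported file is processed again, which matches the "force a full rebuild" behaviour described above.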

Commands & Keybindings

Command	Description	Default keybinding
`Knowledge RAG: Select Knowledge Base Folder`	Persist the folder that should be analyzed and queried.	n/a (Command Palette / Explorer context menu)
`Knowledge RAG: Start Knowledge Base Analysis`	Run the Python ingestion/embedding pipeline and update `.knowledge-rag`.	n/a
`Knowledge RAG: Query Knowledge Base`	Prompt for a question and show ranked results in a VS Code webview.	`Ctrl+K Q` / `Cmd+K Q`
`Knowledge RAG: Query Knowledge Base with GitHub Copilot`	Retrieve context then forward the prompt + sources to Copilot.	`Ctrl+K Shift+Q` / `Cmd+K Shift+Q`
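The Copilot command retrieves context and then forwards the prompt together with its sources. The Python sketch below illustrates one way to bundle the top matches as numbered, citable sources. The `Hit` record, the `build_prompt` helper, and the prompt wording are all hypothetical. The extension's real prompt format, and the way it hands the prompt to Copilot, are not documented here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hit:
    """One retrieved chunk (hypothetical shape)."""
    path: str
    start_line: int
    score: float
    text: str


def build_prompt(question: str, hits: list[Hit], top_k: int = 5) -> str:
    """Combine a question with the top-k retrieved chunks as numbered sources."""
    best = sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]
    lines = ["Answer using only the sources below and cite them as [n].", ""]
    for n, hit in enumerate(best, start=1):
        # path:line lets the answer's citations point back at the chunk location.
        lines.append(f"[{n}] {hit.path}:{hit.start_line} (score {hit.score:.2f})")
        lines.append(hit.text.strip())
        lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

Keeping `path:line` next to each source number is what makes clickable citations possible. The chat side only has to map `[n]` back to a file position.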

Settings

knowledgeRag.knowledgeBasePath (string, workspace scope) - Set automatically via the select-folder command, but can be edited directly in Settings/settings.json.
knowledgeRag.pythonPath (string, machine scope) - Override the interpreter used for dependency installation and script execution (e.g., C:\\Python311\\python.exe or /usr/bin/python3). Leave empty to let the extension resolve Python automatically.
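How the pythonPath override interacts with automatic discovery can be sketched as follows. A non-empty setting wins; otherwise the code searches PATH. The `resolve_python` helper and the candidate order (`py`, `python`, `python3` on Windows; `python3`, `python` elsewhere) are assumptions for illustration, not the extension's documented lookup order.

```python
from __future__ import annotations

import shutil
import sys


def resolve_python(configured: str | None, which=shutil.which) -> str | None:
    """Return knowledgeRag.pythonPath if set, else the first interpreter on PATH.

    `which` is injectable so the lookup can be tested without touching PATH.
    """
    if configured and configured.strip():
        return configured.strip()
    if sys.platform == "win32":
        candidates = ["py", "python", "python3"]  # assumed Windows order
    else:
        candidates = ["python3", "python"]  # assumed POSIX order
    for name in candidates:
        found = which(name)
        if found:
            return found
    return None  # corresponds to the "Python not found." error
```
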

Where your data lives

Inside the knowledge base folder the extension creates .knowledge-rag/ with:

embeddings.json - Vector store with chunk metadata and embedding vectors.
processing_tracker.json - Hashes + timestamps used to skip unchanged files.
analysis.log - Detailed ingestion log you can review or attach to bug reports.

Delete this folder to reset the index or export it alongside your documents to share an already-embedded knowledge base.
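Querying the vector store amounts to embedding the question with the same model and ranking stored chunks by similarity. The Python sketch below ranks by cosine similarity. It assumes embeddings.json is a JSON list of chunk records, each with an `"embedding"` list plus metadata such as `"path"` and `"start_line"`. That layout and the helper names (`load_chunks`, `top_k`) are hypothetical. Producing the query vector (all-MiniLM-L6-v2 outputs 384-dimensional vectors) is left out.

```python
from __future__ import annotations

import json
import math
from pathlib import Path


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def load_chunks(store: Path) -> list[dict]:
    """Load chunk records (assumed: a JSON list of objects with an "embedding" key)."""
    return json.loads(store.read_text(encoding="utf-8"))


def top_k(query_vec: list[float], chunks: list[dict], k: int = 5) -> list[tuple[float, dict]]:
    """Rank chunks by cosine similarity to the query embedding, best first."""
    scored = [(cosine(query_vec, c["embedding"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A linear scan like this is fine for a personal knowledge base. Much larger corpora would usually need an approximate nearest-neighbour index instead.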

Troubleshooting

"Python not found." Install Python 3.8+ and/or set knowledgeRag.pythonPath. On Windows, ensure py.exe or python.exe is on PATH.
Dependency install fails. Open the Knowledge RAG output channel for the pip error. Make sure you have network access for the first install: the model itself is bundled, so only the sentence-transformers dependencies need downloading. Also confirm you have enough free disk space (about 1.5 GB, as listed under Requirements).
Query returns no results. Verify that you ran the analysis step and that the files you care about use one of the supported file types listed in the analysis step. Check .knowledge-rag/embeddings.json to confirm embeddings exist.
GitHub Copilot integration missing. Install/enable the official GitHub Copilot Chat extension and sign in. Without it the regular Query Knowledge Base command still works.
OCR accuracy issues. Image ingestion relies on pytesseract. Install the system-level Tesseract OCR binary if you plan to embed screenshots or scans.
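Several of these checks can be scripted. The standalone Python sketch below reports whether an interpreter version meets the 3.8 minimum, how many records embeddings.json holds, and whether the `tesseract` binary is on PATH. By default, pytesseract calls that executable. The helper names are hypothetical, and the record count assumes embeddings.json is a JSON list, which is an unverified assumption about its layout.

```python
from __future__ import annotations

import json
import re
import shutil
import sys
from pathlib import Path


def python_version_ok(version_text: str, minimum: tuple[int, int] = (3, 8)) -> bool:
    """Parse text such as 'Python 3.11.4' and compare major.minor to the minimum."""
    match = re.search(r"(\d+)\.(\d+)", version_text)
    return bool(match) and (int(match.group(1)), int(match.group(2))) >= minimum


def count_embeddings(kb_root: Path) -> int:
    """Return how many records .knowledge-rag/embeddings.json holds (0 if missing)."""
    store = kb_root / ".knowledge-rag" / "embeddings.json"
    if not store.exists():
        return 0
    data = json.loads(store.read_text(encoding="utf-8"))
    # Assumed layout: a JSON list of chunk records.
    return len(data) if isinstance(data, list) else 0


def tesseract_available() -> bool:
    """pytesseract shells out to the tesseract binary, found on PATH by default."""
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    kb = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
    print("Python OK:", python_version_ok(sys.version))
    print("Embedded chunks:", count_embeddings(kb))
    print("Tesseract on PATH:", tesseract_available())
```

Run it with the knowledge base folder as the only argument. To check the interpreter configured in knowledgeRag.pythonPath, run the script with that interpreter.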

Development tips

Run npm run lint and npm run typecheck before bundling.
The extension entry point is src/extension.ts. Bundled output lives in dist/extension.js (via npm run bundle).
Python scripts live under python/. See python/DEPENDENCIES.md for details on managing vendored dependencies, and keep large models out of Git history unless they are already vendored (see python/model).

Happy querying! Let us know if you automate new workflows so we can document them here.

TienN-FPT