# LocalRAG — Enhanced Local RAG for VS Code

Find precise answers in your files and repos using local embeddings, smart query planning, and embedded vector search. LocalRAG helps developers, knowledge workers, and enterprise teams search, summarize, review, and answer questions over local documents, repositories, and the active VS Code workspace, with privacy and compliance in mind. Use it fully offline with local Transformers.js embeddings and LanceDB storage, or enable optional LLM-based planning and evaluation via VS Code Copilot models (no external API key required) for advanced query decomposition and result assessment.

## 🙏 Acknowledgments

This extension is a fork of the original RAGnarōk project. We extend our heartfelt thanks to the original author hyorman and all contributors for their excellent work in creating a powerful, privacy-focused RAG solution for VS Code.

## Why install?
## 🌟 Features

- 🧩 Local Embedding Model Support
- 🧠 Agentic RAG with Query Planning
- 🔍 Multiple Retrieval Strategies
- 📚 Document Processing
- 💾 Vector Storage
- 🎨 Enhanced UI
- 🛠️ Developer Experience
## 🚀 Quick Start

### Installation

#### From Source

#### From VSIX

### Basic Usage

#### 0. (Optional) Choose/prepare your embedding model

#### 1. Create a Topic
Enter a name (e.g., "React Docs") and an optional description.

#### 2. Add Documents

Select a topic, then choose one or more files. The extension will load each file, split it into semantic chunks, generate embeddings locally, and store everything in LanceDB (see Document Processing Flow below).

Supported formats include PDF, Markdown, and HTML, among others.

#### 2b. Add GitHub Repository

Or right-click a topic in the tree view and select the GitHub icon.
The extension will recursively load all files from the repository and process them just like local documents.

Note: Supports GitHub.com and GitHub Enterprise Server only. The repository must be accessible from your network. For other Git hosting services (GitLab, Bitbucket, etc.), clone the repository locally and add it as local files.

#### 2c. GitHub Token Management

For accessing private repositories, LocalRAG securely stores GitHub access tokens per host using VS Code's Secret Storage API, as in the sketch below.
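A minimal sketch of per-host token storage using the Secret Storage API (the key format and function names here are illustrative, not LocalRAG's actual scheme):

```typescript
import * as vscode from 'vscode';

// Store, read, and remove a GitHub token for a given host.
// Secrets are encrypted by VS Code and never written to settings.json.
async function saveToken(ctx: vscode.ExtensionContext, host: string, token: string) {
  await ctx.secrets.store(`github-token/${host}`, token); // illustrative key format
}

async function readToken(ctx: vscode.ExtensionContext, host: string) {
  return ctx.secrets.get(`github-token/${host}`);
}

async function removeToken(ctx: vscode.ExtensionContext, host: string) {
  await ctx.secrets.delete(`github-token/${host}`);
}
```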
**Add a Token:**

**List Saved Tokens:** Shows all hosts with saved tokens (tokens themselves are never displayed).

**Remove a Token:**
Select a host to remove its stored token.

#### 2d. Export and Import Topics

**Export a Topic:** Or select a topic in the tree view and click the export icon. This creates a portable archive containing everything needed to restore the topic elsewhere. Exported topics can be shared with teammates or imported into other workspaces.

**Import a Topic:** Or click the import icon in the tree view title bar. Select an exported topic archive to restore it into your workspace.

**Rename a Topic:** Or select a topic in the tree view and click the edit icon.

**How to Create a GitHub PAT:** On GitHub, open Settings → Developer settings → Personal access tokens and generate a token with read access to the repositories you want to index.

**Benefits:**
#### 2e. Using Common/Shared Databases

LocalRAG supports read-only access to shared team knowledge bases via a common database path setting.

**Setup:**

**Benefits:**

Note: Topics from the common database path appear in the tree view but cannot be deleted or modified.

#### 3. Query with Copilot
The RAG tool will match your query to the best topic, plan and run retrieval, and return ranked results with source metadata (see Agentic Query Flow below).
### Maintenance Commands

**Clear Model Cache:** Removes cached embedding models. Useful when switching models or troubleshooting.

**Clear Database:** ⚠️ Warning: Deletes all topics and documents. This action cannot be undone.

**Refresh Topics:** Reloads the topic tree view. Useful after importing topics or external changes.

## ⚙️ Configuration

### Basic Settings

Note: GitHub access tokens are now managed via secure Secret Storage, not settings.json. See the GitHub Token Management section.

### Automatic Folder Watching

LocalRAG can automatically monitor a folder for changes and keep a default topic up-to-date.
Example configuration:
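A minimal sketch, assuming hypothetical setting names (`localrag.watchFolder` and `localrag.watchTopic` are illustrative; check the extension's settings for the exact keys):

```jsonc
{
  // Folder to monitor for changes (hypothetical key)
  "localrag.watchFolder": "${workspaceFolder}/docs",
  // Default topic that watched files are indexed into (hypothetical key)
  "localrag.watchTopic": "Workspace Docs"
}
```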
**Benefits:**

### CLI Tool (`lrag`)
| Option | Short | Description |
|---|---|---|
| `--search` | `-s` | Search indexed documents (default) |
| `--list` | `-l` | List all topics |
| `--topic <name>` | `-t` | Show details for a specific topic |
| `--status` | | Show extension status |
| `--json` | `-j` | Output results in JSON format |
| `--compact` | `-c` | Output compact JSON (implies `--json`) |
| `--limit <n>` | `-n` | Maximum results to return (default: 10) |
| `--help` | `-h` | Show help message |
| `--version` | `-v` | Show version |
#### Output Formats

**Default (Markdown):** Human-readable output with colors and formatting

```bash
lrag "error handling"
```

**JSON:** Full metadata for programmatic use

```bash
lrag --json "error handling"
```

**Compact JSON:** Minimal output for AI agents (content, path, score only)

```bash
lrag --compact "error handling"
```
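For scripting against the compact output, the shape can be assumed to look roughly like this (illustrative; only a `results` array with a `content` field is confirmed by the jq example below):

```json
{
  "results": [
    { "content": "Wrap the call in try/catch...", "path": "docs/errors.md", "score": 0.82 }
  ]
}
```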
#### Examples

```bash
# Search with limited results
lrag -n 5 "database connection"

# Get topics in JSON format
lrag --list --json

# Check if indexing is in progress
lrag --status --compact

# Use in scripts
RESULTS=$(lrag --compact "API authentication")
echo "$RESULTS" | jq '.results[0].content'
```
#### Extension Not Running?
If the CLI can't connect to the server, it will:
- Check if the extension is installed
- Offer to install it if missing
- Provide instructions to start VS Code with your workspace
### Agentic Mode Settings

```jsonc
{
// Enable agentic RAG with query planning
"localrag.useAgenticMode": true,
// Use LLM for query planning (requires Copilot)
"localrag.agenticUseLLM": false,
// Maximum refinement iterations (1-10)
"localrag.agenticMaxIterations": 3,
// Confidence threshold (0-1) for stopping iteration
"localrag.agenticConfidenceThreshold": 0.7,
// Enable iterative refinement
"localrag.agenticIterativeRefinement": true,
// LLM model for planning (when agenticUseLLM is true) — models provided via VS Code Copilot/LM API
"localrag.agenticLLMModel": "gpt-4o",
// Include workspace context in queries
"localrag.agenticIncludeWorkspaceContext": true
}
```
Set `localrag.localModelPath` to point at a folder that already contains compatible Transformers.js models (one subfolder per model, e.g. an ONNX export downloaded ahead of time). Entries found here appear in the tree view and can be selected directly, and this local path takes precedence over `localrag.embeddingModel`.
Available embedding models to download (local, no API needed):

- `Xenova/all-MiniLM-L6-v2` (default) - Fast, 384 dimensions
- `Xenova/all-MiniLM-L12-v2` - More accurate, 384 dimensions
- `Xenova/paraphrase-MiniLM-L6-v2` - Optimized for paraphrasing
- `Xenova/multi-qa-MiniLM-L6-cos-v1` - Optimized for Q&A
The extension ships with `Xenova/all-MiniLM-L6-v2` by default; to use other local models, set `localrag.localModelPath` or click the model name in the tree view. Any models you place under `localrag.localModelPath` show up in the tree view alongside these curated options (with download indicators) and can be loaded with one click.
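For example, a settings sketch combining both options (the path value is illustrative):

```jsonc
{
  // Folder of pre-downloaded Transformers.js models (one subfolder per model)
  "localrag.localModelPath": "/path/to/local-models",
  // Hosted model id used when no local path is set
  "localrag.embeddingModel": "Xenova/all-MiniLM-L6-v2"
}
```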
LLM models (when agentic planning is enabled) are provided via the VS Code Copilot / LM API (no external API key required):

- `gpt-4o` (default) - 0x
- `gpt-4.1` - 0x
- `gpt-5-mini` - 0x
- `gpt-5` - 1x
- `gpt-5-codex` - 1x
- `gpt-5.1` - 1x
- `gpt-5.1-codex` - 1x
- `gpt-5.1-codex-mini` - 0.33x
- `gpt-5.1-codex-max` - 1x
- `gpt-5.2` - 1x
- `gpt-5.2-codex` - 1x
- `claude-haiku-4.5` - 0.33x
- `claude-sonnet-4` - 1x
- `claude-sonnet-4.5` - 1x
- `claude-opus-4.1` - 10x
- `claude-opus-4.5` - 3x
- `gemini-2.5-pro` - 1x
- `gemini-3-flash` - 0.33x
- `gemini-3-pro` - 1x
- `grok-code-fast-1` - 0.25x
- `raptor-mini` - 0x

Legend: 0x = no premium requests (Copilot Free), 1x = premium requests (Copilot paid). Other values indicate premium multipliers.
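As a sketch of what LLM planning via the LM API looks like, here is a minimal request to a Copilot-provided model using the public `vscode.lm` API (the prompt wording is illustrative, not LocalRAG's actual planner prompt):

```typescript
import * as vscode from 'vscode';

// Ask a Copilot model to decompose a query into sub-queries.
async function planSubQueries(query: string, token: vscode.CancellationToken): Promise<string[]> {
  const [model] = await vscode.lm.selectChatModels({ vendor: 'copilot', family: 'gpt-4o' });
  if (!model) throw new Error('No Copilot model available');

  const messages = [
    vscode.LanguageModelChatMessage.User(
      `Split this search query into simpler sub-queries, one per line:\n${query}`,
    ),
  ];
  const response = await model.sendRequest(messages, {}, token);

  // Stream the response text and split it into sub-queries.
  let text = '';
  for await (const chunk of response.text) text += chunk;
  return text.split('\n').filter((line) => line.trim().length > 0);
}
```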
## 🏗️ Architecture

### Component Overview

```
┌─────────────────────────────────────────────────────┐
│ VS Code Extension │
├─────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌────────────┐ │
│ │ Commands │ │ Tree View │ │ RAG Tool │ │
│ │ (UI) │ │ (UI) │ │ (Copilot) │ │
│ └─────┬───────┘ └──────┬───────┘ └─────┬──────┘ │
│ │ │ │ │
│ ┌─────┴─────────────────┴─────────────────┴──────┐ │
│ │ Topic Manager │ │
│ │ (Topic lifecycle, caching, coordination) │ │
│ └─────┬──────────────────────────────────┬───────┘ │
│ │ │ │
│ ┌─────┴─────────┐ ┌──────┴───────┐ │
│ │ Document │ │ RAG Agent │ │
│ │ Pipeline │ │ (Orchestr.) │ │
│ └┬─────────┬────┘ └┬─────────┬───┘ │
│ │ │ │ │ │
│ ┌─┴────┐ ┌──┴────┐ ┌──────┴──┐ ┌────┴───┐ │
│ │Loader│ │Chunker│ │ Planner │ │Retriev.│ │
│ │ │ │ │ │ │ │ │ │
│ └──┬───┘ └───┬───┘ └────┬────┘ └───┬────┘ │
│ │ │ │ │ │
│ ┌─┴─────────┴────┐ ┌────┴──────────┴────┐ │
│ │ Embedding │ │ Vector Store │ │
│ │ Service │ │ (LanceDB) │ │
│ │ (Local Models) │ │ (Embedded DB) │ │
│ └────────────────┘ └────────────────────┘ │
│ │
└─────────────────────────────────────────────────────┘
│
┌──────┴───────┐
│ LangChain.js │
│ (Foundation) │
└──────────────┘
```
## 🎯 How It Works

### Agentic Query Flow

```
User Query: "Compare React hooks vs class components"
↓
┌───┴────────────────────────────────────────┐
│ 1. Topic Matching (Semantic Similarity) │
│ → Finds best matching topic │
└───┬────────────────────────────────────────┘
↓
┌───┴────────────────────────────────────────┐
│ 2. Query Planning (LLM or Heuristic) │
│ Complexity: complex │
│ Sub-queries: │
│ - "React hooks features and usage" │
│ - "React class components features" │
│ Strategy: parallel │
└───┬────────────────────────────────────────┘
↓
┌───┴────────────────────────────────────────┐
│ 3. Hybrid Retrieval (for each sub-query) │
│ Vector search: 70% weight │
│ Keyword search: 30% weight │
│ → Returns ranked results │
└───┬────────────────────────────────────────┘
↓
┌───┴────────────────────────────────────────┐
│ 4. Iterative Refinement (if enabled) │
│ Check confidence: 0.65 < 0.7 │
│ → Refine query and retrieve again │
│ Check confidence: 0.78 ≥ 0.7 ✓ │
└───┬────────────────────────────────────────┘
↓
┌───┴────────────────────────────────────────┐
│ 5. Result Processing │
│ - Deduplicate by content hash │
│ - Rank by score │
│ - Limit to topK │
└───┬────────────────────────────────────────┘
↓
Return: Ranked results with metadata
```
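The weighting in step 3 and the loop in step 4 can be pictured with a small sketch (the helper signatures are hypothetical, not LocalRAG's actual API; the 70/30 weights and 0.7 threshold mirror the flow above):

```typescript
interface Scored { id: string; content: string; score: number }

// Fuse vector and keyword results with a 70/30 weighting, deduplicated by id.
function fuseScores(vector: Scored[], keyword: Scored[]): Scored[] {
  const byId = new Map<string, Scored>();
  for (const r of vector) byId.set(r.id, { ...r, score: 0.7 * r.score });
  for (const r of keyword) {
    const prev = byId.get(r.id);
    if (prev) prev.score += 0.3 * r.score;
    else byId.set(r.id, { ...r, score: 0.3 * r.score });
  }
  return [...byId.values()].sort((a, b) => b.score - a.score);
}

// Retrieve, then refine the query until confidence clears the threshold.
async function retrieveWithRefinement(
  query: string,
  vectorSearch: (q: string) => Promise<Scored[]>,
  keywordSearch: (q: string) => Promise<Scored[]>,
  refineQuery: (q: string, results: Scored[]) => Promise<string>,
  maxIterations = 3,
  threshold = 0.7,
): Promise<Scored[]> {
  let results: Scored[] = [];
  for (let i = 0; i < maxIterations; i++) {
    results = fuseScores(await vectorSearch(query), await keywordSearch(query));
    const confidence = results[0]?.score ?? 0; // simplistic confidence proxy
    if (confidence >= threshold) break;        // e.g. 0.78 >= 0.7 stops the loop
    query = await refineQuery(query, results); // otherwise refine and retry
  }
  return results;
}
```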
### Document Processing Flow

```
User uploads: document1.pdf, document2.md
↓
┌───┴────────────────────────────────────────┐
│ 1. Document Loading (LangChain Loaders) │
│ PDF: PDFLoader │
│ MD: TextLoader │
│ HTML: CheerioWebBaseLoader │
│ → Returns Document[] with metadata │
└───┬────────────────────────────────────────┘
↓
┌───┴────────────────────────────────────────┐
│ 2. Semantic Chunking │
│ Strategy selection: │
│ - Markdown: MarkdownTextSplitter │
│ - Code: RecursiveCharacterTextSplitter │
│ - Other: RecursiveCharacterTextSplitter │
│ → Preserves headings and structure │
└───┬────────────────────────────────────────┘
↓
┌───┴────────────────────────────────────────┐
│ 3. Embedding Generation (Batched) │
│ Model: Xenova/all-MiniLM-L6-v2 (local) │
│ Batch size: 32 chunks │
│ → Generates 384-dim vectors │
└───┬────────────────────────────────────────┘
↓
┌───┴────────────────────────────────────────┐
│ 4. Vector Storage │
│ LanceDB embedded database │
│ → Stores embeddings + metadata │
└───┬────────────────────────────────────────┘
↓
Complete: Documents ready for retrieval
```
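Step 3 can be reproduced standalone with Transformers.js; a minimal sketch using the `@xenova/transformers` package (outside the extension, for illustration):

```typescript
import { pipeline } from '@xenova/transformers';

// Load the default model once; it is downloaded and cached on first use.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Embed a batch of chunks into 384-dimensional, L2-normalized vectors.
async function embedChunks(chunks: string[]): Promise<number[][]> {
  const output = await extractor(chunks, { pooling: 'mean', normalize: true });
  return output.tolist(); // Tensor of shape [chunks.length, 384] to plain arrays
}

const vectors = await embedChunks(['error handling in async code']);
console.log(vectors[0].length); // 384
```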
## 📊 Performance

### Benchmarks (M1 Mac, 16GB RAM)
| Operation | Time | Notes |
|---|---|---|
| Load PDF (10 pages) | ~2s | Using PDFLoader |
| Chunk document (50 chunks) | ~100ms | Semantic chunking |
| Generate embeddings (50 chunks) | ~3-5s | Local Transformers.js model |
| Store in LanceDB | ~100ms | File-based persistence |
| Hybrid search (k=5) | ~50ms | Vector + BM25 |
| Query planning (LLM) | ~2s | GPT-4o via Copilot |
| Query planning (heuristic) | <10ms | Rule-based |
### Optimization Tips
- Use local embeddings for privacy and no API costs
- Enable agent caching (automatic per topic)
- Adjust chunk size based on document type
- Use simple mode for fast queries
- Batch document uploads for efficiency
- LanceDB scales well, without the size limits of in-memory stores
## 🔬 Testing

### Run Tests

```bash
npm test
```
## 🤝 Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
### Development Setup

```bash
git clone https://github.com/borgius/localrag.git
cd localrag
npm install
npm run watch # Watch mode for development
```
## 📄 License
MIT License - see LICENSE for details
## 🙏 Acknowledgments
Built with:
- LangChain.js - Document processing framework
- Transformers.js - Local embeddings
- LanceDB - Embedded vector database
- VS Code Extension API - Extension platform
- VS Code LM API - Copilot integration
Made with ❤️ by hyorman
⭐ Star us on GitHub if you find this useful!