Similar Files Extension
A VS Code extension that shows files similar to the currently open file, ranked with the BM25 search algorithm. Well suited to managing Markdown notes, documentation, and related documents.
✨ Features
This extension adds a "Similar Files" sidebar to VS Code that intelligently shows files similar to the one you're currently editing.
🎯 Core Functionality
- 📋 Similar Files Sidebar: A dedicated view in the activity bar that updates automatically
- 🔍 BM25 Search Algorithm: Uses the same algorithm as search engines for accurate content similarity
- 📊 Similarity Scores: Shows relevance scores next to each suggested file
- 🖱️ Clickable Results: Click any file in the sidebar to open it instantly
- ⚡ Real-time Updates: Index updates automatically when you save files
- ⚙️ Fully Configurable: Customize file patterns, result limits, and content filters
🛠️ Advanced Features
- 🔄 Incremental Re-indexing: Efficient updates without full rebuilds
- 🎯 Smart Filtering: Excludes current file from results
- 📁 Multi-format Support: Works with Markdown, text files, and more
- ⏱️ Performance Optimized: Debounced refresh and intelligent caching
🚀 How to Use
1. Installation & Setup
- Install the extension in VS Code
- Open a workspace with text files (Markdown works best)
- The extension will automatically index your files on activation
2. Finding Similar Files
- Open any text file in your workspace
- Look for the "Similar Files" panel in the sidebar (activity bar)
- The panel will show files similar to your currently open file
- Files are ranked by similarity score (higher = more similar)
3. Navigation
- Click any file in the Similar Files panel to open it
- Switch files in the editor to see updated similarity results
- Save changes to a file to update the similarity index
4. Customization
Open VS Code settings (Ctrl/Cmd + ,) and search for "Similar Files":
- similarFiles.maxResults (default: 5): Number of similar files to show
- similarFiles.fileGlobs (default: ["**/*.md", "**/*.markdown"]): File patterns to include
- similarFiles.minContentLength (default: 10): Minimum character length for a file to be indexed
💡 Pro Tips
- Works best with content-rich files (documentation, notes, articles)
- The more text content in your files, the better the similarity matching
- Files with similar keywords, topics, or writing style will rank higher
- Try opening different files to see how the suggestions change

Example: The Similar Files panel showing relevant documents with similarity scores
📋 Requirements
- VS Code 1.100.0 or higher
- Workspace with text files (works best with Markdown files)
- Files with meaningful text content for best similarity results
⚙️ Configuration
This extension contributes the following settings:
- similarFiles.maxResults: Number of similar files to show in the sidebar (default: 5)
- similarFiles.fileGlobs: File patterns to include in similarity search (default: ["**/*.md", "**/*.markdown"])
- similarFiles.minContentLength: Minimum character length for files to be indexed (default: 10)
Example Configuration
{
  "similarFiles.maxResults": 8,
  "similarFiles.fileGlobs": ["**/*.md", "**/*.txt", "**/*.rst"],
  "similarFiles.minContentLength": 50
}
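As a rough illustration of how these settings could be normalized before use, here is a minimal sketch. The `SimilarFilesSettings` type and the `resolveSettings` helper are hypothetical names, not the extension's actual code; only the three setting keys and their defaults come from this README. In a real extension the raw values would typically be read with `vscode.workspace.getConfiguration("similarFiles")`.

```typescript
// Hypothetical shape for the three documented settings.
interface SimilarFilesSettings {
  maxResults: number;
  fileGlobs: string[];
  minContentLength: number;
}

// Defaults as documented in this README.
const DEFAULTS: SimilarFilesSettings = {
  maxResults: 5,
  fileGlobs: ["**/*.md", "**/*.markdown"],
  minContentLength: 10,
};

// Merge user-provided values over the defaults, falling back to the default
// whenever a value is missing or has an unusable type.
function resolveSettings(
  user: { maxResults?: unknown; fileGlobs?: unknown; minContentLength?: unknown }
): SimilarFilesSettings {
  const { maxResults, fileGlobs, minContentLength } = user;

  const validMax =
    typeof maxResults === "number" && Number.isInteger(maxResults) && maxResults > 0;
  const validGlobs =
    Array.isArray(fileGlobs) &&
    fileGlobs.length > 0 &&
    fileGlobs.every((g) => typeof g === "string");
  const validMin =
    typeof minContentLength === "number" &&
    Number.isInteger(minContentLength) &&
    minContentLength >= 0;

  return {
    maxResults: validMax ? (maxResults as number) : DEFAULTS.maxResults,
    fileGlobs: validGlobs ? (fileGlobs as string[]) : DEFAULTS.fileGlobs,
    minContentLength: validMin ? (minContentLength as number) : DEFAULTS.minContentLength,
  };
}
```

Falling back per setting, rather than rejecting the whole configuration, keeps the sidebar usable when only one value is mistyped.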
🧪 Testing the Extension
Method 1: Quick Manual Testing (Recommended for trying it out)
📋 See MANUAL_TESTING.md for a complete step-by-step testing guide!
Quick Start:
- Clone this repository and run npm install
- Open in VS Code and press F5 to launch the Extension Development Host
- In the new window, open the test-workspace folder
- Open any .md file and check the "Similar Files" panel in the sidebar
- Click on suggested files to test navigation
Method 2: Extension Development Host (Full Development)
Clone and set up the repository:
git clone <repository-url>
cd first-extension
npm install
Open in VS Code and launch:
code .
Start debugging (F5):
- Press F5 or go to Run > Start Debugging
- This opens a new "Extension Development Host" window with the extension loaded
Test with sample content:
- Open the test-workspace folder in the Extension Development Host
- Open any .md file (e.g., ai-ml.md)
- Check the "Similar Files" panel in the sidebar
- Click on suggested files to test navigation
- Edit and save files to test real-time updates
Method 3: Manual Testing with Your Own Content
Prepare test content:
- Create several Markdown files with related content
- For example: project documentation, meeting notes, research files
Test scenarios:
- Open files with similar topics and verify relevant suggestions appear
- Save changes to a file and check if suggestions update
- Try different file types if you've configured custom fileGlobs
- Test the settings by changing maxResults and observing the sidebar
Method 4: Automated Testing
npm run test
The test suite includes 9 comprehensive tests covering:
- Extension activation and setup
- Index building and querying
- File updates and incremental indexing
- Configuration handling
- TreeDataProvider functionality
🚀 Installation for Daily Use
From Source (Development)
- Clone this repository
- Run npm install and npm run compile
- Press F5 to test in Extension Development Host
From Package (Future)
- Will be available on VS Code Marketplace
- Install via Extensions panel in VS Code
🔧 How It Works
The extension uses the BM25 algorithm (Best Matching 25) - the same algorithm used by search engines like Elasticsearch:
- 📚 Indexing Phase: When you open VS Code, the extension scans your workspace for configured file types and builds a search index
- 🔍 Query Phase: When you open or edit a file, it uses the file's content as a search query against the index
- 📊 Ranking: Results are ranked by BM25 similarity scores, showing the most relevant files first
- ⚡ Updates: The index updates incrementally when you save files, keeping suggestions current
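The four phases above can be sketched with a tiny in-memory index. This is an illustration only: `TinyIndex`, `upsert`, and `similarTo` are hypothetical names, and a simple count of shared terms stands in for the BM25 scoring that the extension delegates to MiniSearch.

```typescript
// Naive tokenizer: lowercase, then split on anything that is not a-z or 0-9.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

interface Match {
  path: string;
  score: number;
}

class TinyIndex {
  private docs = new Map<string, Set<string>>();

  // Indexing phase and incremental update: (re)index a single file.
  // Calling this again for the same path replaces the old entry.
  upsert(path: string, content: string): void {
    this.docs.set(path, new Set(tokenize(content)));
  }

  // Query phase and ranking: use the current file's content as the query,
  // skip the current file itself, and return the best `maxResults` matches.
  similarTo(currentPath: string, content: string, maxResults: number): Match[] {
    const queryTerms = new Set(tokenize(content));
    const matches: Match[] = [];
    this.docs.forEach((terms, path) => {
      if (path === currentPath) return;
      let shared = 0;
      queryTerms.forEach((t) => {
        if (terms.has(t)) shared++;
      });
      if (shared > 0) matches.push({ path, score: shared });
    });
    return matches.sort((a, b) => b.score - a.score).slice(0, maxResults);
  }
}
```

On save, a handler would call `upsert` for the saved file only, which is what makes the update incremental rather than a full rebuild.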
Why BM25?
- Relevance: Considers both term frequency and document length
- Performance: Fast queries even with large document collections
- Proven: Used by major search engines and information retrieval systems
- Adaptable: Works well with both short and long documents
- Balanced: Properly handles common terms without overwhelming results
🔢 Understanding Similarity Scores
How Scoring Works
The Similar Files extension uses the BM25 algorithm (implemented via MiniSearch) to calculate similarity between documents. Here's how the scoring works:
Score Range: Scores are not normalized to a fixed scale and depend on the size and content of your workspace. They typically range from 0 to around 10, with:
- Higher scores (e.g., 2.0+) indicating strong similarity
- Medium scores (e.g., 0.5-2.0) indicating moderate similarity
- Lower scores (e.g., <0.5) indicating minimal similarity
Score Calculation: The BM25 algorithm considers:
- Term Frequency (TF): How often terms from the current file appear in the candidate file
- Inverse Document Frequency (IDF): How unique those terms are across all documents
- Document Length: Normalized by document length to avoid bias toward longer documents
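To make the three factors concrete, here is a minimal sketch of the textbook Okapi BM25 formula. It is not MiniSearch's implementation: the `bm25Score` function, the pre-tokenized inputs, and the parameter values `k1 = 1.2` and `b = 0.75` are assumptions chosen for illustration, and MiniSearch's exact formula and defaults may differ.

```typescript
// Score one candidate document against a query, given the whole corpus.
// All documents are assumed to be already tokenized into term arrays.
function bm25Score(
  queryTerms: string[],
  doc: string[],
  corpus: string[][],
  k1 = 1.2, // controls how quickly repeated terms stop adding score
  b = 0.75  // controls how strongly document length is normalized
): number {
  const n = corpus.length;
  if (n === 0) return 0;
  const avgLen = corpus.reduce((sum, d) => sum + d.length, 0) / n;

  let score = 0;
  const seen = new Set<string>();
  for (const term of queryTerms) {
    // Score each distinct query term once.
    if (seen.has(term)) continue;
    seen.add(term);

    // Term frequency: occurrences of the term in the candidate document.
    const tf = doc.filter((t) => t === term).length;
    if (tf === 0) continue;

    // Inverse document frequency: rarer terms across the corpus weigh more.
    const df = corpus.filter((d) => d.indexOf(term) !== -1).length;
    const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));

    // Length normalization: longer-than-average documents are damped.
    const lengthNorm = 1 - b + b * (doc.length / avgLen);

    score += (idf * (tf * (k1 + 1))) / (tf + k1 * lengthNorm);
  }
  return score;
}
```

With the same term frequency, a shorter document scores higher than a longer one, which is the length normalization described above.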
Display Format: Scores are displayed as (score) filename in the TreeView, rounded to 2 decimal places.
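A label in this format could be produced as follows. The `formatLabel` helper is a hypothetical name, and taking only the last path segment as the filename is an assumption about how the extension derives it.

```typescript
// Build a "(score) filename" label with the score rounded to 2 decimal places.
function formatLabel(score: number, filePath: string): string {
  // Keep only the last path segment, handling both / and \ separators.
  const name = filePath.split(/[\\/]/).pop() ?? filePath;
  return `(${score.toFixed(2)}) ${name}`;
}

console.log(formatLabel(1.2345, "notes/ai-ml.md")); // prints "(1.23) ai-ml.md"
```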
About MiniSearch (Our Dependency)
This extension uses MiniSearch, a small yet powerful full-text search engine that:
- Implements the BM25 ranking algorithm (same as used by Elasticsearch and Lucene)
- Provides fuzzy matching with configurable edit distance
- Has zero external dependencies
- Supports incremental indexing (essential for our file-change updates)
- Is optimized for in-memory usage in JavaScript environments
Potential Improvements
Several approaches could enhance the similarity detection in future versions:
Semantic Search Integration:
- Using embeddings from models like @xenova/transformers to capture semantic meaning
- Creating vector representations of documents and measuring cosine similarity
- Implementing hybrid search combining BM25 lexical matching with semantic similarity
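As a sketch of what the comparison step might look like, the snippet below computes cosine similarity between two embedding vectors and blends it with a BM25 score. Producing the embeddings is out of scope here. The `hybridScore` function, its scaling of BM25 by the best score in the result set, and the `alpha` weight are illustrative assumptions, not a planned design.

```typescript
// Cosine similarity between two equal-length vectors, in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical hybrid score: scale BM25 into [0, 1] using the best BM25 score
// in the result set, then blend linearly with the semantic similarity.
function hybridScore(bm25: number, maxBm25: number, cosine: number, alpha = 0.5): number {
  const lexical = maxBm25 > 0 ? bm25 / maxBm25 : 0;
  return alpha * lexical + (1 - alpha) * cosine;
}
```

An `alpha` of 1 reduces this to pure lexical ranking and 0 to pure semantic ranking, so the blend can be tuned without changing either scorer.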
Advanced Preprocessing:
- Adding stemming to match different forms of the same word
- Implementing stopword removal for more meaningful comparisons
- Using language-specific tokenization for international content
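A minimal sketch of such a preprocessing step is shown below. The stopword list is a small sample and `naiveStem` is a crude suffix stripper used as a stand-in for a real stemmer such as Porter's, so both are assumptions for illustration only. The tokenizer handles ASCII text only and would need replacing for international content.

```typescript
// A deliberately small sample of English stopwords.
const STOPWORDS = new Set([
  "a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on", "with",
]);

// Crude suffix stripping; a real stemmer handles far more cases.
function naiveStem(word: string): string {
  for (const suffix of ["ing", "ed", "s"]) {
    // Require a stem of at least 3 characters to avoid mangling short words.
    if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Tokenize, drop stopwords, then stem what remains.
function preprocess(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0 && !STOPWORDS.has(t))
    .map(naiveStem);
}

console.log(preprocess("Indexing the indexed files")); // prints [ 'index', 'index', 'file' ]
```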
Alternative Algorithms:
- BM25+: A BM25 variant that adds a lower bound to the term-frequency component so that long documents containing a query term are not over-penalized
- TF-IDF with SVD: Using singular value decomposition for dimension reduction
- SimHash: A technique for quickly finding similar documents using locality-sensitive hashing
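To give a feel for the SimHash idea, here is a minimal 32-bit sketch. It assumes unweighted tokens and uses an FNV-1a hash over UTF-16 code units; production implementations usually use 64-bit fingerprints and weight tokens by importance. All function names are hypothetical.

```typescript
// 32-bit FNV-1a hash, applied to UTF-16 code units for simplicity.
function fnv1a(str: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// SimHash: each token votes +1 or -1 on every bit position according to its
// hash; the fingerprint keeps the bits where the positive votes win.
function simhash(tokens: string[]): number {
  const votes = new Array<number>(32).fill(0);
  for (const token of tokens) {
    const h = fnv1a(token);
    for (let bit = 0; bit < 32; bit++) {
      votes[bit] += ((h >>> bit) & 1) === 1 ? 1 : -1;
    }
  }
  let fingerprint = 0;
  for (let bit = 0; bit < 32; bit++) {
    if (votes[bit] > 0) fingerprint |= 1 << bit;
  }
  return fingerprint >>> 0;
}

// Number of differing bits; a small distance suggests similar documents.
function hammingDistance(a: number, b: number): number {
  let x = (a ^ b) >>> 0;
  let count = 0;
  while (x !== 0) {
    count += x & 1;
    x >>>= 1;
  }
  return count;
}
```

Comparing two fixed-size fingerprints is much cheaper than comparing full term lists, which is why this approach is attractive for large workspaces.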
Custom Weighting:
- Giving higher weight to document titles, headings, or specific sections
- Adjusting the relative importance of rare vs. common terms
- Implementing user-defined boosting for certain keywords
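One simple way to weight headings, sketched below, is to count terms on Markdown heading lines several times over before they reach the ranking step. The `weightedTermCounts` function and the default boost of 3 are assumptions for illustration. MiniSearch also offers a per-field `boost` search option, which would be the more natural route if titles and body text were indexed as separate fields.

```typescript
// Count term occurrences, multiplying those that appear on Markdown heading
// lines (lines starting with 1 to 6 '#' characters) by `headingBoost`.
function weightedTermCounts(markdown: string, headingBoost = 3): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of markdown.split(/\r?\n/)) {
    const weight = /^#{1,6}\s/.test(line) ? headingBoost : 1;
    const terms = line
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((t) => t.length > 0);
    for (const term of terms) {
      counts.set(term, (counts.get(term) ?? 0) + weight);
    }
  }
  return counts;
}
```

For the input "# Search" followed by a line "search index", the term "search" ends up with weight 4 (3 from the heading plus 1 from the body) and "index" with weight 1.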