Facilitate Insights

corey-data

|
2 installs
| (0) | Free
Find files similar to the currently open file using BM25 search algorithm. Perfect for managing Markdown notes, documentation, and related documents.
Installation
Launch VS Code Quick Open (Ctrl+P), paste the following command, and press enter.
Copied to clipboard
More Info

Similar Files Extension

A VS Code extension that shows similar files to the currently open file using BM25 search algorithm. Perfect for managing Markdown notes, documentation, and related documents.

✨ Features

This extension adds a "Similar Files" sidebar to VS Code that intelligently shows files similar to the one you're currently editing.

🎯 Core Functionality

  • 📋 Similar Files Sidebar: A dedicated view in the activity bar that updates automatically
  • 🔍 BM25 Search Algorithm: Uses the same algorithm as search engines for accurate content similarity
  • 📊 Similarity Scores: Shows relevance scores next to each suggested file
  • 🖱️ Clickable Results: Click any file in the sidebar to open it instantly
  • ⚡ Real-time Updates: Index updates automatically when you save files
  • ⚙️ Fully Configurable: Customize file patterns, result limits, and content filters

🛠️ Advanced Features

  • 🔄 Incremental Re-indexing: Efficient updates without full rebuilds
  • 🎯 Smart Filtering: Excludes current file from results
  • 📁 Multi-format Support: Works with Markdown, text files, and more
  • ⏱️ Performance Optimized: Debounced refresh and intelligent caching
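As an illustration of what a debounced refresh could look like, here is a minimal sketch. The `debounce` helper, the `Schedule` type, and the 300 ms delay are assumptions made for this example, not the extension's actual code.

```typescript
// Hypothetical sketch of a debounced refresh; names and the delay are assumptions.
type Cancel = () => void;
type Schedule = (fn: () => void, ms: number) => Cancel;

// Default scheduler backed by setTimeout; injectable so the logic can be tested without real timers.
const timerSchedule: Schedule = (fn, ms) => {
  const id = setTimeout(fn, ms);
  return () => clearTimeout(id);
};

// Returns a wrapper that runs `fn` only after `waitMs` have passed without another call.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
  schedule: Schedule = timerSchedule,
): (...args: A) => void {
  let cancel: Cancel | undefined;
  return (...args: A): void => {
    cancel?.(); // drop the pending call, if any
    cancel = schedule(() => {
      cancel = undefined;
      fn(...args);
    }, waitMs);
  };
}

// Usage: rapid editor switches collapse into a single refresh for the last file.
const refreshSidebar = debounce((path: string) => {
  console.log(`refreshing similar files for ${path}`);
}, 300);
```

With a wrapper like this, switching quickly between several files would trigger only one similarity query, for the file that ends up focused.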

🚀 How to Use

1. Installation & Setup

  • Install the extension in VS Code
  • Open a workspace with text files (Markdown works best)
  • The extension will automatically index your files on activation

2. Finding Similar Files

  • Open any text file in your workspace
  • Look for the "Similar Files" panel in the sidebar (activity bar)
  • The panel will show files similar to your currently open file
  • Files are ranked by similarity score (higher = more similar)
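The ranking described above can be sketched as a small pure function. The `ScoredFile` shape and the `topSimilar` and `formatLabel` helpers are hypothetical names chosen for this example; the label follows the `(score) filename` format this README describes for the sidebar.

```typescript
// Hypothetical helpers; the extension's real types and names may differ.
interface ScoredFile {
  path: string;
  score: number;
}

// Drop the currently open file, sort by descending score, and keep the top N.
function topSimilar(results: ScoredFile[], currentPath: string, maxResults = 5): ScoredFile[] {
  return results
    .filter((r) => r.path !== currentPath) // filter() copies, so the input is not mutated
    .sort((a, b) => b.score - a.score)
    .slice(0, maxResults);
}

// Build a "(score) filename" label with the score rounded to 2 decimal places.
function formatLabel(file: ScoredFile): string {
  const name = file.path.split(/[\\/]/).pop() ?? file.path;
  return `(${file.score.toFixed(2)}) ${name}`;
}

const ranked = topSimilar(
  [
    { path: "notes/a.md", score: 0.4 },
    { path: "notes/b.md", score: 2.1 },
    { path: "notes/c.md", score: 1.25 },
  ],
  "notes/a.md",
  2,
);
const labels = ranked.map(formatLabel);
```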

3. Navigation

  • Click any file in the Similar Files panel to open it
  • Switch files in the editor to see updated similarity results
  • Save changes to a file to update the similarity index

4. Customization

Open VS Code settings (Ctrl/Cmd + ,) and search for "Similar Files":

  • similarFiles.maxResults (default: 5): Number of similar files to show
  • similarFiles.fileGlobs (default: ["**/*.md", "**/*.markdown"]): File patterns to include
  • similarFiles.minContentLength (default: 10): Minimum character length for a file to be indexed

💡 Pro Tips

  • Works best with content-rich files (documentation, notes, articles)
  • The more text content in your files, the better the similarity matching
  • Files with similar keywords, topics, or writing style will rank higher
  • Try opening different files to see how the suggestions change

[Screenshot: the Similar Files panel showing relevant documents with similarity scores]

📋 Requirements

  • VS Code 1.100.0 or higher
  • Workspace with text files (works best with Markdown files)
  • Files with meaningful text content for best similarity results

⚙️ Configuration

This extension contributes the following settings:

  • similarFiles.maxResults: Number of similar files to show in the sidebar (default: 5)
  • similarFiles.fileGlobs: File patterns to include in similarity search (default: ["**/*.md", "**/*.markdown"])
  • similarFiles.minContentLength: Minimum character length for files to be indexed (default: 10)

Example Configuration

{
  "similarFiles.maxResults": 8,
  "similarFiles.fileGlobs": ["**/*.md", "**/*.txt", "**/*.rst"],
  "similarFiles.minContentLength": 50
}
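For readers curious how settings like these might be consumed, the sketch below resolves them with the documented defaults. `ConfigLike`, `readSettings`, and `shouldIndex` are hypothetical names; `ConfigLike` mirrors the `get(key, defaultValue)` overload of VS Code's `WorkspaceConfiguration`, so that inside an extension one could pass `vscode.workspace.getConfiguration("similarFiles")`. Whether the extension measures length exactly this way is an assumption.

```typescript
// Hypothetical sketch; mirrors WorkspaceConfiguration.get(key, defaultValue).
interface ConfigLike {
  get<T>(key: string, defaultValue: T): T;
}

interface SimilarFilesSettings {
  maxResults: number;
  fileGlobs: string[];
  minContentLength: number;
}

// Resolve settings, falling back to the defaults documented in this README.
function readSettings(config: ConfigLike): SimilarFilesSettings {
  return {
    maxResults: config.get<number>("maxResults", 5),
    fileGlobs: config.get<string[]>("fileGlobs", ["**/*.md", "**/*.markdown"]),
    minContentLength: config.get<number>("minContentLength", 10),
  };
}

// Assumption: a file is indexed when its character count reaches the minimum.
function shouldIndex(content: string, settings: SimilarFilesSettings): boolean {
  return content.length >= settings.minContentLength;
}
```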

🧪 Testing the Extension

Method 1: Quick Manual Testing (Recommended for trying it out)

📋 See MANUAL_TESTING.md for a complete step-by-step testing guide!

Quick Start:

  1. Clone this repository and run npm install
  2. Open in VS Code and press F5 to launch Extension Development Host
  3. In the new window, open the test-workspace folder
  4. Open any .md file and check the "Similar Files" panel in the sidebar
  5. Click on suggested files to test navigation

Method 2: Extension Development Host (Full Development)

  1. Clone and setup the repository:

    git clone <repository-url>
    cd first-extension
    npm install
    
  2. Open in VS Code and launch:

    code .
    
  3. Start debugging (F5):

    • Press F5 or go to Run > Start Debugging
    • This opens a new "Extension Development Host" window with the extension loaded
  4. Test with sample content:

    • Open the test-workspace folder in the Extension Development Host
    • Open any .md file (e.g., ai-ml.md)
    • Check the "Similar Files" panel in the sidebar
    • Click on suggested files to test navigation
    • Edit and save files to test real-time updates

Method 3: Manual Testing with Your Own Content

  1. Prepare test content:

    • Create several Markdown files with related content
    • For example: project documentation, meeting notes, research files
  2. Test scenarios:

    • Open files with similar topics and verify relevant suggestions appear
    • Save changes to a file and check if suggestions update
    • Try different file types if you've configured custom fileGlobs
    • Test the settings by changing maxResults and observing the sidebar

Method 4: Automated Testing

npm run test

The test suite includes 9 comprehensive tests covering:

  • Extension activation and setup
  • Index building and querying
  • File updates and incremental indexing
  • Configuration handling
  • TreeDataProvider functionality

🚀 Installation for Daily Use

From Source (Development)

  1. Clone this repository
  2. Run npm install and npm run compile
  3. Press F5 to test in Extension Development Host

From Package (Future)

  • Will be available on VS Code Marketplace
  • Install via Extensions panel in VS Code

🔧 How It Works

The extension uses the BM25 algorithm (Best Matching 25) - the same algorithm used by search engines like Elasticsearch:

  1. 📚 Indexing Phase: When you open VS Code, the extension scans your workspace for configured file types and builds a search index
  2. 🔍 Query Phase: When you open or edit a file, it uses the file's content as a search query against the index
  3. 📊 Ranking: Results are ranked by BM25 similarity scores, showing the most relevant files first
  4. ⚡ Updates: The index updates incrementally when you save files, keeping suggestions current
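The four phases above can be sketched with a small, self-contained BM25 scorer. This is an illustration only: the extension delegates indexing to MiniSearch, whereas the `Bm25Index` class below is a hypothetical brute-force version that scans every document, uses common textbook parameters (`K1 = 1.2`, `B = 0.75`), and counts each distinct term of the open file once.

```typescript
// Illustrative brute-force BM25 scorer; not the extension's actual code.
const K1 = 1.2; // term-frequency saturation (common default, an assumption here)
const B = 0.75; // document-length normalization (common default, an assumption here)

interface Scored {
  path: string;
  score: number;
}

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
}

class Bm25Index {
  private readonly termCounts = new Map<string, Map<string, number>>();
  private readonly lengths = new Map<string, number>();

  // Indexing and updates: (re)index a single file without rebuilding the rest.
  upsert(path: string, content: string): void {
    const tokens = tokenize(content);
    const counts = new Map<string, number>();
    for (const token of tokens) counts.set(token, (counts.get(token) ?? 0) + 1);
    this.termCounts.set(path, counts);
    this.lengths.set(path, tokens.length);
  }

  // Query and ranking: use the file's own terms as the query, skip the file itself.
  similarTo(path: string): Scored[] {
    const query = this.termCounts.get(path);
    if (!query) return [];
    const n = this.termCounts.size;
    let totalLength = 0;
    for (const len of this.lengths.values()) totalLength += len;
    const avgLength = totalLength / n;

    // Document frequency of each distinct query term.
    const df = new Map<string, number>();
    for (const term of query.keys()) {
      let count = 0;
      for (const counts of this.termCounts.values()) if (counts.has(term)) count++;
      df.set(term, count);
    }

    const results: Scored[] = [];
    for (const [other, counts] of this.termCounts) {
      if (other === path) continue;
      const length = this.lengths.get(other) ?? 0;
      let score = 0;
      for (const [term, docFreq] of df) {
        const tf = counts.get(term) ?? 0;
        if (tf === 0) continue;
        const idf = Math.log(1 + (n - docFreq + 0.5) / (docFreq + 0.5));
        score += (idf * tf * (K1 + 1)) / (tf + K1 * (1 - B + (B * length) / avgLength));
      }
      if (score > 0) results.push({ path: other, score });
    }
    return results.sort((a, b) => b.score - a.score);
  }
}

const index = new Bm25Index();
index.upsert("ml.md", "neural networks learn from training data");
index.upsert("dl.md", "deep neural networks need lots of training data");
index.upsert("cooking.md", "slow roast the vegetables with olive oil");
const similar = index.similarTo("ml.md"); // only dl.md shares terms with ml.md
```

A real index would keep an inverted term-to-documents map so that a query touches only the files sharing at least one term, which is the kind of structure a library such as MiniSearch maintains.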

Why BM25?

  • Relevance: Considers both term frequency and document length
  • Performance: Fast queries even with large document collections
  • Proven: Used by major search engines and information retrieval systems
  • Adaptable: Works well with both short and long documents
  • Balanced: Properly handles common terms without overwhelming results

🔢 Understanding Similarity Scores

How Scoring Works

The Similar Files extension uses the BM25 algorithm (implemented via MiniSearch) to calculate similarity between documents. Here's how the scoring works:

  1. Score Range: Scores have no fixed upper bound and depend on the contents of your workspace; in practice they typically range from 0 to around 10, with:

    • Higher scores (e.g., 2.0+) indicating strong similarity
    • Medium scores (e.g., 0.5-2.0) indicating moderate similarity
    • Lower scores (e.g., <0.5) indicating minimal similarity
  2. Score Calculation: The BM25 algorithm considers:

    • Term Frequency (TF): How often terms from the current file appear in each candidate file
    • Inverse Document Frequency (IDF): How unique those terms are across all documents
    • Document Length: Normalized by document length to avoid bias toward longer documents
  3. Display Format: Scores are displayed as (score) filename in the TreeView, rounded to 2 decimal places.
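For reference, the three factors in the score calculation combine as follows in the textbook form of Okapi BM25. This is a sketch of the general formula; the exact variant and default parameters inside MiniSearch may differ. Here $Q$ is the set of terms in the currently open file, $D$ is a candidate file, $f(t, D)$ is the number of times term $t$ occurs in $D$, $|D|$ is the length of $D$ in terms, $\text{avgdl}$ is the average document length in the index, $N$ is the number of indexed files, $n_t$ is the number of files containing $t$, and $k_1$ and $b$ are tuning parameters (commonly around 1.2 and 0.75).

```latex
\text{score}(D, Q) = \sum_{t \in Q} \text{IDF}(t) \cdot
  \frac{f(t, D)\,(k_1 + 1)}
       {f(t, D) + k_1 \left(1 - b + b\,\frac{|D|}{\text{avgdl}}\right)}
\qquad
\text{IDF}(t) = \ln\!\left(1 + \frac{N - n_t + 0.5}{n_t + 0.5}\right)
```

The fraction grows with term frequency but saturates as $f(t, D)$ increases, and the $|D| / \text{avgdl}$ term is what keeps longer files from winning simply because they contain more words.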

About MiniSearch (Our Dependency)

This extension uses MiniSearch, a small yet powerful full-text search engine that:

  • Implements the BM25 ranking algorithm (same as used by Elasticsearch and Lucene)
  • Provides fuzzy matching with configurable edit distance
  • Has zero external dependencies
  • Supports incremental indexing (essential for our file-change updates)
  • Is optimized for in-memory usage in JavaScript environments

Potential Improvements

Several approaches could enhance the similarity detection in future versions:

  1. Semantic Search Integration:

    • Using embeddings from models run locally through a library such as @xenova/transformers to capture semantic meaning
    • Creating vector representations of documents and measuring cosine similarity
    • Implementing hybrid search combining BM25 lexical matching with semantic similarity
  2. Advanced Preprocessing:

    • Adding stemming to match different forms of the same word
    • Implementing stopword removal for more meaningful comparisons
    • Using language-specific tokenization for international content
  3. Alternative Algorithms:

    • BM25+: A BM25 variant that adds a lower bound to the term-frequency component so that very long documents are not over-penalized
    • TF-IDF with SVD: Using singular value decomposition for dimension reduction
    • SimHash: A technique for quickly finding similar documents using locality-sensitive hashing
  4. Custom Weighting:

    • Giving higher weight to document titles, headings, or specific sections
    • Adjusting the relative importance of rare vs. common terms
    • Implementing user-defined boosting for certain keywords
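To make the hybrid-search idea concrete, here is a minimal sketch of cosine similarity over embedding vectors and one possible way to blend it with a BM25 score. The `hybridScore` function, its max-based normalization, and the `alpha` weight are assumptions chosen for illustration, not a planned design.

```typescript
// Hypothetical sketch of the hybrid-search idea; not part of the extension.

// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors have no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Blend a BM25 score (scaled by the best BM25 score in the result set) with a cosine score.
// alpha = 1 means purely lexical ranking, alpha = 0 purely semantic.
function hybridScore(bm25: number, maxBm25: number, cosine: number, alpha = 0.5): number {
  const lexical = maxBm25 > 0 ? bm25 / maxBm25 : 0;
  return alpha * lexical + (1 - alpha) * cosine;
}
```

Scaling BM25 by the top score is only one option; because BM25 scores are unbounded while cosine similarity lies in [-1, 1], some normalization is needed before the two can be mixed.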