Sem

Find code by what it does, not what it's named.

Keyword search wastes time. You know the code exists, but you can't remember if you called it validateLogin(), checkCredentials(), or authenticateUser().

Sem uses AI-powered semantic search to find code by meaning. Search "check user credentials" and it finds all three. Everything runs locally, so your code never leaves your computer.

How It Works

Traditional search matches text. Semantic search understands meaning.

Traditional Search:

Search: "authentication" → Only finds code that contains the literal text "authentication"

Semantic Search:

Search: "check user credentials" → Finds: validateLogin(), checkCredentials(), authenticateUser()
Search: "database connection" → Finds: initDB(), connectToPostgres(), setupPool()

Sem uses RAG (Retrieval-Augmented Generation), the same technique behind modern AI assistants, but runs it entirely on your machine. After the one-time model download there are no API keys, no network calls, and no data leaving your computer.

Indexing takes under 30 seconds for most codebases and about 1-2 minutes for projects with 1,000+ files. Search results appear in under 100ms.
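The ranking idea behind semantic search can be sketched in a few lines. This is a toy illustration, not Sem's implementation: the three-dimensional vectors are hand-written stand-ins for the embeddings a real model would produce, and the `rank` helper and the `renderButton()` symbol are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": hand-picked values, NOT output from a real model.
# The dimensions loosely mean (auth-ness, database-ness, UI-ness).
SYMBOLS = {
    "validateLogin()":     [0.9, 0.1, 0.1],
    "authenticateUser()":  [0.8, 0.2, 0.0],
    "connectToPostgres()": [0.1, 0.9, 0.0],
    "renderButton()":      [0.0, 0.1, 0.9],
}

def rank(query_vector, symbols):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vector, vec))
              for name, vec in symbols.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Pretend this vector is the embedding of "check user credentials".
query = [1.0, 0.1, 0.0]
for name, score in rank(query, SYMBOLS):
    print(f"{score:.2f}  {name}")
```

The authentication-related symbols score highest even though none of their names share a word with the query, which is the property that keyword search lacks.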

Quick Start

Open the Sem panel from the Activity Bar (left sidebar) and type a natural-language query:

"authentication logic"
"error handling"
"database queries"

Tag code for organization:

Place the cursor on any function or class
Cmd+Shift+P (Ctrl+Shift+P on Windows/Linux) → "Sem: Tag Current Symbol"
Enter one or more tags: #auth, #refactor, #critical

Tags are stored in .vscode/code-navigator/tags.json. Commit that file to sync tags through version control.

Privacy First

All AI processing happens locally:

transformers.js for embeddings (50-140MB models)
LanceDB for vector storage (<100ms searches)
Tree-sitter for AST parsing

No external API calls. No telemetry. Your code stays on your machine.

Model Selection

Choose the embedding model that fits your needs:

Model	Size	Speed	Best For
all-MiniLM-L6-v2 (default)	50MB	Fast	General purpose
bge-small-en-v1.5	60MB	Fast	Better semantics
nomic-embed-text-v1.5	140MB	Slower	Highest accuracy

Change models: Cmd+Shift+P → "Sem: Select Embedding Model"

Note: Changing models triggers a full re-index.

Performance

Indexing:

Small projects (<100 files): ~5 seconds
Medium projects (100-1,000 files): ~10-30 seconds
Large projects (1,000+ files): ~1-2 minutes

Search: <100ms for thousands of symbols

First-time setup downloads the model (~50-140MB). Models are cached in ~/.cache/transformers-js/ and reused across workspaces.
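To see how much disk space the cached models take, a small script along these lines may help. It is a sketch that assumes the cache path given above; `dir_size_mb` is a hypothetical helper, not part of Sem.

```python
from pathlib import Path

def dir_size_mb(directory: Path) -> float:
    """Total size of all files under directory in MB (0.0 if it does not exist)."""
    if not directory.is_dir():
        return 0.0
    total_bytes = sum(f.stat().st_size for f in directory.rglob("*") if f.is_file())
    return total_bytes / (1024 * 1024)

# Cache location taken from the text above; adjust if yours differs.
cache_dir = Path.home() / ".cache" / "transformers-js"
print(f"{cache_dir}: {dir_size_mb(cache_dir):.1f} MB")
```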

Supported Languages

TypeScript, JavaScript, Python, Go, C#, Dart

Sem uses Tree-sitter for AST parsing, so it understands code structure—functions, classes, interfaces—not just text.
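To illustrate what indexing by structure means, here is a rough sketch that pulls function and class symbols out of source text. It uses Python's built-in `ast` module as a stand-in for Tree-sitter, so it only parses Python, and `extract_symbols` is a hypothetical helper rather than anything Sem exposes.

```python
import ast

def extract_symbols(source: str):
    """Return (kind, name, line) for each function and class in Python source."""
    symbols = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols.append(("function", node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            symbols.append(("class", node.name, node.lineno))
    return sorted(symbols, key=lambda s: s[2])  # order by line number

SAMPLE = '''
class SessionStore:
    def load(self, token):
        return None

def validate_login(user, password):
    # The word "class" in this comment is ignored: it is not a symbol.
    return bool(user and password)
'''

for kind, name, line in extract_symbols(SAMPLE):
    print(f"{line:>3}  {kind:<8}  {name}")
```

A plain text search for "class" would also hit the comment. A parser-based pass reports only real definitions, and those are the units that get embedded and searched.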

Configuration

{
  "sem.search.confidenceThreshold": 0.35,
  "sem.search.maxResults": 50,
  "sem.indexing.excludePatterns": [
    "**/node_modules/**",
    "**/dist/**"
  ]
}

Confidence threshold guide:

0.15-0.35: Broad matches (exploratory)
0.35-0.70: Similar purpose, different implementation
0.70+: Very similar/duplicate code

Recommended: 0.35 for the Sem panel, 0.70 for the Command Palette.
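The effect of the threshold can be sketched as a simple filter over scored results. The scores and symbol names below are made up for illustration, `filter_results` is a hypothetical helper, and treating the threshold as inclusive is an assumption about how Sem compares scores.

```python
def filter_results(results, confidence_threshold=0.35, max_results=50):
    """Keep results at or above the threshold, best first, capped at max_results."""
    kept = [r for r in results if r[1] >= confidence_threshold]
    kept.sort(key=lambda r: r[1], reverse=True)
    return kept[:max_results]

# Hypothetical (symbol, score) pairs; real scores would come from the search index.
RESULTS = [
    ("authenticateUser()", 0.82),
    ("validateLogin()", 0.74),
    ("hashPassword()", 0.41),
    ("parseConfig()", 0.18),
]

for threshold in (0.15, 0.35, 0.70):
    names = [name for name, _ in filter_results(RESULTS, threshold)]
    print(threshold, names)
```

Raising the threshold trims the weaker matches first, so a high value suits duplicate detection and a low one suits exploration.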

Known Trade-offs

First-time model download: 30-60 seconds
Initial indexing: 10-30 seconds for medium projects, 1-2 minutes for large codebases
Memory usage: ~200-500MB (varies by workspace size)
Call graph UI not yet implemented (data available, visualization pending)

Commands

Search & Navigation:

Sem: Search - Open semantic search
Sem: Reindex Workspace - Force full re-index

Tagging:

Sem: Tag Current Symbol - Tag symbol at cursor
Sem: Add Tag - Add tag to selected symbol
Sem: Remove Tag - Remove tag from symbol

Other:

Sem: Select Embedding Model - Choose AI model
Sem: View Indexing Progress - Monitor indexing status

Requirements

VS Code 1.105.0+
4GB+ RAM recommended for large workspaces
50-200MB disk space

Troubleshooting

Search returns no results:

Check the Output panel and select the Sem channel
Confirm that indexing completed (status bar shows "Indexing complete")
Lower sem.search.confidenceThreshold to 0.15 for testing
Run Sem: Reindex Workspace

Slow indexing:

First-time model download adds 30-60 seconds
Exclude large directories: add "**/vendor/**" to sem.indexing.excludePatterns
Check Sem: View Indexing Progress
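Before re-indexing, it can help to check which paths a pattern such as **/vendor/** would skip. The sketch below assumes common glob semantics ("**" spans directories, "*" stays within one path segment); how Sem itself interprets patterns is not specified here, and `glob_to_regex` and `is_excluded` are hypothetical helpers covering only this small subset.

```python
import re

def glob_to_regex(pattern):
    """Translate a small glob subset to a regex.

    Handles "**/" (any number of leading directories), "/**" (anything
    below a directory) and "*" (any characters within one path segment).
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("/**", i):
            parts.append("(?:/.*)?")
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(parts) + "$")

def is_excluded(path, patterns):
    """True if a forward-slash relative path matches any exclude pattern."""
    return any(glob_to_regex(p).match(path) for p in patterns)

EXCLUDE_PATTERNS = ["**/node_modules/**", "**/dist/**", "**/vendor/**"]

for path in ["src/auth/login.ts", "vendor/lib/util.go", "web/node_modules/x/index.js"]:
    status = "excluded" if is_excluded(path, EXCLUDE_PATTERNS) else "indexed"
    print(path, "->", status)
```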

Clear and rebuild:

# Warning: this directory also holds tags.json. Commit or back it up first to keep your tags.
rm -rf .vscode/code-navigator
# Reopen the workspace. Indexing starts automatically.

Built by Nimblesite