SI Search
A Source Insight-style code search extension for VS Code, designed for C/C++ developers working on large-scale codebases.
SI Search builds a symbol index using tree-sitter parsing and stores it in an on-disk SQLite FTS5 database. This lets symbol lookup stay instant even on workspaces as large as the Linux kernel (70k+ files, tens of millions of symbols), without holding the index in the extension host's JS heap. When the index has no match for a query, SI Search seamlessly falls back to ripgrep full-text search.
Features
Symbol Index (Sync)
SI Search uses tree-sitter (WASM) to parse all C/C++ source files in your workspace and extract symbol definitions:
The index is persisted to disk as a SQLite database at
Press
Two-Tier Search Strategy
This hybrid approach gives you the speed of a pre-built index with the coverage of full-text search.
Search Results Panel
Search Filters (Include / Exclude)
Click the
Manual Highlights
Incremental File Watching
SI Search monitors your workspace for file changes:
Sync-Time Search Behavior
Search requests that arrive while Sync is in progress can be handled in four different ways; configure with
Architecture
Key components:
Commands
Configuration
All settings are under the
Example
| View | ID | Description |
|---|---|---|
| Search | `siSearch.searchPanel` | Webview with search input, options, and search history list. |
| Highlights | `siSearch.highlightsView` | Tree view showing all active manual highlights with per-item remove buttons. |
Status Bar
The status bar item (bottom-left) shows the current index state:
| State | Display | Meaning |
|---|---|---|
| None | `$(database) Index: None` | No index built yet. Click to sync. |
| Building | `$(sync~spin) Index: Syncing...` | Index build in progress. |
| Ready | `$(database) 15,234 symbols` | Index ready with symbol count. |
| Stale | `$(database) 15,234 symbols (stale)` | Files changed since last sync. Click to re-sync. |
How It Works
Symbol Parsing
SI Search loads tree-sitter WASM grammars for C and C++ at runtime. For each source file, it runs tree-sitter S-expression queries to extract symbol definitions:
```scheme
;; Example: extract function names
(function_definition
  declarator: (function_declarator
    declarator: (identifier) @name)) @def
```
Each extracted symbol records: name, kind, file path, line number, column.
Large/generated C headers (above siSearch.parser.maxFileSizeBytes) bypass tree-sitter WASM and go through a regex-based streaming extractor instead. This path is memory-flat (line-by-line readline) and still feeds the SQLite index, so macros from multi-megabyte register headers are fully searchable.
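
A minimal sketch of such a regex-based path, assuming it only needs `#define` names: a pure per-line matcher plus an async generator that feeds it from `readline`, so only one line is held in memory at a time. `matchDefine`, `extractMacros`, the `KIND_MACRO` value, and the 0-based column convention are all hypothetical; the real extractor may recognize more constructs.

```typescript
import * as fs from "node:fs";
import * as readline from "node:readline";

// Hypothetical numeric kind for macros; the real enum values are not documented here.
const KIND_MACRO = 1;

interface SymbolRecord {
  name: string;
  kind: number;
  filePath: string;
  lineNumber: number; // 1-based
  column: number; // 0-based (assumption)
}

// `#define NAME ...` or `#define NAME(args) ...`, allowing whitespace around '#'.
const DEFINE_RE = /^(\s*#\s*define\s+)([A-Za-z_]\w*)/;

// Pure per-line step: returns a record if this line defines a macro.
function matchDefine(line: string, filePath: string, lineNumber: number): SymbolRecord | undefined {
  const m = DEFINE_RE.exec(line);
  if (m === null) return undefined;
  return { name: m[2], kind: KIND_MACRO, filePath, lineNumber, column: m[1].length };
}

// Streams the file line by line, so memory use does not grow with file size.
async function* extractMacros(filePath: string): AsyncGenerator<SymbolRecord> {
  const rl = readline.createInterface({ input: fs.createReadStream(filePath), crlfDelay: Infinity });
  let lineNumber = 0;
  for await (const line of rl) {
    lineNumber++;
    const rec = matchDefine(line, filePath, lineNumber);
    if (rec !== undefined) yield rec;
  }
}
```

This sketch ignores comment and `#if 0` context; each yielded record would go into the same batch path as tree-sitter results so the macros land in the SQLite index.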
SQLite FTS5 Index
The index lives in {workspaceRoot}/.sisearch/index.sqlite as a SQLite database with:
- `meta`: schema version, created-at timestamp, workspace root, tokenizer name.
- `files`: one row per indexed file (`relative_path` UNIQUE, `mtime_ms`, `size_bytes`, `symbol_count`). Indexed on `relative_path`.
- `symbols`: one row per symbol (`name`, `kind` as an enum int, `file_id`, `line_number`, `column`). Foreign key to `files` with `ON DELETE CASCADE`. Indexed on `file_id` and `name`.
- `symbols_fts`: FTS5 virtual table using the `unicode61 remove_diacritics 2` tokenizer. Stored as `content=''` (contentless FTS5: only the inverted index is kept, half the disk footprint).
- Two triggers on `symbols` keep FTS5 in sync:
  - `AFTER INSERT`: `INSERT INTO symbols_fts(rowid, name) VALUES (NEW.id, NEW.name)`
  - `AFTER DELETE`: `INSERT INTO symbols_fts(symbols_fts, rowid, name) VALUES ('delete', OLD.id, OLD.name)` (the contentless-FTS5 delete protocol).
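
The schema above can be written out as DDL roughly as follows. This is a reconstruction, not the shipped schema: the `id` primary keys, the key/value layout of `meta`, and the index and trigger names are assumptions. Because `symbols_fts` is contentless, a `MATCH` query returns only `rowid`s, which are then joined back to `symbols.id`.

```typescript
// Reconstructed DDL for the index described above. Names marked "assumed" are not
// documented and may differ from SI Search's real schema.
const SCHEMA_DDL: readonly string[] = [
  // Assumed key/value layout for schema version, created-at, workspace root, tokenizer.
  `CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT NOT NULL)`,
  `CREATE TABLE IF NOT EXISTS files (
     id            INTEGER PRIMARY KEY,  -- assumed
     relative_path TEXT NOT NULL UNIQUE,
     mtime_ms      INTEGER NOT NULL,
     size_bytes    INTEGER NOT NULL,
     symbol_count  INTEGER NOT NULL DEFAULT 0
   )`,
  // Mirrors the README; in SQLite a UNIQUE constraint already creates an index.
  `CREATE INDEX IF NOT EXISTS idx_files_path ON files(relative_path)`,
  `CREATE TABLE IF NOT EXISTS symbols (
     id          INTEGER PRIMARY KEY,    -- reused as the FTS5 rowid
     name        TEXT NOT NULL,
     kind        INTEGER NOT NULL,       -- enum int
     file_id     INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE,
     line_number INTEGER NOT NULL,
     "column"    INTEGER NOT NULL
   )`,
  `CREATE INDEX IF NOT EXISTS idx_symbols_file ON symbols(file_id)`,
  `CREATE INDEX IF NOT EXISTS idx_symbols_name ON symbols(name)`,
  // Contentless FTS5: stores the inverted index only, never the text itself.
  `CREATE VIRTUAL TABLE IF NOT EXISTS symbols_fts USING fts5(
     name, content='', tokenize='unicode61 remove_diacritics 2'
   )`,
  `CREATE TRIGGER IF NOT EXISTS symbols_ai AFTER INSERT ON symbols BEGIN
     INSERT INTO symbols_fts(rowid, name) VALUES (NEW.id, NEW.name);
   END`,
  // A contentless table cannot read back old values, so 'delete' must be given them.
  `CREATE TRIGGER IF NOT EXISTS symbols_ad AFTER DELETE ON symbols BEGIN
     INSERT INTO symbols_fts(symbols_fts, rowid, name) VALUES ('delete', OLD.id, OLD.name);
   END`,
];
```

With `better-sqlite3`, these statements could be applied one by one with `db.exec(stmt)` inside a single transaction at schema-init time.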
Pragmas at open:
```
journal_mode = WAL       crash-safe, readers don't block writers
synchronous  = NORMAL    WAL-safe, ~3× faster than FULL
cache_size   = -65536    64 MB SQLite page cache
temp_store   = MEMORY    keep temp tables off disk
foreign_keys = ON        enforce the CASCADE delete on files
```
The writer-worker additionally sets synchronous=OFF and a larger cache (-262144, 256 MB) for Sync throughput; readers keep NORMAL.
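
One way to express the reader/writer split, sketched against a minimal structural type that matches how `better-sqlite3`'s `Database#pragma(source)` is called. `pragmasFor`, `applyPragmas`, and `PragmaTarget` are hypothetical names; the values are the ones listed above.

```typescript
// Structural stand-in for the part of better-sqlite3's Database used here.
interface PragmaTarget {
  pragma(source: string): unknown;
}

type Role = "reader" | "writer";

// Applied on every connection at open.
const BASE_PRAGMAS: ReadonlyArray<readonly [string, string]> = [
  ["journal_mode", "WAL"],
  ["synchronous", "NORMAL"],
  ["cache_size", "-65536"], // negative means KiB: 64 MB
  ["temp_store", "MEMORY"],
  ["foreign_keys", "ON"],
];

// Writer-worker overrides for Sync throughput.
const WRITER_OVERRIDES = new Map<string, string>([
  ["synchronous", "OFF"],
  ["cache_size", "-262144"], // 256 MB
]);

// Builds the pragma statements for a connection role, writer overrides taking precedence.
function pragmasFor(role: Role): string[] {
  return BASE_PRAGMAS.map(([name, value]) => {
    const override = role === "writer" ? WRITER_OVERRIDES.get(name) : undefined;
    return `${name} = ${override ?? value}`;
  });
}

function applyPragmas(db: PragmaTarget, role: Role): void {
  for (const stmt of pragmasFor(role)) db.pragma(stmt);
}
```

For example, `applyPragmas(new Database(file, { readonly: true, fileMustExist: true }), "reader")`. Negative `cache_size` values are interpreted by SQLite as KiB, which is where the 64 MB and 256 MB figures come from.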
Concurrency: Writer Worker + Main-Thread Reader
SQLite write locks are exclusive, so SI Search isolates writes in a dedicated worker_threads worker (dbWriterWorker.ts) that owns the only write connection. The main thread holds a readonly connection (DbBackend with {readonly:true, fileMustExist:true}) for search queries.
During Sync:
- `SyncOrchestrator` classifies files (new, dirty, deleted, unchanged).
- Parse jobs go through `WorkerPool`; each completed batch is a `ParseBatchResult` posted back to the main thread.
- `onBatchResult` forwards the batch to `DbWriterClient.postBatch`, which sends it to the worker.
- The worker runs one `db.transaction(() => { insertSymbol × N; upsertFile × M; deleteFileByPath × K })()` per batch.
- FTS5 triggers keep the inverted index consistent inside the same transaction.
Back-pressure: `DbWriterClient.awaitBackpressure(hwm=20)` polls the in-flight batch counter, sleeping 20 ms between polls while more than 20 batches are pending. This keeps the worker's message queue bounded, so the final drain/checkpoint messages don't sit behind a flood of batches.
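
A minimal sketch of this polling back-pressure, assuming the worker acknowledges each batch so the client can decrement an in-flight counter. `BackpressureClient` and its `send` callback are hypothetical stand-ins for `DbWriterClient` and its `worker_threads` message channel.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical transport: resolves when the worker acknowledges the batch.
type SendFn<B> = (batch: B) => Promise<void>;

class BackpressureClient<B> {
  private pending = 0; // batches posted but not yet acknowledged
  private readonly send: SendFn<B>;

  constructor(send: SendFn<B>) {
    this.send = send;
  }

  get inFlight(): number {
    return this.pending;
  }

  postBatch(batch: B): Promise<void> {
    this.pending++;
    return this.send(batch).finally(() => {
      this.pending--;
    });
  }

  // Sleep-and-poll until the number of unacknowledged batches is at or below the high-water mark.
  async awaitBackpressure(hwm = 20, pollMs = 20): Promise<void> {
    while (this.pending > hwm) await sleep(pollMs);
  }
}
```

A producer would `await client.awaitBackpressure()` before each `postBatch`. Polling trades a little latency for simplicity compared with waking the producer on each acknowledgement.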
At Sync end the orchestrator awaits drain() then checkpoint() (both with a 60-second safety timeout, so a stalled worker can't lock the UI on "Saving Index").
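
The safety timeout can be sketched as a `Promise.race` between the real work and a timer. `withTimeout`, `finishSync`, and the injected `drain`/`checkpoint` callbacks are hypothetical; the 60-second default matches the text.

```typescript
// Rejects with a descriptive error if `work` has not settled within `ms`.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so a fast drain doesn't leave a 60 s timer running.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// End-of-Sync sequence: drain pending batches, then checkpoint the WAL.
async function finishSync(
  drain: () => Promise<void>,
  checkpoint: () => Promise<void>,
  ms = 60_000,
): Promise<void> {
  await withTimeout(drain(), ms, "drain");
  await withTimeout(checkpoint(), ms, "checkpoint");
}
```

The race only stops the UI from waiting; it does not cancel whatever the stalled worker is doing.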
Pagination & Virtual Scrolling
executeSearchWithIndex returns {results, totalCount}:
- `results` is `DbBackend.search(query, options, {limit, offset})` (paged, bounded to `maxResults` per page).
- `totalCount` is `DbBackend.countMatches(query, options)` (a single `COUNT(*)` against the same `WHERE`).
The webview receives a showResults for page 0 and posts loadMore as the user scrolls near the bottom. The extension replies with appendResults carrying the next page. Both messages include loadedCount / totalCount so the results panel's footer label stays consistent. When the user scrolls past the loaded region, a bottom spacer reserves the row height for not-yet-loaded rows so the scrollbar thumb reflects the true total.
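
The paging arithmetic behind `loadMore` and the bottom spacer might look like this, assuming fixed-height rows. `nextPage`, `spacerHeightPx`, `shouldLoadMore`, and the 10-row prefetch threshold are hypothetical.

```typescript
interface PageRequest {
  limit: number;
  offset: number;
}

// Next page to request, or undefined once every result is loaded.
function nextPage(loadedCount: number, totalCount: number, pageSize: number): PageRequest | undefined {
  if (loadedCount >= totalCount) return undefined;
  return { offset: loadedCount, limit: Math.min(pageSize, totalCount - loadedCount) };
}

// Height reserved below the loaded rows so the scrollbar thumb reflects totalCount.
function spacerHeightPx(loadedCount: number, totalCount: number, rowHeightPx: number): number {
  return Math.max(0, totalCount - loadedCount) * rowHeightPx;
}

// True when the viewport bottom is within `thresholdRows` of the last loaded row.
function shouldLoadMore(
  scrollTop: number,
  viewportHeight: number,
  loadedCount: number,
  rowHeightPx: number,
  thresholdRows = 10,
): boolean {
  const loadedBottom = loadedCount * rowHeightPx;
  return scrollTop + viewportHeight >= loadedBottom - thresholdRows * rowHeightPx;
}
```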
Incremental Sync
During sync, SI Search compares each file's mtime_ms and size_bytes against files. Only new, modified, or deleted files are processed; deletions cascade to symbols (and, via the trigger, to symbols_fts). This makes re-syncing a large codebase (e.g., the Linux kernel) take seconds rather than minutes.
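
A minimal sketch of the classification step, assuming the workspace scan and the `files` table have each been loaded into a map keyed by relative path. `classifyFiles` and `FileStat` are hypothetical names, and `added` stands in for the "new" category.

```typescript
interface FileStat {
  mtimeMs: number;
  sizeBytes: number;
}

interface SyncPlan {
  added: string[];
  dirty: string[];
  deleted: string[];
  unchanged: string[];
}

// A file is dirty if either its mtime or its size differs from the indexed row.
function classifyFiles(onDisk: ReadonlyMap<string, FileStat>, indexed: ReadonlyMap<string, FileStat>): SyncPlan {
  const plan: SyncPlan = { added: [], dirty: [], deleted: [], unchanged: [] };
  for (const [relPath, stat] of onDisk) {
    const prev = indexed.get(relPath);
    if (prev === undefined) plan.added.push(relPath);
    else if (prev.mtimeMs !== stat.mtimeMs || prev.sizeBytes !== stat.sizeBytes) plan.dirty.push(relPath);
    else plan.unchanged.push(relPath);
  }
  // Indexed paths that no longer exist on disk.
  for (const relPath of indexed.keys()) {
    if (!onDisk.has(relPath)) plan.deleted.push(relPath);
  }
  return plan;
}
```

Only `added` and `dirty` files need parsing; `deleted` paths map to the worker's `deleteFileByPath`, whose cascade and trigger clean up `symbols` and `symbols_fts`.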
Sync-Time Search UX
When a search is dispatched while SymbolIndex.isSyncInProgress() returns true, executeSearchWithIndex branches on siSearch.search.duringSyncBehavior:
- `cancel`: return no results silently.
- `grep-fallback`: silently run ripgrep on the workspace.
- `prompt-grep-fallback` / `prompt-cancel`: show a VS Code information message with "Use Grep" / "Cancel" buttons. A 1-second dedup window suppresses repeats.
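
The dispatch above, together with the normal two-tier lookup (index first, ripgrep only on a miss), could be sketched as follows. `SyncAwareSearch` and `SearchDeps` are hypothetical, the prompt text is a placeholder, and two behaviors are assumptions not stated in this README: a prompt suppressed by the dedup window returns no results, and the two prompt variants differ only in how a dismissed prompt is treated.

```typescript
type DuringSyncBehavior = "cancel" | "grep-fallback" | "prompt-grep-fallback" | "prompt-cancel";

interface SearchDeps<R> {
  isSyncInProgress(): boolean;
  searchIndex(query: string): Promise<R[]>;
  grep(query: string): Promise<R[]>;
  // Resolves to the clicked button label, or undefined if the message is dismissed.
  prompt(message: string): Promise<"Use Grep" | "Cancel" | undefined>;
  now(): number; // milliseconds
}

class SyncAwareSearch<R> {
  private lastPromptAt = Number.NEGATIVE_INFINITY;
  private readonly deps: SearchDeps<R>;
  private readonly behavior: DuringSyncBehavior;
  private readonly dedupMs: number;

  constructor(deps: SearchDeps<R>, behavior: DuringSyncBehavior, dedupMs = 1000) {
    this.deps = deps;
    this.behavior = behavior;
    this.dedupMs = dedupMs;
  }

  async search(query: string): Promise<R[]> {
    const d = this.deps;
    if (!d.isSyncInProgress()) {
      // Two-tier: index first, ripgrep only when the index has no match.
      const hits = await d.searchIndex(query);
      return hits.length > 0 ? hits : d.grep(query);
    }
    if (this.behavior === "cancel") return [];
    if (this.behavior === "grep-fallback") return d.grep(query);
    // Prompt variants: ask at most once per dedup window (suppressed repeats return nothing; assumption).
    const t = d.now();
    if (t - this.lastPromptAt < this.dedupMs) return [];
    this.lastPromptAt = t;
    const choice = await d.prompt("Index sync in progress."); // placeholder text
    // Assumption: a dismissed prompt falls back to grep only for "prompt-grep-fallback".
    const useGrep = choice === "Use Grep" || (choice === undefined && this.behavior === "prompt-grep-fallback");
    return useGrep ? d.grep(query) : [];
  }
}
```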
Resilience
- Corrupt database: at open, `DbBackend.openOrInit` runs `PRAGMA quick_check`. On failure, the file is moved aside as `.sisearch/index.sqlite.corrupt-<ts>` and a fresh schema is initialized.
- Schema version skew: a newer on-disk schema than the extension expects throws `DbSchemaTooNewError` with a descriptive message. Older schemas are silently rebuilt.
- Missing native addon: if `better-sqlite3` fails to load (e.g., no prebuild for this platform/arch), the extension still activates: the `siSearch.nativeOk` context key becomes `false`, the Sync/Clear commands are hidden, and all searches automatically route to ripgrep.
- Legacy `.sisearch/shards/`: old msgpack shards from pre-SQLite builds are removed on activation.
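
The version-skew rule reduces to a three-way comparison. In this sketch, `checkSchemaVersion`, `corruptBackupPath`, and the use of epoch milliseconds for `<ts>` are assumptions; the `DbSchemaTooNewError` class here only mirrors the name of the real error.

```typescript
class DbSchemaTooNewError extends Error {
  constructor(onDisk: number, supported: number) {
    super(`On-disk index schema v${onDisk} is newer than the supported v${supported}`);
    this.name = "DbSchemaTooNewError";
  }
}

type OpenAction = "use" | "rebuild";

// Newer than supported: refuse. Older: rebuild. Equal: open as-is.
function checkSchemaVersion(onDisk: number, supported: number): OpenAction {
  if (onDisk > supported) throw new DbSchemaTooNewError(onDisk, supported);
  return onDisk < supported ? "rebuild" : "use";
}

// Destination for moving a database that failed `PRAGMA quick_check`.
function corruptBackupPath(dbPath: string, nowMs: number): string {
  return `${dbPath}.corrupt-${nowMs}`;
}
```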
Diagnostic Logging
Set SISEARCH_WORKER_DIAG=1 before launching VS Code to record a JSON-Lines trace of the writer worker's IPC channel (postBatch / ack / drain / checkpoint / timeout) plus every search query on the main thread. Logs go to $TMPDIR/sisearch-writer-<pid>-main.log and -worker.log. After a crash or hang:
```bash
ls -t "${TMPDIR:-/tmp}"/sisearch-writer-*.log | head -2 | xargs tail -n 40
```
The diagnostic path is a no-op when the environment variable is unset (the only cost is a cheap process.env read), so production use is unaffected.
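
A sketch of an environment-gated JSON-Lines logger, assuming one file per thread named as above. `createDiagLogger`, `noopDiag`, and the record shape are hypothetical; Node's `os.tmpdir()` honors `$TMPDIR` on POSIX systems.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type DiagLog = (event: string, data?: Record<string, unknown>) => void;

// Shared do-nothing logger returned when diagnostics are off.
const noopDiag: DiagLog = () => {};

function createDiagLogger(
  side: "main" | "worker",
  env: Record<string, string | undefined> = process.env,
): DiagLog {
  if (env["SISEARCH_WORKER_DIAG"] !== "1") return noopDiag;
  const file = path.join(os.tmpdir(), `sisearch-writer-${process.pid}-${side}.log`);
  // One JSON object per line; a synchronous append means each record is in the
  // file before the call returns, which matters when diagnosing crashes.
  return (event, data = {}) => {
    fs.appendFileSync(file, JSON.stringify({ ts: Date.now(), event, ...data }) + "\n");
  };
}
```

Because a `worker_threads` worker shares its parent's PID, the `-main`/`-worker` suffix is what keeps the two files apart.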
Requirements
- VS Code 1.85.0 or later.
- `better-sqlite3` native addon. Prebuilt binaries for Linux / macOS / Windows × x64 / arm64 are shipped in the VSIX; if your platform isn't covered, the extension falls back to ripgrep-only mode with a status-bar warning.
- The ripgrep binary and all tree-sitter WASM grammars are bundled.
Build from Source
Prerequisites
- Node.js 18+ and npm
- VS Code 1.85.0+
- A C/C++ toolchain for `better-sqlite3` to compile against (node-gyp; Python 3.8+; a working gcc/clang/MSVC). Only needed when no prebuild matches your Node/Electron ABI.
Install Dependencies
```bash
npm install
```
Rebuild Native Addon for VS Code (Electron) ABI
VS Code runs extensions inside Electron, which has a different Node ABI than plain Node. Before F5 debugging or packaging:
```bash
npm run rebuild-electron
```
To run node-only unit tests (see below), rebuild for plain Node instead:
```bash
npm rebuild better-sqlite3
```
Compile
```bash
npm run compile
```
This runs tsc -p ./ to compile TypeScript to out/.
Watch Mode (for development)
```bash
npm run watch
```
Run Tests
Node-runnable unit tests:
```bash
npm run compile
npx mocha --ui tdd out/test/suite/*.test.js
```
Host-only integration tests (launch a VS Code instance):
```bash
npm run test:host
```
Package as VSIX
```bash
npx @vscode/vsce package
```
Install VSIX
```bash
code --install-extension sisearch-<version>.vsix
```
Or in VS Code: Ctrl+Shift+P → Extensions: Install from VSIX...
Known Limitations
- Symbol indexing currently supports C and C++ only. Other languages fall back to ripgrep full-text search.
- The hover preview renders code using the current VS Code theme via shiki. Some custom themes may not render perfectly.
- The `.sisearch/` directory is created in the workspace root. Add it to your `.gitignore` if needed.
- Multi-window concurrent Sync against the same workspace is not specifically handled; SQLite's `BUSY` timeout kicks in and one window will surface an error. Sync from one window at a time.
- `regex` searches on patterns with no literal token (e.g., `.+`) fall back to scanning up to 10,000 symbol names and filtering in JS.
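
For the no-literal-token case, the capped scan amounts to testing the regex against at most 10,000 names and keeping the matches. `regexScanFallback` is a hypothetical helper; how SI Search chooses which 10,000 names to scan is not specified here.

```typescript
// Test `pattern` against at most `cap` candidate names and return the matches.
function regexScanFallback(names: Iterable<string>, pattern: string, flags = "", cap = 10_000): string[] {
  // Drop the stateful g/y flags so repeated test() calls don't depend on lastIndex.
  const re = new RegExp(pattern, flags.replace(/g|y/g, ""));
  const out: string[] = [];
  let scanned = 0;
  for (const name of names) {
    if (scanned++ >= cap) break;
    if (re.test(name)) out.push(name);
  }
  return out;
}
```

The cap bounds the cost of a pattern like `.+`, at the price of missing matches beyond the first 10,000 names scanned.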
License
MIT