Lynx File Content Search

Bring Your Own Model. Connect Your Own Data.

Lynx builds a private, local index of the folders and drives you choose — then hands it to your coding agent (Claude Code, Codex, or Gemini CLI) so the agent can search and act on your files through one fast query, instead of scanning the disk and burning tokens. It writes the AGENTS.md / CLAUDE.md / GEMINI.md maps and ready-to-run query tools your agent auto-loads.

Two principles from Lynx DI run through it:

BYOM — Bring Your Own Model. Your client, your conversation, your credentials — none of it ever traverses a Lynx server (there is none). Connect the AI agent you already use, and when Anthropic, OpenAI, or Google ship a smarter model, your assistant gets smarter the same day.
CYOD — Connect Your Own Data. Plug in your own folders, drives, and data sources and index them straight to your machine — names, metadata, and full content, on your schedule, fully under your control. Nothing is ever uploaded.

It's also a first-class standalone full-content search for VS Code: file names, paths, and (optionally) file contents, indexed locally with a pluggable search layer (SQLite + FTS5 out of the box, an alternate Tantivy engine, and a designed-in semantic/vector engine). Use it as a fast file/path index for huge trees (in the style of the Everything file-search utility), or search inside code, notes, CSV, PDF, Office, and HTML. Everything runs locally, inside VS Code: no telemetry, no server, no uploaded content.

Highlights

Bring your own AI agent — it drives the actions. Generates AGENTS.md / CLAUDE.md / GEMINI.md maps plus ready-to-run query tools in .index/bin/, so Claude Code, Codex, or Gemini CLI search and act on your index with one command — no hosted model, no tokens wasted on brute-force file reads.
Bring your own data, kept local. Index any folders or drives. The extension is read-only to your files (never moves, modifies, or deletes them) and nothing ever leaves your machine.
Two indexes, one click apart. A fast file-metadata index (names, paths, size, dates, type) and a full content index live in separate databases, so you can build metadata fast, prune what you don't need, then build content — and switch which one search uses at any time.
Parallel, incremental crawler. A shard-and-merge crawler using worker threads indexes millions of files in minutes, then re-crawls only what changed.
Content extraction that degrades gracefully. Code and text are read directly (no Python). PDF/DOCX/PPTX/XLSX/HTML go through Microsoft MarkItDown; if it isn't set up, those files are still indexed by name and metadata.
Pluggable search, multiple engines. SQLite + FTS5 by default, an alternate Tantivy engine for hands-on benchmarking, and a semantic/vector engine designed into the architecture (experimental) — all local, in VS Code, no server.
Never activates uninvited. It stays completely dormant — creating no files — until you configure roots or run a build. Opening an unrelated folder does nothing.
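
To make the "parallel, incremental crawler" highlight concrete, here is a minimal Python sketch of the general shard-and-merge idea. It is an illustration only, not the extension's crawler: the function names, the use of a thread pool, and the choice of (size, mtime) as the change signal are all assumptions.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def scan_shard(directory):
    """Walk one shard and return {path: (size, mtime_ns)} for its files."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
            found[path] = (info.st_size, info.st_mtime_ns)
    return found


def crawl(shards, workers=8):
    """Scan shards in parallel, then merge the per-shard results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(scan_shard, shards):
            merged.update(result)
    return merged


def diff_snapshots(previous, current):
    """Split paths into added, changed, and removed since the last crawl."""
    added = sorted(p for p in current if p not in previous)
    changed = sorted(p for p in current if p in previous and current[p] != previous[p])
    removed = sorted(p for p in previous if p not in current)
    return added, changed, removed
```

A re-crawl under this scheme would only re-read the files in the added and changed lists, and drop the removed ones from the index.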

Requirements & Dependencies

Dependency	Needed for	How it's installed
VS Code 1.102.0+	The extension itself	Marketplace / VSIX
SQLite + FTS5 (`better-sqlite3`)	The core index and search	Bundled with the extension — no action needed
Python 3.10+ (with the `venv` module)	Document extraction (MarkItDown) and the Tantivy engine	You install a base Python once (see below); the extension builds its own isolated environment from it
MarkItDown	Extracting PDF / DOCX / PPTX / XLSX / XLS / Outlook / HTML to text	Installed automatically by Set Up Document Extraction (MarkItDown)
Tantivy	The optional alternate search engine	Installed automatically by Build Tantivy Index

You only need Python if you want to full-text index documents (PDF/Office/HTML) or try the Tantivy engine. File/path indexing and content indexing of code and text need nothing beyond the extension.

Installing base Python (one time)

MarkItDown and Tantivy run inside a Python virtual environment that the extension creates for you. Creating it requires a base Python 3.10 or newer with the standard venv module.

Windows / macOS: install from python.org/downloads. On Windows, tick “Add python.exe to PATH” in the installer.
Linux: most distros ship Python 3; ensure the venv module is present (Debian/Ubuntu: sudo apt install python3-venv).

Verify it's discoverable from a terminal:

python3 --version    # or:  py --version   /   python --version

The extension auto-discovers py, python3, and python on your PATH. If your interpreter lives elsewhere, set diskIndex.pythonPath to its full path.
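
If you want to check ahead of time what discovery is likely to find, the sketch below probes the same three launcher names and confirms the version and the venv module. It is a rough stand-in written for this README, not the extension's discovery code; the function names are hypothetical.

```python
import re
import shutil
import subprocess

MIN_VERSION = (3, 10)  # minimum stated in this README
CANDIDATES = ["py", "python3", "python"]  # launcher names the README lists


def parse_version(text):
    """Turn 'Python 3.11.4' into (3, 11, 4); return None if no version is found."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_minimum(version, minimum=MIN_VERSION):
    """True when a parsed version tuple is at least the minimum."""
    return version is not None and version[:2] >= minimum


def find_base_python(candidates=CANDIDATES):
    """Return (path, version) for the first usable interpreter on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is None:
            continue
        try:
            # Ask the interpreter for its version and confirm venv imports.
            out = subprocess.run(
                [path, "-c", "import sys, venv; print(sys.version.split()[0])"],
                capture_output=True, text=True, timeout=15,
            )
        except (OSError, subprocess.TimeoutExpired):
            continue
        version = parse_version(out.stdout)
        if out.returncode == 0 and meets_minimum(version):
            return path, version
    return None
```

If `find_base_python()` returns None, either install Python as described above or point diskIndex.pythonPath at an interpreter directly.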

Setting up the local Python environment

The extension keeps its Python environment project-local so it survives extension updates and never pollutes your system Python:

Run Lynx File Content Search: Set Up Document Extraction (MarkItDown) from the Command Palette (or click the MarkItDown node in the sidebar when it shows unavailable).
The extension finds a base Python 3.10+, creates a virtual environment at .index/py-env, and installs markitdown[pdf,docx,pptx,xlsx,xls,outlook] into it.
Progress streams to the Output panel. When it finishes, document extraction is ready — no restart needed.

To add the Tantivy engine, run Build Tantivy Index. It reuses the same .index/py-env (creating it first if needed), installs the tantivy package, and builds a Tantivy index from your content DB into .index/tantivy/.

Everything is installed into .index/py-env — nothing is installed system-wide, and deleting .index removes it cleanly. To reuse an interpreter that already has markitdown installed, set diskIndex.pythonPath and skip the setup command.
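
For readers who want to see roughly what the setup commands amount to, the following sketch creates a virtual environment and installs into it with that environment's own pip. It is an approximation under stated assumptions: the helper names are hypothetical, and the extension may perform the steps differently. Only the .index/py-env location and the markitdown extras come from the description above.

```python
import os
import subprocess
import venv
from pathlib import Path

# Extras copied from the README's description of the setup command.
MARKITDOWN_SPEC = "markitdown[pdf,docx,pptx,xlsx,xls,outlook]"


def venv_python(env_dir, windows=None):
    """Path of the interpreter inside a virtual environment."""
    if windows is None:
        windows = os.name == "nt"
    env_dir = Path(env_dir)
    return env_dir / ("Scripts/python.exe" if windows else "bin/python")


def pip_install_command(env_dir, *packages, windows=None):
    """Build the pip invocation that installs packages into the venv only."""
    return [str(venv_python(env_dir, windows)), "-m", "pip", "install", *packages]


def set_up_environment(env_dir=".index/py-env"):
    """Create the venv if missing, then install MarkItDown into it (needs network)."""
    if not venv_python(env_dir).exists():
        venv.EnvBuilder(with_pip=True).create(env_dir)
    subprocess.run(pip_install_command(env_dir, MARKITDOWN_SPEC), check=True)
```

Because pip is invoked through the environment's own interpreter, packages land inside .index/py-env rather than in the system Python, which is why deleting .index removes them.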

Quickstart

Install the extension and open the Lynx File Content Search activity-bar view.
Run Configure Roots to Index and choose the folders or drives you want searchable. (This is the moment the .index folder is created — not before.)
Click Build Index. Leave Index content off for the fastest file/path index, or turn it on to full-text index file contents.
Search from the panel, or run Quick Search the Index from the Command Palette.
(Optional) Run Set Up Document Extraction (MarkItDown) to also index PDF/Office/HTML content, then rebuild.

By default the index lives in a .index folder inside the opened workspace. To keep it elsewhere (for example, outside the workspace or on a faster drive), point diskIndex.databasePath at an empty folder before building.

The two indexes: metadata vs content

Lynx keeps file metadata and file content in separate databases so they coexist:

File metadata (index.db) — names, paths, size, dates, type, extension. Fast to build (no reading file bodies), ideal for “where is that file?” across millions of files.
Full content (content.db) — everything above plus extracted, full-text-searchable file contents.

The Search index section in the sidebar shows both, marks which is active (the one search and browse use), and lets you switch with a click. Build Index targets metadata or content depending on the Index content toggle, and Switch Search Index flips the active one. Building content never overwrites your metadata index, and vice versa.

What gets indexed

Lynx always indexes file names, paths, size, dates, type, and extension for eligible files.

When content indexing is on:

Code, plain text, Markdown, JSON, XML, CSV, TSV are read and indexed directly — no Python required.
PDF, DOCX, PPTX, XLSX, XLS, Outlook .msg, HTML are extracted with MarkItDown (needs the Python environment above). Without it, they’re indexed by name and metadata only.
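
The routing described above can be pictured as a small decision function. The sketch below is illustrative only: the extension sets are assumptions (the real lists are not given here), the size limit mirrors the diskIndex.contentMaxSizeMB setting, and treating 1 MB as 1024 × 1024 bytes is also an assumption.

```python
from pathlib import Path

# Extension sets are illustrative assumptions, not the extension's real lists.
DIRECT_TEXT = {".py", ".ts", ".js", ".txt", ".md", ".json", ".xml", ".csv", ".tsv"}
NEEDS_MARKITDOWN = {".pdf", ".docx", ".pptx", ".xlsx", ".xls", ".msg", ".html"}


def extraction_plan(path, size_bytes, markitdown_ready, content_max_size_mb=4):
    """Return 'direct', 'markitdown', or 'metadata-only' for one file."""
    limit = content_max_size_mb * 1024 * 1024
    if content_max_size_mb == 0 or size_bytes > limit:
        return "metadata-only"  # 0 disables extraction; large files skip it
    suffix = Path(path).suffix.lower()
    if suffix in DIRECT_TEXT:
        return "direct"
    if suffix in NEEDS_MARKITDOWN and markitdown_ready:
        return "markitdown"
    return "metadata-only"
```

Every branch ends in a usable outcome, which is the point of the graceful fallback: a PDF without MarkItDown still appears in the index by name and metadata.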

Search engines

Lynx's search layer is pluggable — the index isn't tied to a single engine, and every option runs locally and in‑process (no server):

SQLite + FTS5 (default, built in) — full-text search ranked with BM25. Powers both the extension's own Search panel and the .index/bin/search.py agent tool. Nothing to install.
Tantivy (alternate) — a Rust/Lucene-style index built from your content DB via Build Tantivy Index. Today it's exposed to coding agents through .index/bin/search_tantivy.py for A/B benchmarking against FTS5; the extension's own UI stays on FTS5.
Semantic / vector (experimental, not yet enabled) — the architecture includes a search-provider seam and a --mode hybrid flag for embedding-based retrieval (designed for sqlite-vec + a local embedding model with reciprocal-rank fusion). No embedding model ships yet, so hybrid currently falls back to lexical results — it's designed-for, not on by default.
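
Reciprocal-rank fusion, mentioned for the planned hybrid mode, merges several ranked lists by summing 1 / (k + rank) for each document. The sketch below shows the standard formula in isolation; it is not Lynx code, and the constant k = 60 is the conventional default rather than a value taken from this project.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids; higher fused score ranks first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

Given a lexical ranking and a vector ranking of the same corpus, documents that place well in both lists rise to the top even if neither list ranks them first.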

Bring your own AI agent

This is the heart of Lynx: you connect the agent you already use, and it drives the work over your index. Run Generate Agent Map (it also runs after each build) to write AGENTS.md, CLAUDE.md, and GEMINI.md (byte-identical) at your workspace root, plus a .index/bin/ toolbox:

python .index/bin/search.py "<query>" — ranked, deduped, line-numbered content snippets in one call (--db metadata|content).
python .index/bin/query.py "<SQL>" — read-only SQL against the index for metadata questions.
python .index/bin/search_tantivy.py "<query>" — the same ranked shape, served by the Tantivy engine.

Coding agents (Claude Code, Codex, Gemini CLI) auto-load the map and use these instead of scanning files — faster answers, far fewer tokens.
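
If you want to call the generated tools from your own scripts rather than through an agent, a thin wrapper might look like the sketch below. The -k and --db flags are the ones shown in this README; combining them in one call, the helper names, and parsing standard output as JSON without assuming its fields are assumptions.

```python
import json
import subprocess
import sys


def build_search_command(query, k=8, db=None, tool=".index/bin/search.py"):
    """Assemble the command line for one of the generated search tools."""
    command = [sys.executable, tool, query, "-k", str(k)]
    if db is not None:
        command += ["--db", db]  # README lists: metadata or content
    return command


def run_search(query, **options):
    """Run the tool and parse its JSON output (shape not specified here)."""
    completed = subprocess.run(
        build_search_command(query, **options),
        capture_output=True, text=True, check=True,
    )
    return json.loads(completed.stdout)
```

Run it from the workspace root once an index and agent map exist, for example `run_search("invoice 2023", db="content")`.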

Commands

Command	What it does
Configure Roots to Index	Choose the folders or drives that become searchable. Creates `.index` on first use.
Build Index (Configured Roots)	Build or rebuild the index for the configured roots.
Reset Index (Purge & Rebuild)	Delete the derived index data and rebuild from the selected roots. Your source files are never touched.
Switch Search Index (metadata / content)	Choose whether search and browse use the metadata or the content index.
Build Tantivy Index	Install `tantivy` into the local venv and build the alternate Tantivy engine.
Index a Folder…	Index a one-off folder outside the configured roots.
Index a Document…	Pick individual files to index.
Reindex Files With Failed Extraction	Retry files whose content extraction previously failed.
Quick Search the Index…	Search from the Command Palette.
Open Search Panel	Open the search and browse UI.
Generate Agent Map (AGENTS.md / CLAUDE.md)	Write the agent-readable maps (AGENTS.md, CLAUDE.md, GEMINI.md) and the `.index/bin/` query tools.
Set Up Document Extraction (MarkItDown)	Create the local Python environment for document extraction.
Reveal Index Database Location	Open the folder containing the index files.
Show Logs	Open the extension output logs.

Settings

Setting	Description	Default
`diskIndex.roots`	Folders or drives to index.	Empty
`diskIndex.includeExtensions`	Optional extension allowlist. Empty means every file is considered.	Empty
`diskIndex.metadataOnly`	Fast file/path indexing. Set to `false`, or use Index content during a build, for full-text content indexing.	`true`
`diskIndex.contentMaxSizeMB`	When content indexing is on, only files up to this many MB have their text extracted; larger files index by name/metadata. `0` disables extraction.	`4`
`diskIndex.contentMaxChars`	Full-text depth — max characters indexed per file. `0` = unlimited (index the entire file). Raise `contentMaxSizeMB` and keep this `0` to fully index very large files.	`0`
`diskIndex.databasePath`	Folder holding `config.json`, the index DBs, and `lynx-index.log`. Use an empty folder for a new index.	Workspace `.index`, else extension storage
`diskIndex.indexWorkers`	Number of parallel index workers. `0` auto-scales (8 metadata / 8 content); set 1–64 to override.	`0`
`diskIndex.excludeDirs`	Folder names or absolute paths to skip during indexing (with their subtrees).	Empty
`diskIndex.pythonPath`	Optional Python 3.10+ interpreter with `markitdown` installed. Overrides the managed venv.	Empty
`diskIndex.logLevel`	Log verbosity: `debug`, `info`, `warn`, or `error`.	`info`
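
As a rough illustration, the snippet below prints a settings fragment you could paste into a workspace settings.json. The keys are the setting names from the table; the values are examples, the paths are hypothetical, and the assumption that diskIndex.roots and diskIndex.excludeDirs are arrays of strings should be checked against the Settings UI.

```python
import json

# Keys are the setting names from the table above; the values are examples.
settings = {
    "diskIndex.roots": ["/data/projects", "/data/notes"],  # hypothetical paths
    "diskIndex.metadataOnly": False,       # turn on full-text content indexing
    "diskIndex.contentMaxSizeMB": 16,      # extract text from files up to 16 MB
    "diskIndex.contentMaxChars": 0,        # 0 = index the entire file
    "diskIndex.excludeDirs": ["node_modules", ".git"],
    "diskIndex.logLevel": "info",
}

settings_json = json.dumps(settings, indent=2)
print(settings_json)
```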

Search engines

By default, the in-app Search and Filename Browse tabs use the built-in SQLite + FTS5 engine, with nothing to enable.
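
For readers unfamiliar with FTS5, the sketch below shows BM25-ranked full-text search in a throwaway in-memory SQLite database. The table and column names are invented for the example and say nothing about the schema of index.db or content.db. It also assumes your Python's sqlite3 module was built with FTS5, which is common but not guaranteed.

```python
import sqlite3


def build_demo_index(documents):
    """Create an in-memory FTS5 table from {path: text}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path UNINDEXED, body)")
    conn.executemany("INSERT INTO docs (path, body) VALUES (?, ?)", documents.items())
    return conn


def search(conn, query, k=8):
    """Return (path, snippet) rows, best BM25 match first."""
    sql = (
        "SELECT path, snippet(docs, 1, '[', ']', '...', 8) "
        "FROM docs WHERE docs MATCH ? ORDER BY bm25(docs) LIMIT ?"
    )
    return conn.execute(sql, (query, k)).fetchall()
```

FTS5's bm25() returns lower values for better matches, so a plain ascending ORDER BY puts the most relevant rows first.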

Tantivy is an optional second full-text engine, built from your content index, for your AI agent to query (and for hands-on retrieval benchmarking). To build it, open the Index Status tab, tick Index content and Tantivy engine, then run Build Index, or click Tantivy in the sidebar. It indexes file contents, so a content index must exist first (a metadata-only index holds names and metadata, not file contents).

Once built, your agent searches it exactly like the default engine, through a separate tool that returns the same ranked, line-numbered JSON:

python .index/bin/search.py          "<query>" -k 8   # built-in FTS5 engine
python .index/bin/search_tantivy.py  "<query>" -k 8   # Tantivy engine (same JSON shape)

Benchmark it in the app: the Search tab has an Engine dropdown — leave it on FTS5 (built-in) for everyday, live-as-you-type search, or switch to Tantivy and press Enter to run the same query through the Tantivy engine and compare rankings side by side. (Tantivy runs a Python process per query, so it's Enter-to-search, not live.)
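
If you would rather compare the two engines numerically than by eye, one simple approach is to measure how much their top results overlap and how far shared results move. The sketch below assumes you have already reduced each engine's output to an ordered list of file paths; that reduction, and both function names, are hypothetical.

```python
def top_k_overlap(results_a, results_b, k=8):
    """Jaccard overlap of the top-k items from two ranked result lists."""
    top_a, top_b = set(results_a[:k]), set(results_b[:k])
    if not top_a and not top_b:
        return 1.0  # two empty result sets agree trivially
    return len(top_a & top_b) / len(top_a | top_b)


def rank_shifts(results_a, results_b):
    """For items in both lists, map item -> (rank in a, rank in b), 1-based."""
    rank_b = {item: i for i, item in enumerate(results_b, start=1)}
    return {
        item: (i, rank_b[item])
        for i, item in enumerate(results_a, start=1)
        if item in rank_b
    }
```

An overlap near 1.0 means the engines largely agree for that query; a low overlap flags queries worth inspecting side by side in the Search tab.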

Privacy & storage

Lynx File Content Search stores a rebuildable local index (index.db and/or content.db), a small config.json, a lynx-index.log, and — if you set up extraction — a py-env virtual environment, all under .index (or your diskIndex.databasePath).

Nothing is created until you adopt a folder for indexing (Configure Roots / Build / Index a Folder).
Extracted text and snippets are stored only in the local database so search is instant.
No indexed content is uploaded and nothing is sent to a model. The MarkItDown and Tantivy setup commands download Python packages from PyPI; indexing and search themselves are fully local.
The extension reads files but never moves, modifies, or deletes them. Duplicate detection only reports duplicates; it never removes them.

Troubleshooting

Documents don’t match on content. Run Set Up Document Extraction (MarkItDown) and confirm a base Python 3.10+ with venv is on your PATH (or set diskIndex.pythonPath). Until then, documents index by metadata only.
“Build Tantivy Index” fails. It needs the same Python 3.10+ base as MarkItDown. Check the Output panel for the pip install tantivy log.
Search feels incomplete after adding files. Run Build Index again to pick up new or changed files.
Searching the wrong thing. Check the Search index section in the sidebar — you may be on the metadata index when you want content (or vice versa). Use Switch Search Index.
Indexing is slow in cloud-synced or network folders. Add slow subtrees to diskIndex.excludeDirs.
Need to inspect the index files. Run Reveal Index Database Location or check diskIndex.databasePath.

More from Lynx DI

Lynx File Content Search is built by Lynx DI. Explore more at lynxdi.com.

License

MIT