CortexMark Pipeline — VS Code Extension
A VS Code extension for running the CortexMark PDF → Markdown pipeline from inside the editor. It is published on the marketplace under the display name CortexMark Pipeline.
It provides:
- session-based PDF processing,
- pipeline and analysis commands,
- Markdown preview,
- dashboard metrics,
- real-time progress and logging,
- a chat-oriented control surface.
Important: this extension is the UI layer only. It does not bundle the Python backend. Install cortexmark separately.
Install
Install from the marketplace
- Open VS Code
- Open the Extensions view
- Search for CortexMark Pipeline
- Install the published extension ID:
PhiniteLab.cortexmark-pipeline-vscode
Install from VSIX
You can also install a .vsix package via:
Ctrl/Cmd + Shift + P → Extensions: Install from VSIX...
Backend requirements
The extension launches the Python backend in your workspace environment.
Minimum backend
pip install cortexmark
Recommended backend for academic PDFs
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install "cortexmark[docling]"
Optional system tools:
- poppler-utils / poppler
- tesseract-ocr / tesseract
These tools help with OCR-heavy or image-heavy PDF workflows, but they are not required for every project.
Requirements at a glance
| Requirement | Required? | Notes |
|---|---|---|
| VS Code 1.92+ | Yes | Minimum supported editor version |
| Python 3.11+ | Yes | Runs the backend |
| cortexmark package | Yes | The extension depends on the backend CLI/modules |
| docling extra | Optional | Needed for docling / dual workflows |
| Poppler / Tesseract | Optional | Helpful for OCR-style use cases |
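To check the Python-side requirements by hand, a small script along these lines may help. It is only a sketch and not the extension's Environment Doctor. It assumes that the import names match the package names (`cortexmark`, `docling`) and that Poppler and Tesseract expose the `pdftoppm` and `tesseract` executables on PATH. The function name `check_environment` is hypothetical.

```python
import importlib.util
import shutil
import sys

# Minimum Python version taken from the requirements table.
MIN_PYTHON = (3, 11)

# Assumed executable names: pdftoppm ships with Poppler,
# tesseract is the Tesseract OCR command-line tool.
OPTIONAL_TOOLS = ("pdftoppm", "tesseract")


def check_environment(version_info=sys.version_info):
    """Return a dict mapping each requirement name to True or False."""
    report = {
        "python>=3.11": tuple(version_info[:2]) >= MIN_PYTHON,
        # find_spec returns None when a top-level package cannot be found.
        # The import names below are assumptions based on the pip names.
        "cortexmark": importlib.util.find_spec("cortexmark") is not None,
        "docling": importlib.util.find_spec("docling") is not None,
    }
    for tool in OPTIONAL_TOOLS:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Run it with the same interpreter the extension is configured to use, so the result reflects the environment the backend will actually run in.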
What you can process
Primary user input inside the extension:
- one or more PDF files
- a folder containing PDFs
Outputs you can inspect after processing:
- raw Markdown
- cleaned Markdown
- chunk files
- semantic chunk files
- quality reports
- previewable Markdown pages
- dashboard summaries
First-time setup
- Open your workspace folder in VS Code
- Install the extension
- Install the Python backend in the interpreter you want the extension to use
- Run CortexMark: Environment Doctor
- If needed, run CortexMark: Setup Wizard
- Create a session
- Add PDFs or add a PDF folder
- Run CortexMark: Run Full Pipeline
Daily workflow
1. Create or select a session
Sessions keep inputs and outputs isolated. The extension stores session metadata in:
.cortexmark/sessions.json
Session-scoped pipeline artifacts live under:
sessions/<session-name>/
├── data/raw/
└── outputs/
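The layout above can be expressed as a small path helper, which is handy when scripting against session outputs. This is a sketch based only on the directory tree shown here. The function names `session_paths` and `ensure_session_dirs` are hypothetical, and the extension may create further subfolders that are not listed.

```python
import tempfile
from pathlib import Path


def session_paths(workspace, session_name):
    """Build the session-scoped paths from the documented layout."""
    root = Path(workspace) / "sessions" / session_name
    return {
        "root": root,
        "raw": root / "data" / "raw",   # input PDFs for the session
        "outputs": root / "outputs",    # pipeline artifacts
    }


def ensure_session_dirs(workspace, session_name):
    """Create the session folders if they do not exist yet."""
    paths = session_paths(workspace, session_name)
    for key in ("raw", "outputs"):
        paths[key].mkdir(parents=True, exist_ok=True)
    return paths


if __name__ == "__main__":
    # Demonstrate in a throwaway directory rather than a real workspace.
    with tempfile.TemporaryDirectory() as tmp:
        created = ensure_session_dirs(tmp, "demo")
        print(created["raw"].relative_to(tmp))
```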
2. Add PDFs
Use either:
- Add PDFs...
- Add PDF Folder...
3. Run commands
Useful commands include:
- Run Full Pipeline
- Convert Only
- Generate QA Report
- Run Cross-Reference Analysis
- Run Algorithm Extraction
- Run Notation Glossary
- Run Semantic Chunking
- Preview Markdown
- Refresh Dashboard
Settings
| Setting | Default | Description |
|---|---|---|
| cortexmark.pythonPath | python3 | Python executable override |
| cortexmark.configPath | configs/pipeline.yaml | Pipeline config path |
| cortexmark.dataRoot | | Optional input root override |
| cortexmark.outputRoot | | Optional shared output root override |
| cortexmark.sessionStorePath | .cortexmark/sessions.json | Session metadata path |
| cortexmark.defaultEngine | dual | Default conversion engine |
| cortexmark.autoProcess | false | Auto-process new PDFs |
Path resolution precedence:
- explicit VS Code setting (cortexmark.*)
- process environment variables
- workspace .env
- selected pipeline config
- selected pipeline config
- workspace-relative defaults
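One way to picture this precedence is a first-match lookup across layers. The sketch below assumes the list is ordered from highest to lowest priority, that an empty value counts as "not set", and that every layer uses the same key. In practice the environment variable and config key names will differ from the setting names. The function `resolve_path` and the source labels are hypothetical, not part of the extension.

```python
def resolve_path(key, default, *, settings=None, environ=None, dotenv=None, config=None):
    """Return (value, source) from the first layer that defines `key`."""
    # Layers in assumed priority order, highest first.
    layers = (
        ("vscode-setting", settings),
        ("environment", environ),
        ("dotenv", dotenv),
        ("pipeline-config", config),
    )
    for source, layer in layers:
        value = (layer or {}).get(key)
        if value:  # None and empty strings are treated as "not set"
            return value, source
    # Nothing matched: fall back to the workspace-relative default.
    return default, "default"


if __name__ == "__main__":
    value, source = resolve_path(
        "outputRoot",
        "outputs",
        settings={"outputRoot": ""},           # setting left empty
        environ={"outputRoot": "/data/out"},   # so the environment wins
        config={"outputRoot": "/cfg/out"},
    )
    print(value, source)
```

Returning the source alongside the value makes it easier to explain why a particular path was chosen when several layers define it.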
Commands
All commands are available from the Command Palette with the CortexMark: prefix.
Key commands:
- Environment Doctor
- Setup Wizard
- New Session
- Process Active Session
- Run Full Pipeline
- Convert Only
- Generate QA Report
- Run Cross-Reference Analysis
- Run Algorithm Extraction
- Run Notation Glossary
- Run Semantic Chunking
- Preview Markdown
- Refresh Dashboard
Development
cd vscode-extension
npm install
npm run compile
Press F5 in VS Code to launch an Extension Development Host.