CortexMark Pipeline — VS Code Extension

A VS Code extension for running the CortexMark PDF → Markdown pipeline from inside the editor. It is published on the marketplace under the display name CortexMark Pipeline.

It provides:

session-based PDF processing,
pipeline and analysis commands,
Markdown preview,
dashboard metrics,
real-time progress and logging,
a chat-oriented control surface.

Important: this extension is the UI layer only. It does not bundle the Python backend. Install cortexmark separately.

Install

Install from the marketplace

1. Open VS Code
2. Open the Extensions view
3. Search for CortexMark Pipeline
4. Install the extension with the published ID PhiniteLab.cortexmark-pipeline-vscode

Install from VSIX

You can also install a .vsix package via:

Ctrl/Cmd + Shift + P → Extensions: Install from VSIX...

Backend requirements

The extension launches the Python backend using the Python environment of your workspace, so install the packages below into the interpreter you want the extension to use.

Minimum backend

pip install cortexmark

Recommended backend for academic PDFs

pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
pip install "cortexmark[docling]"

Optional system tools

poppler-utils / poppler
tesseract-ocr / tesseract

These are useful for OCR-heavy or image-heavy PDF workflows, but they are not required for every project.

Requirements at a glance

Requirement	Required?	Notes
VS Code 1.92+	Yes	Minimum supported editor version
Python 3.11+	Yes	Runs the backend
`cortexmark` package	Yes	The extension depends on the backend CLI/modules
`docling` extra	Optional	Needed for `docling` / `dual` workflows
Poppler / Tesseract	Optional	Helpful for OCR-style use cases
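
The requirements in the table can be checked from the interpreter you plan to use. The following is a minimal sketch, not the extension's Environment Doctor. It assumes the backend is importable under the module name cortexmark and the optional extra under docling (this README only confirms the pip package names), and that Poppler and Tesseract are on PATH as the executables pdftoppm and tesseract. The function name check_requirements is hypothetical.

```python
from __future__ import annotations

import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 11)  # minimum version from the requirements table


def check_requirements(backend_module: str = "cortexmark") -> dict[str, bool]:
    """Report which requirements this interpreter and PATH satisfy.

    Assumptions: the backend is importable as `cortexmark`, the optional
    extra as `docling`, and Poppler/Tesseract are detected through their
    usual executables `pdftoppm` and `tesseract`.
    """
    return {
        "python_3_11_plus": sys.version_info >= MIN_PYTHON,
        "cortexmark": importlib.util.find_spec(backend_module) is not None,
        "docling": importlib.util.find_spec("docling") is not None,
        "poppler": shutil.which("pdftoppm") is not None,
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    # Print one line per requirement for a quick manual check.
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'not found'}")
```

Run the script with the same interpreter that cortexmark.pythonPath points to, since a package installed in a different environment will be reported as not found.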

What you can process

Primary user input inside the extension:

one or more PDF files
a folder containing PDFs

Outputs you can inspect after processing:

raw Markdown
cleaned Markdown
chunk files
semantic chunk files
quality reports
previewable Markdown pages
dashboard summaries

First-time setup

1. Open your workspace folder in VS Code
2. Install the extension
3. Install the Python backend in the interpreter you want the extension to use
4. Run CortexMark: Environment Doctor
5. If needed, run CortexMark: Setup Wizard
6. Create a session
7. Add PDFs or add a PDF folder
8. Run CortexMark: Run Full Pipeline

Daily workflow

1. Create or select a session

Sessions keep inputs and outputs isolated. The extension stores session metadata in:

.cortexmark/sessions.json

Session-scoped pipeline artifacts live under:

sessions/<session-name>/
├── data/raw/
└── outputs/
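
If you script around a session, for example to list its inputs or collect its outputs, the layout above maps to paths as in the following sketch. The extension manages these folders itself. The helper names session_dirs and ensure_session are hypothetical, and the sketch assumes the sessions/ folder sits at the workspace root.

```python
from __future__ import annotations

from pathlib import Path


def session_dirs(workspace: Path, session_name: str) -> dict[str, Path]:
    """Return the session-scoped directories from the layout above."""
    root = workspace / "sessions" / session_name
    return {
        "root": root,
        "raw": root / "data" / "raw",   # input PDFs
        "outputs": root / "outputs",    # pipeline artifacts
    }


def ensure_session(workspace: Path, session_name: str) -> dict[str, Path]:
    """Create the directories if they are missing (hypothetical helper)."""
    dirs = session_dirs(workspace, session_name)
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

For instance, sorted(session_dirs(Path("."), "demo")["raw"].glob("*.pdf")) would list the PDFs added to a session named demo.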

2. Add PDFs

Use either:

Add PDFs...
Add PDF Folder...

3. Run commands

Useful commands include:

Run Full Pipeline
Convert Only
Generate QA Report
Run Cross-Reference Analysis
Run Algorithm Extraction
Run Notation Glossary
Run Semantic Chunking
Preview Markdown
Refresh Dashboard

Settings

| Setting | Default | Description |
|---|---|---|
| cortexmark.pythonPath | python3 | Python executable override |
| cortexmark.configPath | configs/pipeline.yaml | Pipeline config path |
| cortexmark.dataRoot | | Optional input root override |
| cortexmark.outputRoot | | Optional shared output root override |
| cortexmark.sessionStorePath | .cortexmark/sessions.json | Session metadata path |
| cortexmark.defaultEngine | dual | Default conversion engine |
| cortexmark.autoProcess | false | Auto-process new PDFs |

Path resolution precedence:

explicit VS Code setting (cortexmark.*)
process environment variables
workspace .env
selected pipeline config
workspace-relative defaults
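
The precedence list above amounts to a first-match lookup across ordered sources. The sketch below illustrates that idea only and is not the extension's implementation. The function resolve_setting, the simplified key names, and all values are hypothetical, and treating an empty string as "unset" is an assumption.

```python
from __future__ import annotations

from typing import Mapping, Optional, Sequence


def resolve_setting(
    key: str, layers: Sequence[Mapping[str, Optional[str]]]
) -> Optional[str]:
    """Return the first non-empty value for `key`, scanning layers in order."""
    for layer in layers:
        value = layer.get(key)
        if value:  # skip missing keys, None, and empty strings
            return value
    return None


# Hypothetical values, ordered from highest to lowest precedence.
layers = [
    {"outputRoot": ""},                                 # explicit VS Code setting (left empty)
    {},                                                 # process environment variables
    {"outputRoot": "build/out"},                        # workspace .env
    {"outputRoot": "outputs", "dataRoot": "data"},      # selected pipeline config
    {"outputRoot": "outputs", "dataRoot": "data/raw"},  # workspace-relative defaults
]

print(resolve_setting("outputRoot", layers))  # build/out
print(resolve_setting("dataRoot", layers))    # data
```

Here the workspace .env value wins for outputRoot because the two higher-precedence sources provide nothing, while dataRoot falls through to the pipeline config.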

Commands

All commands are available from the Command Palette with the CortexMark: prefix.

Key commands:

Environment Doctor
Setup Wizard
New Session
Process Active Session
Run Full Pipeline
Convert Only
Generate QA Report
Run Cross-Reference Analysis
Run Algorithm Extraction
Run Notation Glossary
Run Semantic Chunking
Preview Markdown
Refresh Dashboard

Development

cd vscode-extension
npm install
npm run compile

Press F5 in VS Code to launch an Extension Development Host.
