Skip to content
| Marketplace
Sign in
Visual Studio Code>Other>PDF Extractor PlusNew to Visual Studio Code? Get it now.
PDF Extractor Plus

PDF Extractor Plus

kathirvel-labs

|
42 installs
| (0) | Free
Extract PDF text, batch to .txt, workspace search, rich preview, settings (encoding/line endings), OCR scanned PDF (PDF.js webview + Tesseract), and OCR from images.
Installation
Launch VS Code Quick Open (Ctrl+P), paste the following command, and press enter.
Copied to clipboard
More Info

PDF Extractor Plus

PDF Extractor Plus

Extract, search, batch-export, and preview PDF text inside VS Code. Optional OCR from images (Tesseract) for scans when you export pages as PNG/JPEG.

Features

Area What you get
Extract Right-click a PDF or use the Command Palette; pick a file if nothing is selected. Opens a preview editor beside the current column.
Rich preview Webview with live search and highlight (case-insensitive). Also available from the success notification and Explorer.
Save / copy Save as .txt (with settings below), or copy full text to the clipboard.
Batch folder Right-click a folder in the Explorer → Batch Extract Folder to .txt — writes one .txt per PDF (recursive optional).
Workspace search Command Search Text in Workspace PDFs — extracts each PDF (up to a limit) and lists files containing your query; pick one to open rich preview.
Settings lineEnding, encoding (UTF-8 / UTF-16 LE + BOM), optional page separators when the PDF uses form-feed breaks (\f), default output folder, batch recursive, search max files, OCR language, and offer OCR when extract is empty.
OCR (images) Command OCR Text from Image(s) — Tesseract on PNG/JPEG/WebP/TIFF/BMP (multi-file). First run may download language data; keep network allowed if prompted.
OCR scanned PDF Command OCR Scanned PDF — renders each page in a webview with PDF.js (browser canvas), then Tesseract on the extension host. No Node canvas build; good default path for image-only PDFs. Tune pdfExtractor.ocrPdfMaxPages and pdfExtractor.ocrPdfRenderScale.
Scanned PDFs When extract finds no embedded text, the extension can offer OCR scanned PDF, OCR from images, or docs — no external page export required for the webview path.

Explorer

On a .pdf file

  • Extract Text (Preview)
  • Rich Preview (Search)
  • Extract Text to .txt File
  • Copy PDF Text to Clipboard
  • OCR Scanned PDF (render + Tesseract)

On a folder

  • Batch Extract Folder to .txt

Commands

Command ID Title
pdfExtractor.extract PDF Extractor Plus: Extract Text (Preview)
pdfExtractor.previewWebview PDF Extractor Plus: Rich Preview (Search)
pdfExtractor.extractAndSave PDF Extractor Plus: Extract Text to .txt File
pdfExtractor.copyExtracted PDF Extractor Plus: Copy PDF Text to Clipboard
pdfExtractor.batchExtract PDF Extractor Plus: Batch Extract Folder to .txt
pdfExtractor.searchWorkspacePdfs PDF Extractor Plus: Search Text in Workspace PDFs
pdfExtractor.ocrFromImages PDF Extractor Plus: OCR Text from Image(s)
pdfExtractor.ocrScannedPdf PDF Extractor Plus: OCR Scanned PDF (render + Tesseract)

Settings (pdfExtractor.*)

  • lineEnding: auto | lf | crlf
  • encoding: utf8 | utf16le
  • pageSeparatorEnabled / pageSeparatorTemplate: insert {n} between pages when \f breaks exist
  • defaultOutputFolder: absolute path for save dialog default and batch output
  • batchRecursive: include subfolders in batch
  • searchMaxFiles: cap for batch + workspace search (1–2000)
  • offerOcrWhenEmpty: show OCR hint when extract finds no text
  • ocrLanguage: Tesseract language code (e.g. eng)
  • ocrPdfMaxPages: cap pages processed for OCR scanned PDF (1–500, default 150)
  • ocrPdfRenderScale: PDF.js render scale before max-dimension fit (0.5–3, default 1.4)

Requirements

  • VS Code ^1.115.0
  • Workspace search and recursive batch need a folder opened in VS Code.
  • OCR: uses bundled Tesseract.js; first use of a language may require network for traineddata.

Developing this extension

Clone the repo, open the pdfextractor folder in VS Code, run npm install, then Run → Start Debugging (F5) to launch an Extension Development Host. Commands live in extension.js; OCR scanned PDF uses a webview (bundled pdfjs-dist from node_modules) plus tesseract.js in the Node extension host. Package with npm run vsce:package (requires @vscode/vsce).

Known limitations

  • OCR scanned PDF is CPU/memory intensive on long documents; reduce ocrPdfMaxPages / ocrPdfRenderScale if needed. Encrypted or malformed PDFs may fail in PDF.js.
  • Workspace search and batch are CPU-heavy on huge trees; tune searchMaxFiles.
  • Layout-heavy PDFs may still have imperfect text order.

Release notes

See CHANGELOG.md.

  • Contact us
  • Jobs
  • Privacy
  • Manage cookies
  • Terms of use
  • Trademarks
© 2026 Microsoft