PDF Xtract

A Visual Studio Code extension that converts PDF files to TXT or JSON format, with built-in OCR for scanned and vector-graphics PDFs.

Features

Convert PDF to TXT: Extract plain text content from PDF files
Convert PDF to JSON: Extract PDF content with metadata in structured JSON format
Built-in OCR: Automatically uses Tesseract OCR for PDFs without embedded text (scanned docs, "Print to PDF", etc.)
Context Menu Integration: Right-click on any PDF file in the explorer to convert
Command Palette: Access conversion commands via the Command Palette (Ctrl+Shift+P, or Cmd+Shift+P on Mac)
Progress Feedback: Visual progress indicator during conversion
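
The built-in OCR feature amounts to a fallback: try the embedded text first, and run OCR only when nothing usable comes back. The sketch below illustrates that decision. All names (`hasUsableText`, `extractText`, the callbacks) are hypothetical and not taken from the extension's source, the whitespace check is an assumed heuristic, and the functions are kept synchronous for brevity even though real parsing and OCR calls would almost certainly be asynchronous.

```typescript
// Hypothetical names throughout; the extension's real internals may differ.
type TextSource = "embedded" | "ocr";

interface ExtractionResult {
  text: string;
  source: TextSource;
}

// Treat a PDF as "textless" when parsing yields fewer than minChars
// non-whitespace-trimmed characters. The threshold is an assumption.
function hasUsableText(extracted: string, minChars: number = 1): boolean {
  return extracted.trim().length >= minChars;
}

// Try the embedded text first; call the (expensive) OCR step only if needed.
function extractText(
  readEmbedded: () => string,
  runOcr: () => string
): ExtractionResult {
  const embedded = readEmbedded();
  if (hasUsableText(embedded)) {
    return { text: embedded, source: "embedded" };
  }
  return { text: runOcr(), source: "ocr" };
}
```

Passing the two steps in as callbacks keeps the fallback logic independent of any particular PDF parser or OCR engine, which also makes it easy to exercise with stub functions.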

Usage

Method 1: Context Menu (Right-click)

Right-click on any .pdf file in the VS Code Explorer
Select either:
- PDF: Convert to TXT - Extracts plain text
- PDF: Convert to JSON - Extracts text with metadata

Method 2: Command Palette

Press Ctrl+Shift+P (or Cmd+Shift+P on Mac)
Type "PDF" and select:
- PDF: Convert to TXT
- PDF: Convert to JSON
Select the PDF file you want to convert
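
The two methods differ only in how the command learns which file to convert. From the Explorer context menu, VS Code passes the clicked resource to the command handler; from the Command Palette no resource is passed, which is why a file has to be selected afterwards. A minimal sketch of that branching follows. `resolveTarget` and `ConversionTarget` are hypothetical names, and the resource is modeled as a plain path string rather than a VS Code `Uri`.

```typescript
// Hypothetical helper; not taken from the extension's source.
type ConversionTarget =
  | { kind: "convert"; path: string } // file is already known
  | { kind: "prompt" };               // user still has to pick a file

function resolveTarget(clickedPath?: string): ConversionTarget {
  // Explorer context menu: the handler receives the clicked resource.
  if (clickedPath !== undefined && /\.pdf$/i.test(clickedPath)) {
    return { kind: "convert", path: clickedPath };
  }
  // Command Palette (or a non-PDF resource): fall back to a file picker.
  return { kind: "prompt" };
}
```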

Output Format

TXT Format

Simple plain text extraction
Preserves text content from all pages
Saved as filename.txt in the same directory
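
As a rough illustration of the naming rule, the output path can be derived by swapping the `.pdf` extension for the target format. `outputPathFor` is a hypothetical helper, and applying the same rule to JSON output (`filename.json`) is an assumption; only the TXT naming is stated above.

```typescript
type OutputFormat = "txt" | "json";

// Hypothetical helper: derive the output path next to the source PDF.
function outputPathFor(pdfPath: string, format: OutputFormat): string {
  // Strip a trailing ".pdf" in any letter case; other names are kept whole.
  const base = pdfPath.replace(/\.pdf$/i, "");
  return `${base}.${format}`;
}
```

Because only the extension changes, the directory part of the path is preserved and the output lands beside the original file.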

JSON Format

Structured output with metadata:

{
  "metadata": {
    "filename": "document.pdf",
    "convertedAt": "2026-02-26T...",
    "totalPages": 5,
    "info": {
      "Title": "Document Title",
      "Author": "Author Name",
      ...
    }
  },
  "content": {
    "text": "Full text content...",
    "pages": 5,
    "version": "1.7"
  }
}
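
For anyone consuming the JSON from other tooling, the shape of the example above can be written down as TypeScript types. This is a sketch read off that example, not the extension's actual code: `ParsedPdf` and `buildJsonOutput` are hypothetical, and it assumes `totalPages` and `pages` both hold the page count (both are 5 in the example).

```typescript
// Shape of the JSON output, as shown in the example.
interface PdfJsonOutput {
  metadata: {
    filename: string;
    convertedAt: string; // ISO 8601 timestamp
    totalPages: number;
    info: Record<string, unknown>; // Title, Author, and other properties
  };
  content: {
    text: string;
    pages: number;
    version: string;
  };
}

// Hypothetical input: the parser result reduced to the fields needed here.
interface ParsedPdf {
  text: string;
  pageCount: number;
  info: Record<string, unknown>;
  version: string;
}

function buildJsonOutput(filename: string, parsed: ParsedPdf, now: Date): PdfJsonOutput {
  return {
    metadata: {
      filename,
      convertedAt: now.toISOString(),
      totalPages: parsed.pageCount, // assumed to equal content.pages
      info: parsed.info,
    },
    content: {
      text: parsed.text,
      pages: parsed.pageCount,
      version: parsed.version,
    },
  };
}
```

Taking the timestamp as a parameter rather than calling `new Date()` inside the function keeps the output reproducible in tests.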

Installation

From Source

Clone or download this repository
Copy the folder to your VS Code extensions directory:
- Windows: %USERPROFILE%\.vscode\extensions
- macOS/Linux: ~/.vscode/extensions
Run npm install in the extension folder
Restart VS Code

From VSIX Package (if available)

Download the .vsix file
In VS Code, go to the Extensions view (Ctrl+Shift+X, or Cmd+Shift+X on Mac)
Click the "..." menu at the top
Select "Install from VSIX..."
Choose the downloaded file

Development Setup

# Install dependencies
npm install

# Run the extension in development mode
# Press F5 in VS Code to open Extension Development Host

Requirements

Visual Studio Code 1.80.0 or higher
Node.js (with npm), needed to install dependencies when installing from source

Dependencies

pdf-parse: PDF parsing library
Tesseract: OCR engine used for PDFs without embedded text

Known Limitations

Complex PDF layouts may not preserve exact formatting in text output
Scanned PDFs (images) rely on OCR, so the extracted text may contain recognition errors
Very large PDFs may take longer to process

Release Notes

1.0.0

Initial release
PDF to TXT conversion
PDF to JSON conversion with metadata
Context menu integration

Contributing

Feel free to submit issues and enhancement requests!

License

MIT License

AJAL R
