DocPilot - AI-Powered PDF Assistant for VSCode
A comprehensive VSCode extension that combines advanced PDF viewing with intelligent AI summarization capabilities. View, navigate, and understand PDF documents through seamless Copilot Chat integration.
✨ Core Features
📄 Advanced PDF Viewing
- Automatic Activation - Opens PDFs seamlessly via File → Open menu
- Local & Remote Support - Open files from filesystem or URLs
- Crisp Rendering - High-quality display with PDF.js v5.3.93 engine
- Smart Navigation - Zoom, fit-to-width/page, continuous scrolling
- Professional Toolbar - Clean icon-based interface with intuitive controls
- Text Selection - Interactive text selection with dynamic visual feedback
- Enhanced Object Extraction - Extract text, images, tables, metadata, and 6 other object types
- PDF Object Inspector - Dual-mode hierarchical viewer for comprehensive document structure analysis
- Screenshot Capture - Drag-to-select screenshot tool with folder selection and save options
- Debug Mode - Developer tools for troubleshooting text layer rendering
- VSCode Integration - Seamless theme matching and responsive UI
🤖 AI-Powered Analysis
- Intelligent Summarization - Comprehensive document analysis via Copilot Chat
- Mindmap Generation - Create Mermaid mindmaps for visual document understanding
- Multi-Model Support - Works with GPT-4, Gemini, and other Copilot models
- Smart Caching - Instant results for previously processed documents
- Semantic Chunking - Advanced processing for documents of any size
- Hierarchical Processing - Multi-level summarization with context preservation
- Progress Tracking - Real-time status updates during analysis
- Automatic Cache Invalidation - Fresh summaries when files are modified
🚀 Installation
Development Mode
- Clone this repository
- Open in VSCode
- Install dependencies:
npm install
- Compile:
npm run compile
- Press F5 to launch Extension Development Host
- Test the extension in the new window
From VSIX
📖 Usage
Opening PDFs
Automatic Activation (Easiest):
- File → Open → Select any PDF file - DocPilot opens automatically!
- Double-click PDF files in VS Code Explorer
Manual Commands:
- Press F1 → Type "DocPilot: Open Local PDF"
- Right-click any .pdf file in Explorer → "Open Local PDF"
Remote URLs:
- Press F1 → Type "DocPilot: Open PDF from URL"
- Enter the PDF URL when prompted
🤖 AI Chat Integration
Quick Start:
- Open Copilot Chat (Ctrl+Alt+I / Cmd+Alt+I)
- Type @docpilot /summarise [file-path-or-url] for text analysis
- Type @docpilot /mindmap [file-path-or-url] for visual mindmaps
- Get a comprehensive AI analysis, with the document opened in the viewer
Supported Commands:
@docpilot /summarise docs/report.pdf # Local file + open viewer
@docpilot /summarise https://example.com/doc.pdf # Remote URL + open viewer
@docpilot /summarise # File picker dialog + open viewer
@docpilot /mindmap docs/report.pdf # Generate Mermaid mindmap from local file
@docpilot /mindmap https://example.com/doc.pdf # Generate mindmap from remote URL
@docpilot /mindmap # File picker dialog + generate mindmap
@docpilot /cache-stats # View cache statistics
@docpilot /clear-cache # Clear all cached summaries
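The command forms above follow one pattern: a slash command, optionally followed by a local path or a URL, with the file picker as the fallback when no argument is given. The following is a minimal sketch of that dispatch logic, not DocPilot's actual implementation; `parseChatCommand`, `ChatTarget`, and `ParsedCommand` are hypothetical names introduced for illustration.

```typescript
// Hypothetical types for illustration; not DocPilot's actual API.
type ChatTarget =
  | { kind: "url"; value: string }
  | { kind: "file"; value: string }
  | { kind: "picker" } // no argument given: fall back to the file picker
  | { kind: "none" }; // command takes no document (e.g. /cache-stats)

interface ParsedCommand {
  command: string;
  target: ChatTarget;
}

// Commands that operate on a document, per the list above.
const DOCUMENT_COMMANDS = new Set(["summarise", "mindmap"]);

function parseChatCommand(input: string): ParsedCommand | undefined {
  // Expect "@docpilot /<command> [argument]".
  const match = /^@docpilot\s+\/([a-z-]+)(?:\s+(.+))?$/.exec(input.trim());
  if (!match) return undefined;

  const command = match[1] ?? "";
  const arg = match[2]?.trim();

  if (!DOCUMENT_COMMANDS.has(command)) {
    return { command, target: { kind: "none" } };
  }
  if (!arg) {
    return { command, target: { kind: "picker" } };
  }
  if (/^https?:\/\//i.test(arg)) {
    return { command, target: { kind: "url", value: arg } };
  }
  return { command, target: { kind: "file", value: arg } };
}
```

For example, `parseChatCommand("@docpilot /mindmap")` yields a `picker` target, while passing `https://example.com/doc.pdf` as the argument yields a `url` target.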
Advanced Capabilities:
- 🧠 Semantic Chunking - Preserves context across document boundaries
- ⚡ Intelligent Caching - Instant retrieval of previously processed summaries
- 🔄 Hierarchical Summarization - Multi-stage analysis for comprehensive understanding
- 📊 Processing Analytics - Detailed stats on chunks processed and pages analyzed
- 🛡️ Error Resilience - Multiple fallback strategies ensure reliable operation
- 🔄 Auto Cache Invalidation - File modification detection for fresh content
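Cache invalidation by file modification can be sketched as follows. This is an illustrative in-memory version that assumes the caller supplies the file's current modification time; the class name `SummaryCache` and its methods are hypothetical, and the actual cache (which the README describes as persistent across sessions) may be structured differently.

```typescript
// Hypothetical sketch: an in-memory summary cache keyed by file path,
// invalidated when the file's modification time changes.
interface CacheEntry {
  summary: string;
  mtimeMs: number; // modification time recorded when the summary was stored
}

class SummaryCache {
  private entries = new Map<string, CacheEntry>();

  set(path: string, mtimeMs: number, summary: string): void {
    this.entries.set(path, { summary, mtimeMs });
  }

  // Returns the cached summary only if the file is unchanged since caching.
  get(path: string, currentMtimeMs: number): string | undefined {
    const entry = this.entries.get(path);
    if (!entry) return undefined;
    if (entry.mtimeMs !== currentMtimeMs) {
      this.entries.delete(path); // stale: the file was modified
      return undefined;
    }
    return entry.summary;
  }

  clear(): void {
    this.entries.clear();
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A lookup with a changed modification time both misses and evicts the stale entry, so the next summarization stores a fresh result.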
Navigation Controls:
- 📄 Page Navigation: First/Previous/Next/Last page buttons with SVG icons
- 📊 Page Counter: Live page display showing current position
- 🔄 Page Input: Direct page number input for quick navigation
Zoom & Fit Controls:
- 🔍 Zoom In/Out: Precise zoom control with high-quality magnifying glass icons
- 📊 Zoom Level Display: Current zoom percentage (25% - 300%)
- 📏 Zoom Slider: Drag control for smooth zoom adjustment
- 📏 Fit Width: Automatically fit PDF width to window for optimal reading
- 📄 Fit Page: Fit entire page in window for complete overview
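The fit modes reduce to a ratio between the viewer and page dimensions. Here is a rough sketch, assuming the fit scales are clamped to the same 25% to 300% range as manual zoom (the README does not confirm this); all function names are hypothetical.

```typescript
// Zoom range from the toolbar description: 25% to 300%.
const MIN_ZOOM = 0.25;
const MAX_ZOOM = 3.0;

function clampZoom(scale: number): number {
  return Math.min(MAX_ZOOM, Math.max(MIN_ZOOM, scale));
}

// Fit Width: scale the page so its width matches the container width.
function fitWidthScale(containerWidth: number, pageWidth: number): number {
  return clampZoom(containerWidth / pageWidth);
}

// Fit Page: scale so the whole page is visible, limited by the tighter axis.
function fitPageScale(
  containerWidth: number,
  containerHeight: number,
  pageWidth: number,
  pageHeight: number,
): number {
  return clampZoom(
    Math.min(containerWidth / pageWidth, containerHeight / pageHeight),
  );
}
```

For a 612 × 792 point page in a 1224-pixel-wide container, `fitWidthScale(1224, 612)` gives a scale of 2 (200%).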
Content & Analysis Tools:
- 📝 AI Summarize: Intelligent document analysis via Copilot Chat integration
- 🗺️ AI Mindmap: Generate Mermaid mindmaps for visual document understanding
- 📤 Export Text: Extract PDF content as clean text files with metadata
- 👁️ Text Selection: Toggle interactive text selection with dynamic visual feedback
- 🔍 Text Search: Vi-style text search across all pages with keyboard navigation
- 📷 Screenshot Tool: Drag-to-select screenshot capture with folder selection and save options
- 🔍 PDF Object Inspector: Dual-mode hierarchical viewer for comprehensive PDF structure analysis
- 🐛 Debug Mode: Developer tools for troubleshooting text layer rendering
🗺️ AI Mindmap Generation
DocPilot now includes intelligent mindmap generation that transforms PDF documents into visual Mermaid mindmaps for enhanced understanding:
Core Features:
- 🧠 AI-Powered Analysis: Uses advanced language models to extract key concepts and relationships
- 🎨 Mermaid Format: Generates standard Mermaid mindmap syntax for universal compatibility
- 📄 Automatic File Creation: Creates .mmd files and opens them directly in VSCode
- 🔄 Semantic Processing: Analyzes document structure and creates hierarchical concept maps
- ⚡ Smart Caching: Cached results for previously processed documents
- 🎯 Visual Understanding: Transform complex documents into clear visual representations
How to Use:
- From Chat: Open Copilot Chat and type @docpilot /mindmap [file-path]
- From Webview: Click the mindmap button (🗺️) in the PDF viewer toolbar
- File Picker: Use @docpilot /mindmap for file selection dialog
Generated Output:
- Creates a .mmd file with Mermaid mindmap syntax
- Automatically opens the file in VSCode for immediate viewing
- Compatible with Mermaid preview extensions
- Hierarchical structure showing document concepts and relationships
Example Output:
mindmap
  root((Document Title))
    Main Concept
      Key Point 1
      Key Point 2
    Secondary Topic
      Detail A
      Detail B
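Mermaid mindmaps express hierarchy purely through indentation, so a concept tree can be serialized with a short recursive walk. The sketch below is illustrative only: `MindmapNode` and `toMermaidMindmap` are hypothetical names, and labels are written as-is without escaping characters that Mermaid treats specially (such as parentheses).

```typescript
// Hypothetical tree shape for extracted concepts.
interface MindmapNode {
  label: string;
  children?: MindmapNode[];
}

// Serialize a concept tree to Mermaid mindmap syntax.
// Each nesting level is indented by two more spaces than its parent.
function toMermaidMindmap(title: string, topics: MindmapNode[]): string {
  const lines: string[] = ["mindmap", `  root((${title}))`];

  const walk = (node: MindmapNode, depth: number): void => {
    lines.push(`${"  ".repeat(depth)}${node.label}`);
    for (const child of node.children ?? []) {
      walk(child, depth + 1);
    }
  };

  // Top-level topics sit one level below the root node.
  for (const topic of topics) {
    walk(topic, 2);
  }
  return lines.join("\n");
}
```

Passing the title "Document Title" with two topics, each holding two children, reproduces the shape of the example output above.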
🔍 Text Search
DocPilot now includes powerful vi-style text search functionality for quick document navigation:
Core Features:
- 📄 Cross-Page Search: Search across all pages in the PDF document
- ⌨️ Keyboard Navigation: Enter for next match, Shift+Enter for previous, ESC to close
- 🔍 Case-Insensitive: Finds matches regardless of letter case
- ⚡ Lazy Loading: Text extracted on-demand for optimal performance
- 💾 Smart Caching: Page text cached to avoid re-extraction
- 🎯 Visual Highlighting: Current match highlighted with orange outline
- 📜 Auto-Scrolling: Automatically scrolls to bring matches into view
How to Use:
- Press Ctrl+F (or Cmd+F on Mac) or click the search button (🔍) in the toolbar
- Type your search term (minimum 2 characters)
- Use Enter/Shift+Enter or navigation buttons to move between matches
- Press ESC to close search
Vi-Style Experience:
- Simple, distraction-free interface with no match counters
- Immediate search as you type with smart debouncing
- Seamless integration with existing PDF navigation
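The search behavior described above (case-insensitive, cross-page, minimum two characters, next/previous navigation) can be sketched as below. This is an illustration rather than the extension's code: the names are hypothetical, page text is assumed to be already extracted, and wrap-around at the first and last match is an assumption.

```typescript
interface SearchMatch {
  page: number; // 1-based page number
  index: number; // character offset within the lowercased page text
}

// Find all case-insensitive occurrences of `query` across pages.
// Queries shorter than 2 characters return no matches.
function findMatches(pages: string[], query: string): SearchMatch[] {
  if (query.length < 2) return [];
  const needle = query.toLowerCase();
  const matches: SearchMatch[] = [];

  pages.forEach((text, i) => {
    const haystack = text.toLowerCase();
    let from = haystack.indexOf(needle);
    while (from !== -1) {
      matches.push({ page: i + 1, index: from });
      // Continue after this match (non-overlapping matches).
      from = haystack.indexOf(needle, from + needle.length);
    }
  });
  return matches;
}

// Move to the next (+1) or previous (-1) match, wrapping at the ends.
function stepMatch(current: number, total: number, direction: 1 | -1): number {
  if (total === 0) return -1;
  return (current + direction + total) % total;
}
```

Enter would then map to `stepMatch(current, total, 1)` and Shift+Enter to `stepMatch(current, total, -1)`.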
📷 Screenshot Capture
DocPilot includes a powerful screenshot tool that allows you to capture specific areas of PDF documents with professional workflows:
Core Features:
- 🖱️ Drag-to-Select: Click and drag to select any rectangular area of the PDF
- 📁 Folder Selection: Choose custom save locations with persistent folder memory
- 📋 Clipboard Support: Copy screenshots directly to clipboard for immediate use
- 📄 Smart Naming: Automatic timestamped filenames with page information
- ⌨️ Keyboard Controls: ESC to cancel, intuitive modal interactions
- 🎯 Visual Feedback: Live selection rectangle with smooth overlay transitions
- 📐 Minimum Size Validation: Prevents accidental tiny selections
How to Use:
- Click the screenshot button (📷) in the toolbar or press the shortcut
- Click and drag to select the area you want to capture
- Choose between "Save to File" or "Copy to Clipboard" in the modal
- For file saving: select a folder using the browse button, then save
- For clipboard: screenshot is immediately available for pasting
Advanced Workflow:
- Folder Memory: Once you select a save folder, it's remembered for the session
- Smart File Naming: Files saved as screenshot-page-{N}-{YYYYMMDD}-{HHMMSS}.png
- High-DPI Support: Automatically adjusts for Retina and high-resolution displays
- Canvas-Based: Captures actual PDF canvas content for perfect quality
- Multi-Page Aware: Handles complex layouts and page boundaries intelligently
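The file naming pattern can be produced with a small helper. The sketch below assumes local time is used for the timestamp, which the README does not specify; `screenshotFileName` is a hypothetical name.

```typescript
// Build a name of the form screenshot-page-{N}-{YYYYMMDD}-{HHMMSS}.png.
// Assumption: the timestamp uses the local time zone.
function screenshotFileName(page: number, when: Date): string {
  const pad = (n: number): string => String(n).padStart(2, "0");
  const date =
    `${when.getFullYear()}${pad(when.getMonth() + 1)}${pad(when.getDate())}`;
  const time =
    `${pad(when.getHours())}${pad(when.getMinutes())}${pad(when.getSeconds())}`;
  return `screenshot-page-${page}-${date}-${time}.png`;
}
```

Zero-padding each component keeps the names fixed-width, so they sort chronologically within a page when listed alphabetically.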
🔍 PDF Object Inspector
The PDF Object Inspector transforms document analysis with a dual-mode hierarchical viewer that reveals the internal structure of PDF documents:
Object-Centric Mode:
- 🖼️ Images: All images across the document with page references
- 📊 Tables: Detected table structures with coordinate information
- 🔤 Fonts: Used fonts with page distribution and properties
- 📝 Annotations: Links, comments, and markup across pages
- 📋 Form Fields: Interactive form elements and their properties
- 📎 Attachments: Embedded files and document attachments
- 🔖 Bookmarks: Hierarchical document outline and navigation
- ⚙️ JavaScript: Document-level and page-level script actions
- 📑 Metadata: Document properties, author, creation date, etc.
Page-Centric Mode:
- 📄 Page Analysis: Objects organized by individual pages
- Progressive Loading: Batch processing for large documents (20 pages at a time)
- Object Relationships: Clear visualization of object distribution
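Progressive loading in batches of 20 pages amounts to splitting the page range into fixed-size groups and scanning one group at a time. A minimal sketch of the splitting step follows; `pageBatches` is a hypothetical helper, not taken from the codebase.

```typescript
// Split pages 1..pageCount into consecutive batches (20 pages by default).
function pageBatches(pageCount: number, batchSize = 20): number[][] {
  const batches: number[][] = [];
  for (let start = 1; start <= pageCount; start += batchSize) {
    const end = Math.min(start + batchSize - 1, pageCount);
    const batch: number[] = [];
    for (let page = start; page <= end; page++) {
      batch.push(page);
    }
    batches.push(batch);
  }
  return batches;
}
```

A 45-page document yields three batches covering pages 1 to 20, 21 to 40, and 41 to 45, so results for the first pages can be displayed before the rest are scanned.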
Advanced Features:
- Lazy Loading: User-controlled scanning with "click to scan" interface
- Progressive Display: Real-time object discovery with batched results
- Shared Cache: Cross-mode efficiency with intelligent caching
- Export Capabilities: Image extraction, table CSV export, metadata JSON
- VSCode Integration: Full theme support and accessibility compliance
Accessibility & UX:
- Full Accessibility: All buttons include proper titles and ARIA attributes
- Theme Integration: Complete dark/light mode support with CSS filters
- Keyboard Shortcuts: Ctrl/Cmd + F to open search, Ctrl/Cmd + +/-/0 for zoom, Enter/Shift+Enter for search navigation, ESC to close search
- Mouse Controls: Ctrl + Scroll for zoom, natural scrolling
- Performance Awareness: Automatic warnings and optimizations for large documents
🛠️ Development
Project Structure
vscode-docpilot/
├── src/
│   ├── extension.ts          # Main extension activation
│   ├── cache/                # Caching for summaries and documents
│   ├── chat/                 # Handles @docpilot chat interactions
│   ├── commands/             # VS Code command definitions
│   ├── editors/              # Custom editor for PDF files
│   ├── pdf/                  # Core PDF processing (text/object extraction)
│   ├── test/                 # Unit, integration, and e2e tests
│   ├── types/                # TypeScript interfaces and type definitions
│   ├── utils/                # Shared utilities (logger, errors, etc.)
│   └── webview/              # Frontend code for the PDF viewer
│       ├── assets/           # Icons and other static assets
│       ├── scripts/          # Client-side JavaScript modules
│       ├── styles/           # CSS stylesheets (minified during build)
│       └── templates/        # HTML templates for the webview
├── out/                      # Compiled output directory
│   └── webview/              # Bundled and minified webview assets
│       ├── scripts/          # Minified JavaScript bundles
│       ├── styles/           # Minified CSS files
│       └── assets/           # Static assets (icons, etc.)
├── rollup.config.js          # Bundling configuration for webview assets
├── package.json              # Extension manifest and dependencies
├── tsconfig.json             # TypeScript configuration
└── README.md                 # This file
Key Technologies
- TypeScript - Type-safe development with modern language features
- PDF.js v5.3.93 - Mozilla's modern PDF rendering engine with ES modules
- VSCode Extension API - Deep IDE integration and Chat participant support
- Language Model API - Copilot integration for AI-powered analysis
- HTML5 Canvas - Hardware-accelerated PDF rendering
- Rollup - Modern bundling with minification for optimized webview assets
- PostCSS + cssnano - CSS optimization and minification (~21% size reduction)
- Biome - Fast linting and formatting for code quality
Build Commands
# Install dependencies
npm install
# Development build (compiles TypeScript, copies assets, bundles webview)
npm run compile
# Watch mode for development
npm run watch
# Asset management
npm run copy-assets # Copy webview assets to out/ directory
npm run bundle-webview # Bundle and minify JavaScript and CSS
# Run tests
npm run test # All tests (unit + integration)
# Run specific test suites
npm run test:unit # Unit tests only
npm run test:integration # Integration tests only
npm run test:e2e # End-to-end tests with real VS Code
# Code quality
npm run lint # Lint with Biome
npm run format # Format code with Biome
npm run check # Run Biome check (lint + format)
# Package extension (requires vsce)
vsce package
Testing
The project includes a comprehensive testing infrastructure, with all test suites passing:
- Unit Tests
- Integration Tests
- End-to-End Tests
- Enhanced Test Reporting: Clear unit/integration separation with performance metrics
- Test Utilities: Helper functions for PDF operations and real webview communication
- VS Code Integration: Proper extension host testing environment
End-to-End Testing:
The project includes comprehensive E2E testing using Playwright for real browser automation with VS Code Extension Development Host:
- Real VS Code Integration: Tests run in actual VS Code Extension Development Host
- Webview Testing: Complete toolbar interaction testing within PDF viewer
- User Workflow Simulation: Realistic user interactions through command palette and UI
- Visual Validation: Button visibility, accessibility attributes, and user feedback testing
- Feature-Specific Tests: Dedicated E2E tests for object extraction, screenshot capture, and toolbar functionality
Test Configuration:
- playwright.config.ts - E2E test configuration with VS Code Electron support
- tsconfig.e2e.json - TypeScript configuration for E2E tests
- .env support for environment variables in E2E tests
🎯 Architecture
Unified PDF Viewing System
- WebviewProvider: Core PDF viewer with HTML generation and message handling
- PDF Object Inspector: Dual-mode hierarchical viewer for comprehensive document structure analysis
- Custom Editor: Automatic activation for File → Open, delegates to WebviewProvider
- Commands: Manual PDF opening via command palette and context menu
- WebviewUtils: Shared utilities for consistent panel creation across entry points
Multiple Activation Methods
- Automatic: File → Open on PDF files (via custom editor registration)
- Manual Commands: docpilot.openLocalPdf, docpilot.openPdfFromUrl
- Context Menu: Right-click on PDF files in Explorer
- Chat Integration: @docpilot /${slash-command} command
PDF Rendering
- Uses PDF.js for reliable cross-platform PDF parsing
- Canvas-based rendering for crisp quality at all zoom levels
- Optimized re-rendering for zoom operations
VSCode Integration
- Custom editor provider for seamless file association
- Webview panels for PDF display with theme integration
- File system access for local PDFs and URL support
Rendering:
- Throttled zoom updates and parallel page rendering
- Efficient scroll event handling with viewport optimization
AI Processing:
- Token-aware chunking with configurable overlap (10% default)
- Batch processing (3 chunks concurrently) to prevent API overload
- Memory-efficient streaming with real-time progress updates
- Intelligent caching with file modification detection
- Persistent cache storage across VS Code sessions
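Processing three chunks at a time can be sketched as a loop over fixed-size batches, each awaited with `Promise.all` before the next one starts. This is one simple way to bound concurrency, shown under the assumption that chunks are independent; `processInBatches` and its parameters are hypothetical names.

```typescript
// Run `worker` over `items` with at most `batchSize` calls in flight.
// Results keep the input order; `onProgress` fires after each batch.
async function processInBatches<T, R>(
  items: T[],
  worker: (item: T, index: number) => Promise<R>,
  batchSize = 3,
  onProgress?: (done: number, total: number) => void,
): Promise<R[]> {
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    // All calls in this batch run concurrently; the next batch waits.
    const batchResults = await Promise.all(
      batch.map((item, i) => worker(item, start + i)),
    );
    results.push(...batchResults);
    onProgress?.(results.length, items.length);
  }
  return results;
}
```

One trade-off of this batch-at-a-time design is that a slow chunk holds up its whole batch; a sliding-window pool would keep three requests in flight continuously at the cost of more bookkeeping.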
Asset Optimization:
- JavaScript Minification: Rollup with esbuild for optimized webview bundles
- CSS Minification: PostCSS with cssnano reduces stylesheet size by ~21%
- Modular Architecture: ES modules for efficient code splitting and loading
- Build Pipeline: Automated asset copying and bundling for production-ready distribution
🔧 Technical Highlights
Intelligent Document Processing:
- Automatic token estimation (3.5 chars/token) for accurate chunking
- Paragraph-aware semantic boundaries to preserve context
- Configurable overlap between chunks maintains narrative flow
- Multi-tier processing: single-chunk → semantic chunking → excerpt fallback
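The token estimate and paragraph-aware chunking can be illustrated with the sketch below. It uses the 3.5 characters-per-token heuristic and the 10% default overlap stated in this README; everything else (function names, splitting on blank lines, carrying the tail of the previous chunk as overlap) is an assumption, and a single paragraph larger than the budget is kept whole rather than split.

```typescript
const CHARS_PER_TOKEN = 3.5; // heuristic stated in this README

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Group paragraphs into chunks of at most `maxTokens` (estimated),
// prefixing each new chunk with the tail of the previous one as overlap.
function chunkByParagraph(
  text: string,
  maxTokens: number,
  overlapRatio = 0.1,
): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current: string[] = [];
  let currentTokens = 0;

  for (const para of paragraphs) {
    const paraTokens = estimateTokens(para);
    if (current.length > 0 && currentTokens + paraTokens > maxTokens) {
      const chunk = current.join("\n\n");
      chunks.push(chunk);
      // Carry over the last `overlapRatio` of the finished chunk.
      const overlapChars = Math.floor(chunk.length * overlapRatio);
      const tail = overlapChars > 0 ? chunk.slice(-overlapChars) : "";
      current = tail ? [tail] : [];
      currentTokens = estimateTokens(tail);
    }
    current.push(para);
    currentTokens += paraTokens;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}
```

Because the split happens only at paragraph boundaries, no sentence is cut mid-way by the chunker itself; only the character-based overlap prefix may begin mid-sentence.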
Robust Error Handling:
- Graceful degradation for oversized documents
- Comprehensive timeout management (30s for text extraction)
- Detailed error reporting with actionable feedback
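A timeout such as the 30-second limit on text extraction is commonly implemented by racing the work against a timer. The helper below is a generic sketch of that pattern, not the extension's code; `withTimeout` and `TEXT_EXTRACTION_TIMEOUT_MS` are hypothetical names. Note that it only stops waiting: the underlying operation is not cancelled.

```typescript
const TEXT_EXTRACTION_TIMEOUT_MS = 30000; // 30 s, per the list above

// Reject if `work` has not settled within `ms` milliseconds.
// The timer is cleared as soon as `work` settles either way.
function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  label: string,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${ms} ms`));
    }, ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}
```

A caller would wrap the extraction promise, for example `withTimeout(extraction, TEXT_EXTRACTION_TIMEOUT_MS, "Text extraction")`, and surface the error message to the user when it rejects.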
⚠️ Limitations
- Initial load time increases with document size
- Very high zoom levels (>300%) may impact rendering performance
- AI summarization requires active Copilot subscription
🤝 Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
📄 License
MIT License - see LICENSE file for details