Scrape-LE

OffensiveEdge | 46 installs | (2) | Free
Zero Hassle Web Scrapeability Checker - Validate scraper targets before debugging

Scrape-LE: Zero Hassle Web Scrapeability Checker

Check site scrapeability in 2-5 seconds • Save 2+ hours of wasted coding
Real Browser • Screenshots • Console Errors • Anti-Bot Detection

💡 First time? Run "Scrape-LE: Setup Browser" from Command Palette to install Chromium (~130MB one-time setup)


⚡ See It In Action

Before: Writing scraper code, deploying, then discovering Cloudflare blocked you (2 hours wasted)

# 2 hours of coding
scraper = MyScraper("https://example.com")
scraper.run()  # Error: Cloudflare challenge detected!

After: Check first, code later (2 seconds to validate)

✅ Site reachable
⚠️ Cloudflare detected
⚠️ Rate limit: 100 requests/hour
✅ robots.txt allows crawling
📸 Screenshot saved

Time Saved: 2 hours of wasted coding → 2 seconds of validation ⚡


✅ Why Scrape-LE?

  • 2-5 seconds to validate - vs. 30+ minutes of trial and error
  • Zero Config - Install Chromium → Press Cmd+Alt+S → Get full report
  • Battle-Tested - 207 unit tests, 87% function coverage, 91% line coverage, zero critical vulnerabilities
  • Security-Hardened - 65 security tests cover command injection and shell metacharacter exploits

Perfect for validating scraper targets before writing code.


🙏 Thank You

If Scrape-LE saves you time, a quick rating helps other developers discover it:
⭐ Open VSX • VS Code Marketplace


Key Features

  • Real browser - Uses Playwright (Chromium) for accurate rendering
  • Full-page screenshots - Visual confirmation of page state
  • Anti-bot detection - Cloudflare, reCAPTCHA, hCaptcha, DataDome, Perimeter81
  • Auth detection - Login forms, OAuth, SSO, API keys
  • Rate limit detection - X-RateLimit headers, Retry-After, HTTP 429
  • robots.txt parsing - Check crawling permissions
  • Console errors - Catch JavaScript errors
  • 13 languages - English, Chinese, German, Spanish, French, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Ukrainian, Vietnamese
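
The rate limit signals listed above (X-RateLimit headers, Retry-After, HTTP 429) can be illustrated with a short Python sketch. This is not the extension's implementation; the function name `detect_rate_limit` and the shape of the returned dictionary are assumptions made for illustration.

```python
from typing import Mapping, Optional


def detect_rate_limit(status: int, headers: Mapping[str, str]) -> Optional[dict]:
    """Return rate-limit signals found in a response, or None if there are none."""
    # HTTP header names are case-insensitive, so normalize before matching.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Collect X-RateLimit-* headers and Retry-After.
    signals = {
        name: value
        for name, value in lowered.items()
        if name.startswith("x-ratelimit-") or name == "retry-after"
    }
    # A 429 status is a signal even when no headers are present.
    if status == 429:
        signals["status"] = "429 Too Many Requests"

    return signals or None


report = detect_rate_limit(
    200,
    {"X-RateLimit-Limit": "100", "X-RateLimit-Remaining": "97"},
)
print(report)
```

A site can throttle without advertising it in headers, so a `None` result here only means no signal was visible in this one response.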

🚀 More from the LE Family

  • String-LE - Extract user-visible strings for i18n and validation • VS Code Marketplace
  • Numbers-LE - Extract and analyze numeric data with statistics • VS Code Marketplace
  • EnvSync-LE - Keep .env files in sync with visual diffs • VS Code Marketplace
  • Paths-LE - Extract file paths from imports and dependencies • VS Code Marketplace
  • Regex-LE - Test and validate regex patterns with live feedback • VS Code Marketplace
  • Secrets-LE - Detect and sanitize secrets before you commit • VS Code Marketplace
  • Colors-LE - Extract and analyze colors from stylesheets • VS Code Marketplace
  • URLs-LE - Extract URLs from any codebase with precision • VS Code Marketplace
  • Dates-LE - Extract temporal data from logs and APIs • VS Code Marketplace

💡 Use Cases

  • Pre-Scraper Validation - Check if sites are reachable before writing scraper code
  • Anti-Bot Detection - Identify Cloudflare, reCAPTCHA, hCaptcha before deployment
  • Rate Limit Discovery - Find rate limits before hitting them in production
  • robots.txt Compliance - Verify crawling is allowed by site policies
  • Auth Wall Detection - Check if login or paywalls block access

Example robots.txt findings:

Disallow: /admin/, /api/internal/
Crawl-delay: 10 seconds
Sitemap: https://example.com/sitemap.xml
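
For readers who want to reproduce the robots.txt Compliance check outside the editor, Python's standard `urllib.robotparser` covers the same ground. The sketch below is not how Scrape-LE parses robots.txt; the robots.txt body is a made-up example for `example.com`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /api/internal/
Crawl-delay: 10
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a generic crawler ("*") may fetch specific URLs.
allowed_products = parser.can_fetch("*", "https://example.com/products")
allowed_admin = parser.can_fetch("*", "https://example.com/admin/users")
print("products allowed:", allowed_products)
print("admin allowed:", allowed_admin)

# Crawl-delay and Sitemap entries are exposed separately.
print("crawl delay:", parser.crawl_delay("*"))
print("sitemaps:", parser.site_maps())
```

To check a live site instead, call `parser.set_url(...)` followed by `parser.read()` in place of `parser.parse(...)`.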

## 🚀 Quick Start

1. **Install from Open VSX or VS Code Marketplace**
   - Open VSX: [Install here](https://open-vsx.org/extension/OffensiveEdge/scrape-le)
   - VS Code Marketplace: [Install here](https://marketplace.visualstudio.com/items?itemName=nolindnaidoo.scrape-le)
2. Open Command Palette (`Cmd/Ctrl + Shift + P`).
3. Run **"Scrape-LE: Check URL"** or press `Cmd+Alt+S` / `Ctrl+Alt+S`.
4. Enter URL and view detailed results.

**Need test URLs?** Check out [`sample/README.md`](https://github.com/OffensiveEdge/scrape-le/blob/HEAD/sample/README.md) for 10 categorized test cases including static sites, SPAs, APIs, protected sites, and more.

### First-Time Setup

On first use, Scrape-LE automatically detects if Chromium is installed and prompts you to install it. This is a one-time setup (~130MB download).

**Automatic Installation:**

1. Run any check command
2. Click "Install Chromium" when prompted
3. Wait for installation to complete

**Manual Setup:**

```bash
bunx playwright install chromium
```

Or run from Command Palette: "Scrape-LE: Setup Browser"

⚙️ Configuration

Basic Settings

  • scrape-le.browser.timeout – Navigation timeout in milliseconds (5000 - 120000, i.e. 5s - 120s)
  • scrape-le.browser.viewport.width – Viewport width (320px - 3840px)
  • scrape-le.browser.viewport.height – Viewport height (240px - 2160px)
  • scrape-le.screenshot.enabled – Enable screenshot capture
  • scrape-le.screenshot.path – Screenshot save location
  • scrape-le.checkConsoleErrors – Capture console errors
  • scrape-le.notificationsLevel – Control notification verbosity
  • scrape-le.statusBar.enabled – Show status bar entry
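
The numeric ranges above can be checked before pasting values into settings.json. The following is a minimal sketch, assuming the timeout is expressed in milliseconds (as the values in the configuration examples suggest); `find_out_of_range` is a hypothetical helper, not part of the extension.

```python
from typing import Any, Dict, List

# Inclusive ranges from the settings list; the timeout unit (ms) is an assumption.
RANGES = {
    "scrape-le.browser.timeout": (5_000, 120_000),
    "scrape-le.browser.viewport.width": (320, 3840),
    "scrape-le.browser.viewport.height": (240, 2160),
}


def find_out_of_range(settings: Dict[str, Any]) -> List[str]:
    """Return one message per numeric setting that falls outside its range."""
    problems = []
    for key, (low, high) in RANGES.items():
        if key not in settings:
            continue  # unset keys fall back to the extension's defaults
        value = settings[key]
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not low <= value <= high:
            problems.append(f"{key}={value!r} is outside {low}-{high}")
    return problems


print(find_out_of_range({
    "scrape-le.browser.timeout": 3000,
    "scrape-le.browser.viewport.width": 1280,
}))
```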

Detection Features

  • scrape-le.detections.antiBot – Detect anti-bot systems (Cloudflare, reCAPTCHA, hCaptcha, DataDome, Perimeter81)
  • scrape-le.detections.rateLimit – Detect rate limiting headers
  • scrape-le.detections.robotsTxt – Check robots.txt policies
  • scrape-le.detections.authentication – Detect authentication walls

Real-World Configuration Examples

Production Scraper Validation

{
  "scrape-le.browser.timeout": 30000,
  "scrape-le.screenshot.enabled": true,
  "scrape-le.detections.antiBot": true,
  "scrape-le.detections.rateLimit": true,
  "scrape-le.detections.robotsTxt": true,
  "scrape-le.notificationsLevel": "important"
}

Quick Reachability Check

{
  "scrape-le.browser.timeout": 10000,
  "scrape-le.screenshot.enabled": false,
  "scrape-le.detections.antiBot": false,
  "scrape-le.detections.rateLimit": false,
  "scrape-le.detections.robotsTxt": false,
  "scrape-le.notificationsLevel": "silent"
}

Development Mode

{
  "scrape-le.browser.timeout": 60000,
  "scrape-le.screenshot.enabled": true,
  "scrape-le.checkConsoleErrors": true,
  "scrape-le.detections.antiBot": true,
  "scrape-le.detections.authentication": true,
  "scrape-le.notificationsLevel": "all"
}

⚠️ Behaviors & Limits

  • Browser launch requires ~130MB Chromium installation (one-time)
  • Timeout ranges from 5s to 120s; adjust based on target site complexity
  • Screenshots saved to .vscode/scrape-le/ by default
  • Large pages may take longer to capture full screenshots
  • Anti-bot detection uses heuristics; some systems may not be detected
  • robots.txt fetch has 5-second timeout
  • Authentication detection checks HTTP status, forms, and keywords
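
To make the authentication heuristic above concrete (HTTP status, forms, and keywords), here is a rough Python sketch. The keyword list, the regular expression, and the name `detect_auth_wall` are illustrative assumptions; the extension's actual rules may differ.

```python
import re
from typing import List

# Hypothetical keyword list, chosen for illustration only.
LOGIN_KEYWORDS = ("sign in", "log in", "login required", "subscribe to continue")
# Matches an <input> tag that declares type=password.
PASSWORD_INPUT = re.compile(r"<input[^>]*type=[\"']?password", re.IGNORECASE)


def detect_auth_wall(status: int, html: str) -> List[str]:
    """Return the reasons a page looks gated; an empty list means no signal."""
    reasons = []
    if status in (401, 403):
        reasons.append(f"HTTP {status}")
    if PASSWORD_INPUT.search(html):
        reasons.append("password form")
    text = html.lower()
    found = [keyword for keyword in LOGIN_KEYWORDS if keyword in text]
    if found:
        reasons.append("keywords: " + ", ".join(found))
    return reasons


page = '<form><input type="password" name="pw"></form><h1>Sign in</h1>'
print(detect_auth_wall(401, page))
```

Keyword matching produces false positives (a public page with a "Sign in" link in its header, for example), which is one reason heuristic results are best treated as hints rather than verdicts.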

See docs/CONFIGURATION.md.

⚡ Performance

Scrape-LE performance varies by target website and network conditions. For detailed benchmarks, see Performance Monitoring.

| Scenario | Page Size | Duration | Memory | Status |
|---|---|---|---|---|
| Simple HTML | < 100 KB | < 2s | < 20 MB | ✅ |
| Complex | 500 KB - 1 MB | 3-5s | 30-50 MB | ✅ |
| Heavy JS (SPA) | 1-3 MB | 5-10s | 50-100 MB | ⚠️ |
| Image-heavy | 2-5 MB | 5-15s | 60-120 MB | ⚠️ |

Browser: Launch 1-2s, screenshot 200-800ms PNG / 150-600ms JPEG
Detection: Anti-bot 85-90% accuracy (< 100ms), Rate limits 80-85% (< 50ms)
Full Metrics: docs/PERFORMANCE.md • Network-dependent performance

  • Timeout Configuration: Adjust based on target site complexity
  • Screenshot Impact: Adds 1-2s to overall check time
  • Detection Suite: Adds 500ms-2s for all checks combined

🧩 System Requirements

VS Code 1.70.0+ • Platform Windows, macOS, Linux
Memory 1GB recommended • Storage 150MB (includes Chromium)

🔒 Privacy

100% local processing. Requests are sent only to the URLs you specify. No analytics or tracking.

🌍 Language Support

13 languages: English, German, Spanish, French, Indonesian, Italian, Japanese, Korean, Portuguese (Brazil), Russian, Ukrainian, Vietnamese, Chinese (Simplified)

🔧 Troubleshooting

"Executable doesn't exist" error?
Run "Scrape-LE: Setup Browser" from Command Palette to install Chromium

Check times out?
Increase timeout: scrape-le.browser.timeout (default 30s) or check network connection

Need help?
Check Issues or enable verbose logging: scrape-le.notificationsLevel: "all"

❓ FAQ

Need to install Chromium manually?
No, Scrape-LE detects a missing Chromium on first use and prompts you to install it (~130MB one-time download)

Works with localhost?
Yes, supports localhost, local IPs, and any accessible URL

Works with React/Vue/Angular?
Yes, uses real browser so SPAs render properly

Will sites detect this?
Uses headless Chromium which some sites detect. Use responsibly and check robots.txt

📊 Testing

207 unit tests • 87% function coverage, 91% line coverage
Powered by Vitest • Run with bun run test:coverage

Core Principle

No broken or failed tests are allowed in commits. All tests must pass before code can be committed or merged.

Test Suite Highlights

  • 65 security tests for command injection & URL validation
  • 46 detection logic tests for anti-bot, auth, rate limits, robots.txt
  • Comprehensive coverage of browser automation, screenshot capture, and error handling
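
As an illustration of the kind of input the security tests guard against, the sketch below validates a target URL before it reaches any browser or shell command. It is an assumption-laden example, not the extension's validator: `is_safe_target` and the deny-list are hypothetical, and the list is deliberately strict.

```python
from urllib.parse import urlsplit

# Hypothetical deny-list of characters a shell could interpret.
FORBIDDEN = set(";|`$<>\\\"'")


def is_safe_target(url: str) -> bool:
    """Accept only http(s) URLs with a host and no shell metacharacters."""
    if any(ch in FORBIDDEN or ch.isspace() for ch in url):
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed input, e.g. an unbalanced IPv6 bracket
    return parts.scheme in ("http", "https") and bool(parts.hostname)


for candidate in (
    "https://example.com/products?page=2&sort=asc",
    "http://localhost:3000",
    "file:///etc/passwd",
    "https://example.com; rm -rf /",
):
    print(candidate, "->", is_safe_target(candidate))
```

A deny-list is only a second line of defense. Passing the URL as a separate argument to a process, with no shell in between, avoids the injection class entirely.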

For detailed testing guidelines, see Testing Guidelines.


Copyright © 2025
@OffensiveEdge. All rights reserved.
