Web Scraping Copilot

Web Scraping Copilot is a free Visual Studio Code extension by Zyte that helps you generate web scraping code with GitHub Copilot. It streamlines working with Scrapy projects and includes optional integration with Scrapy Cloud, making it easier to deploy and monitor your web scraping jobs.

Requirements

Visual Studio Code 1.104+.
Web scraping projects must use Python 3.10+ and Scrapy 2.6.3+.
GitHub Copilot Pro or better is recommended for AI web scraping tools; the limited requests of the Free plan can run out quickly otherwise.

See all requirements.

Quick Start
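Before going further, it can help to confirm that a project environment meets the Python and Scrapy versions listed under Requirements. The script below is a minimal sketch and not part of the extension: the function names `parse_version`, `installed_scrapy_version`, and `check_environment` are hypothetical, and the version parsing is deliberately simple (it ignores pre-release suffixes such as `rc1`).

```python
import sys
from importlib import metadata

# Minimum versions taken from the Requirements section.
MIN_PYTHON = (3, 10)
MIN_SCRAPY = (2, 6, 3)


def parse_version(text):
    """Turn a string such as '2.11.0' into a tuple such as (2, 11, 0).

    Parsing stops at the first part that is not purely numeric, so a
    pre-release suffix such as 'rc1' is ignored (a simplification).
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break
    return tuple(parts)


def installed_scrapy_version():
    """Return the installed Scrapy version string, or None if it is missing."""
    try:
        return metadata.version("scrapy")
    except metadata.PackageNotFoundError:
        return None


def check_environment(python_version, scrapy_version):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.10 or later is required")
    if scrapy_version is None:
        problems.append("Scrapy is not installed")
    elif parse_version(scrapy_version) < MIN_SCRAPY:
        problems.append("Scrapy 2.6.3 or later is required")
    return problems


if __name__ == "__main__":
    found = check_environment(sys.version_info, installed_scrapy_version())
    for problem in found:
        print(problem)
    if not found:
        print("Python and Scrapy versions look fine")
```

Run it with the same Python interpreter that the Scrapy project uses, for example from the project's virtual environment. It does not check the Visual Studio Code version or the GitHub Copilot plan.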
For best code generation quality without premium models, we recommend configuring the MCP server of the extension to use GPT-5 mini. To configure allowed models, open the Command Palette
(

Follow the tutorial to learn more. If you run into issues, see Troubleshooting.

Features

Generate maintainable web scraping code with GitHub Copilot.
Browse your spiders and page objects with new interactive views.
Run your spiders locally with a click, generate new test fixtures for your page objects, and more.
If you use Scrapy Cloud, you can deploy your spiders with a click, and monitor cloud jobs from the spiders view.

FAQ

How much does the extension cost?

The extension itself is free. To use code generation, you need a GitHub Copilot plan; the Free plan is not recommended because its limited requests run out quickly. To use Scrapy Cloud features, you need a Scrapy Cloud account, for which the free plan is enough.

Does the extension use AI from Zyte?

No. The extension uses the models available through your GitHub Copilot plan. The extension provides instructions and prompts, and the MCP server tools use MCP sampling to start separate chats in the background to handle the different steps of code generation. To control which models can be used by the MCP server, open the Command Palette (

Is my code sent to Zyte?

The code generation workflow that the extension facilitates does not send any code to Zyte, only to GitHub Copilot. Scrapy Cloud deployment, if used, does upload your code to Scrapy Cloud.

Which LLM works best for code generation?

The model you use in the main chat should be reasonably capable, since workflow management can be hard for smaller models. We recommend something like GPT-5, although GPT-5 mini has shown good results in our tests.

The MCP web scraping tools, which generate expectations and code, are designed to work well enough with models for which GitHub Copilot paid plans (Pro or better) allow unlimited requests, like GPT-5 mini. Given the number of requests that those tools can generate, using a smarter model could be very costly.

Documentation