Welcome to Scrapy
What is Scrapy?
Scrapy is a fast, high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
Features
- Simple
- Scrapy was designed with simplicity in mind: it provides the features you need without getting in your way
- Productive
- Just write the rules to extract the data from web pages and let Scrapy crawl the entire web site for you
- Fast
- Scrapy is used in production crawlers to completely scrape more than 500 retailer sites daily, all on one server
- Extensible
- Scrapy was designed with extensibility in mind, so it provides several mechanisms to plug in new code without having to touch the framework core
- Portable, open-source, 100% Python
- Scrapy is written entirely in Python and runs on Linux, Windows, Mac and BSD
- Batteries included
- Scrapy comes with lots of functionality built in. See the documentation for a list of the built-in features.
- Well-documented & well-tested
- Scrapy is extensively documented and has a comprehensive test suite with very good code coverage
- Healthy community
  - 1,500 watchers, 350 forks on GitHub (link)
  - 700 followers on Twitter (link)
  - 850 questions on Stack Overflow (link)
  - 200 messages per month on the mailing list (link)
  - 40-50 users always connected to the IRC channel (link)
- Commercial support
- A few companies provide Scrapy consulting and support
Still not sure if Scrapy is what you're looking for? Check out Scrapy at a glance.
Companies using Scrapy
Scrapy is used in large production environments to crawl thousands of sites daily. Here is a list of companies using Scrapy.
Where to start?
Start by reading Scrapy at a glance, then download Scrapy and follow the Tutorial.
