Classic Web Crawler Frameworks: Python

  • Scrapy – A fast high-level screen scraping and web crawling framework.
  • pyspider – A powerful spider system.
  • cola – A distributed crawling framework.
  • Demiurge – PyQuery-based scraping micro-framework.
  • Scrapely – A pure-Python HTML screen-scraping library.
  • feedparser – Universal feed parser.
  • you-get – Dumb downloader that scrapes the web.
  • Grab – Site scraping framework.
  • MechanicalSoup – A Python library for automating interaction with websites.
  • portia – Visual scraping for Scrapy.
  • crawley – Pythonic crawling/scraping framework based on non-blocking I/O operations.
  • RoboBrowser – A simple, Pythonic library for browsing the web without a standalone web browser.
  • MSpider – A simple, easy spider using gevent and JS rendering.
  • brownant – A lightweight web data extracting framework.
  • PSpider – A simple spider framework in Python 3.
  • Gain – Web crawling framework based on asyncio for everyone.
  • sukhoi – Minimalist and powerful Web Crawler.
  • spidy – The simple, easy to use command line web crawler.
  • newspaper – News, full-text, and article metadata extraction in Python 3.

Reposted from: https://github.com/BruceDone/awesome-crawler
