pyspider is a powerful spider (web crawler) system written in Python. Its components are connected by a message queue. Every component, including the message queue, runs in its own process or thread and is replaceable. That means that when processing is slow, you can run many processor instances to make full use of multiple CPUs, or deploy them across multiple machines. This architecture is what makes pyspider fast.

Because pyspider is made up of separate components, you can simply run pyspider to start a standalone instance that needs no third-party services, or use MySQL or MongoDB together with RabbitMQ to deploy a distributed crawl cluster. In a production environment, running each component in its own process and storing data in a database service is more reliable and flexible. Running the components in separate processes requires at least one database service; pyspider currently supports MySQL, MongoDB and PostgreSQL for this, and you can choose any one of them.
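
The idea of components joined by queues, with the slow stage scaled out to several instances, can be sketched with the Python standard library. This is an illustrative sketch only, not pyspider's code: the names `fetcher`, `processor` and `run_pipeline` are hypothetical, the "download" is faked, and threads with in-memory queues stand in for the separate processes and external message queue a real deployment would use.

```python
import queue
import threading

SENTINEL = None  # placed on the queue to tell a processor to stop


def fetcher(urls, fetched_q):
    """Stand-in fetcher: pretends to download each URL."""
    for url in urls:
        fetched_q.put((url, "<html>%s</html>" % url))


def processor(fetched_q, result_q):
    """Stand-in processor: consumes pages until it sees the sentinel."""
    while True:
        item = fetched_q.get()
        if item is SENTINEL:
            break
        url, body = item
        result_q.put((url, len(body)))  # the "result" is just the page size


def run_pipeline(urls, n_processors=3):
    """Wire fetcher -> queue -> N processors -> queue and collect results."""
    fetched_q = queue.Queue()
    result_q = queue.Queue()

    workers = [
        threading.Thread(target=processor, args=(fetched_q, result_q))
        for _ in range(n_processors)
    ]
    for worker in workers:
        worker.start()

    fetcher(urls, fetched_q)
    for _ in workers:  # one sentinel per processor instance
        fetched_q.put(SENTINEL)
    for worker in workers:
        worker.join()

    results = {}
    while not result_q.empty():
        url, size = result_q.get()
        results[url] = size
    return results
```

Calling `run_pipeline(["a", "b", "c"], n_processors=2)` returns one entry per URL regardless of how many processors ran. Because the stages only share queues, the number of processors can change without touching the fetcher, which is the property the description above relies on.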

Features

  • Write scripts in Python
  • Powerful WebUI with script editor, task monitor, project manager and result viewer
  • MySQL, MongoDB, Redis, SQLite, Elasticsearch; PostgreSQL with SQLAlchemy as database backends
  • RabbitMQ, Beanstalk, Redis and Kombu as message queues
  • Task priority, retry, periodic crawling, recrawl by age, etc.
  • Distributed architecture, crawling of JavaScript pages, Python 2 and 3 support, etc.
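
To make the scheduling features in the list more concrete (task priority, retry, recrawl by age), here is a minimal sketch using only the standard library. It is not pyspider's scheduler: the class `TinyScheduler` and its methods are hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
import heapq
import itertools


class TinyScheduler:
    """Toy scheduler: highest priority first, bounded retries, age-based recrawl."""

    def __init__(self, max_retries=3):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order
        self._last_crawled = {}            # url -> time of last successful crawl
        self.max_retries = max_retries

    def add(self, url, priority=0, age=0, now=0.0, retries=0):
        """Queue a URL unless it was crawled less than `age` seconds ago."""
        last = self._last_crawled.get(url)
        if last is not None and now - last < age:
            return False  # still fresh, skip the recrawl
        # heapq is a min-heap, so negate priority to pop the highest first
        heapq.heappush(self._heap, (-priority, next(self._counter), url, retries))
        return True

    def pop(self):
        """Return (url, retries) for the next task, or None if idle."""
        if not self._heap:
            return None
        _, _, url, retries = heapq.heappop(self._heap)
        return url, retries

    def done(self, url, now):
        """Record a successful crawl so `age` checks can skip fresh pages."""
        self._last_crawled[url] = now

    def failed(self, url, retries, priority=0):
        """Requeue a failed task until max_retries is exhausted."""
        if retries + 1 > self.max_retries:
            return False
        heapq.heappush(
            self._heap, (-priority, next(self._counter), url, retries + 1)
        )
        return True
```

With this sketch, a task added with `priority=9` is popped before one added with `priority=1`, and a URL marked `done` at time 100 is rejected by `add(url, age=60, now=130.0)` but accepted again at `now=200.0`.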

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python System Software, Python PostScript Software, Python Web Scrapers

Registered

2021-03-31