WebMagic is a simple but scalable crawler framework. It covers the whole lifecycle of a crawler: downloading, URL management, content extraction, and persistence. This makes it easy to develop a specific crawler on top of it.

WebMagic has a simple core with high flexibility and a simple API for HTML extraction. It also lets you customize a crawler with annotated POJOs, with no configuration needed. Other features include multi-threading and support for distributed crawling.

WebMagic is easy to integrate: add its dependencies to your pom.xml. WebMagic uses slf4j with the slf4j-log4j12 implementation; if you use a different slf4j implementation, exclude slf4j-log4j12. To build a crawler, write a class that implements PageProcessor.
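The four lifecycle stages named above (downloading, URL management, content extraction, persistence) can be sketched with the Java standard library alone. This is not WebMagic's API: the `Page`, `PageProcessor`, and `Pipeline` types, the in-memory map that stands in for HTTP downloads, and the `TitleProcessor` example are all hypothetical, chosen only to show how the stages fit together and where a user-written processor plugs in.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stdlib-only sketch of the crawler lifecycle; all types here are hypothetical, not WebMagic's API.
class MiniCrawler {

    // Hypothetical page holder: the downloaded HTML plus what the processor extracts from it.
    static final class Page {
        final String url;
        final String html;
        final List<String> targetUrls = new ArrayList<>();
        final Map<String, String> fields = new LinkedHashMap<>();

        Page(String url, String html) {
            this.url = url;
            this.html = html;
        }
    }

    // Extraction step: the user-written part (simplified stand-in, not WebMagic's own interface).
    interface PageProcessor {
        void process(Page page);
    }

    // Persistence step: receives the fields extracted from each page.
    interface Pipeline {
        void save(String url, Map<String, String> fields);
    }

    private final Map<String, String> fakeWeb; // URL -> HTML; stands in for a real HTTP downloader
    private final PageProcessor processor;
    private final Pipeline pipeline;

    MiniCrawler(Map<String, String> fakeWeb, PageProcessor processor, Pipeline pipeline) {
        this.fakeWeb = fakeWeb;
        this.processor = processor;
        this.pipeline = pipeline;
    }

    // Breadth-first crawl from seedUrl; returns the number of pages downloaded and processed.
    int run(String seedUrl) {
        Queue<String> queue = new ArrayDeque<>(); // URL management: pending URLs
        Set<String> seen = new HashSet<>();       // URL management: de-duplication
        queue.add(seedUrl);
        seen.add(seedUrl);
        int crawled = 0;
        while (!queue.isEmpty()) {
            String url = queue.poll();
            String html = fakeWeb.get(url);       // download
            if (html == null) {
                continue;                         // unknown URL: treat as a failed download
            }
            Page page = new Page(url, html);
            processor.process(page);              // extract
            pipeline.save(url, page.fields);      // persist
            crawled++;
            for (String next : page.targetUrls) {
                if (seen.add(next)) {
                    queue.add(next);              // schedule only URLs not seen before
                }
            }
        }
        return crawled;
    }

    // Example processor: takes the <title> text and follows every href found by regex.
    static final class TitleProcessor implements PageProcessor {
        private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");
        private static final Pattern LINK = Pattern.compile("href=\"([^\"]+)\"");

        @Override
        public void process(Page page) {
            Matcher title = TITLE.matcher(page.html);
            if (title.find()) {
                page.fields.put("title", title.group(1));
            }
            Matcher link = LINK.matcher(page.html);
            while (link.find()) {
                page.targetUrls.add(link.group(1));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> web = new LinkedHashMap<>();
        web.put("/a", "<title>A</title><a href=\"/b\">b</a><a href=\"/c\">c</a>");
        web.put("/b", "<title>B</title><a href=\"/a\">a</a>");
        web.put("/c", "<title>C</title><a href=\"/missing\">x</a>");
        Map<String, String> saved = new LinkedHashMap<>();
        int n = new MiniCrawler(web, new TitleProcessor(), (url, f) -> saved.put(url, f.get("title")))
                .run("/a");
        System.out.println(n + " " + saved); // prints "3 {/a=A, /b=B, /c=C}"
    }
}
```

Running `main` crawls three pages. `/missing` is queued but skipped because the fake downloader has no entry for it, and `/a` is not fetched twice thanks to the seen-set. In a real framework each stage would be a swappable component (an HTTP downloader, a persistent or distributed URL queue, a database pipeline), which is what keeps the user-facing part down to a single processor class.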

Features

  • Simple core with high flexibility
  • Simple API for HTML extraction
  • Annotated POJOs to customize a crawler, with no configuration
  • Multi-threading and distributed crawling support
  • Easy to integrate
  • Covers the whole crawler lifecycle
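The "annotation with POJO" idea in the list above can be illustrated with plain Java reflection. This is a hypothetical sketch, not WebMagic's annotation API: the `@Extract` annotation, the `Repo` POJO, and the `extract` helper are invented for illustration. Each annotated `String` field receives the first capture group of its regex, so the extraction rules live on the POJO itself instead of in a configuration file.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stdlib sketch of annotation-driven extraction; @Extract, Repo and extract() are hypothetical.
class PojoExtractor {

    // Hypothetical annotation: the first capture group of this regex fills the field.
    @Retention(RetentionPolicy.RUNTIME) // must be visible to reflection at run time
    @Target(ElementType.FIELD)
    @interface Extract {
        String value();
    }

    // Example POJO: each annotated field declares where its value comes from.
    static class Repo {
        @Extract("<h1>(.*?)</h1>")
        String name;

        @Extract("stars:\\s*(\\d+)")
        String stars;
    }

    // Creates an instance of type and fills every @Extract String field from html.
    static <T> T extract(Class<T> type, String html) {
        try {
            T obj = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                Extract rule = field.getAnnotation(Extract.class);
                if (rule == null) {
                    continue; // unannotated fields are left untouched
                }
                Matcher m = Pattern.compile(rule.value()).matcher(html);
                // fields whose regex does not match stay null
                if (m.find()) {
                    field.setAccessible(true);
                    field.set(obj, m.group(1));
                }
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot populate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Repo repo = extract(Repo.class, "<h1>webmagic</h1><span>stars: 42</span>");
        System.out.println(repo.name + " " + repo.stars); // prints "webmagic 42"
    }
}
```

The sketch deliberately skips type conversion (every field is a `String`) and supports only regex rules; a fuller design would also accept XPath or CSS selectors and convert values to the field's declared type.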

License

Apache License V2.0

Additional Project Details

Programming Language

Java

Related Categories

Java Frameworks, Java Web Scrapers

Registered

2021-06-11