Headless Chrome Crawler download

Crawlers based on simple requests to HTML files are generally fast, but they fail to scrape content when the HTML changes dynamically in the browser. They sometimes end up capturing empty bodies, especially when a website is built on a modern frontend framework such as AngularJS, React, or Vue.js. Dynamic crawlers based on PhantomJS and Selenium handle such applications. However, PhantomJS's maintainer has stepped down and recommended switching to Headless Chrome, which is fast and stable. This crawler is dynamic and based on Headless Chrome.

Powered by Headless Chrome, the crawler provides simple APIs for crawling dynamic websites. It supports both depth-first and breadth-first search, saves screenshots as crawling evidence, emulates devices and user agents, uses a priority queue for crawling efficiency, obeys robots.txt, and more.
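The difference between depth-first and breadth-first crawling can be illustrated with a small, self-contained sketch. This is not the crawler's own code or API: the `links` graph, the `crawlOrder` function, and its `strategy` and `maxDepth` parameters are hypothetical names chosen for illustration, and a plain array stands in for the priority queue mentioned above.

```javascript
// Hypothetical in-memory link graph standing in for real pages (an assumption).
const links = {
  '/': ['/a', '/b'],
  '/a': ['/a1', '/a2'],
  '/b': ['/b1'],
  '/a1': [],
  '/a2': [],
  '/b1': [],
};

// Returns the order in which URLs would be visited.
// 'bfs' takes the next URL from the front of the frontier (a queue);
// 'dfs' takes it from the back (a stack).
function crawlOrder(graph, start, strategy, maxDepth = Infinity) {
  const visited = new Set();
  const order = [];
  const frontier = [{ url: start, depth: 0 }];

  while (frontier.length > 0) {
    const { url, depth } =
      strategy === 'bfs' ? frontier.shift() : frontier.pop();
    if (visited.has(url)) continue;
    visited.add(url);
    order.push(url);

    // Do not follow links beyond the depth limit.
    if (depth >= maxDepth) continue;

    const children = graph[url] || [];
    // For DFS, push children in reverse so the first link is followed first.
    const next = strategy === 'bfs' ? children : [...children].reverse();
    for (const child of next) {
      if (!visited.has(child)) frontier.push({ url: child, depth: depth + 1 });
    }
  }
  return order;
}

console.log(crawlOrder(links, '/', 'bfs')); // [ '/', '/a', '/b', '/a1', '/a2', '/b1' ]
console.log(crawlOrder(links, '/', 'dfs')); // [ '/', '/a', '/a1', '/a2', '/b', '/b1' ]
```

Breadth-first order visits every page at one depth before going deeper, which suits shallow site surveys; depth-first order follows one branch to its end first.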

Features

Distributed crawling
Configure concurrency, delay and retry
Pluggable cache storages such as Redis
Support CSV and JSON Lines for exporting results
Pause at the max request and resume at any time
Insert jQuery automatically for scraping
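
As an illustration of the two export formats named in the list above, the following sketch serializes a few crawl results as JSON Lines and as CSV. It does not use the crawler's own exporter classes: the `results` array, its field names, and the `toJsonLines`, `csvField`, and `toCsv` helpers are all hypothetical.

```javascript
// Hypothetical crawl results; the field names are illustrative assumptions.
const results = [
  { url: 'https://example.com/', status: 200, title: 'Example Domain' },
  { url: 'https://example.com/a', status: 200, title: 'Page "A", draft' },
];

// JSON Lines: one JSON object per line, each line newline-terminated.
function toJsonLines(rows) {
  return rows.map((row) => JSON.stringify(row)).join('\n') + '\n';
}

// CSV: quote a field when it contains a comma, a double quote, or a line
// break, and double any embedded quotes (RFC 4180 style).
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Writes a header row followed by one row per result, in the given field order.
function toCsv(rows, fields) {
  const header = fields.map(csvField).join(',');
  const body = rows.map((row) =>
    fields.map((field) => csvField(row[field] ?? '')).join(',')
  );
  return [header, ...body].join('\n') + '\n';
}

console.log(toJsonLines(results));
console.log(toCsv(results, ['url', 'status', 'title']));
```

JSON Lines can be appended to one record at a time, which fits a crawl that is paused and resumed; CSV is easier to open in a spreadsheet but needs a fixed set of columns.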

License

MIT License

Additional Project Details

Programming Language

JavaScript

Related Categories

JavaScript Software Distribution Software

Registered

2021-10-29
