Browse free open source PHP Web Scrapers and projects below. Use the toggles on the left to filter open source PHP Web Scrapers by OS, license, language, programming language, and project status.

  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 1
    Goutte

    Goutte

    Goutte, a simple PHP Web Scraper

    Goutte is a screen scraping and web crawling library for PHP. Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses. Goutte depends on PHP 7.1+. Add fabpot/goutte as a require dependency in your composer.json file. Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser). Make requests with the request() method. The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler). To use your own HTTP settings, you may create and pass an HttpClient instance to Goutte. For example, to add a 60 second request timeout. Read the documentation of the BrowserKit, DomCrawler, and HttpClient Symfony Components for more information about what you can do with Goutte. Goutte is a thin wrapper around the following Symfony Components: BrowserKit, CssSelector, DomCrawler, and HttpClient.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    skycaiji

    skycaiji

    Open source web scraping system for automated data collection tasks

    SkyCaiji is an open source web scraping and data collection system designed to gather information from websites through configurable extraction rules. It focuses on simplifying the process of building crawlers by allowing users to visually define scraping rules rather than writing complex code. It can collect structured or unstructured data from many types of webpages and automate the extraction process for large datasets. SkyCaiji is designed to run on a variety of hosting environments including local machines, shared hosting environments, and cloud servers. It integrates with content management systems so collected data can be published automatically without manual intervention. SkyCaiji also supports automated workflows that continuously gather data and process it based on defined collection rules. Its architecture enables users to build scalable web scraping pipelines that can run unattended once configured.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    OpenWebSpider
    OpenWebSpider is an Open Source multi-threaded Web Spider (robot, crawler) and search engine with a lot of interesting features!
    Downloads: 5 This Week
    Last Update:
    See Project
  • 4
    BTV Rename
    The goal of this project is 100% TVDB recognition and SxxExx renaming of all files generated by BeyondTV while maintaining the BeyondTV database. Currently the project relies on TVrage and scans the entire folder which is selected by the user in the conf
    Downloads: 1 This Week
    Last Update:
    See Project
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    This is simple link checker. It can crawl any site and help to find broken links. It also having download CSV report option.The CSV file includes url ,parent page url and status of page [broken or ok]. It is be very useful for search engine optimization.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    APC Anti Crawler is a php5 class based on APC which can be used to limit the amount of http request per IP. It stop web crawler to download your entire website.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Blackfire Player

    Blackfire Player

    Web Crawling, Web Testing, and Web Scraping application

    Blackfire Player is a powerful Web Crawling, Web Testing, and Web Scraper application. It provides a nice DSL to crawl HTTP services, assert responses, and extract data from HTML/XML/JSON responses. Some Blackfire Player use cases: Crawl a website/API and check expectations -- aka Acceptance Tests; Scrape a website/API and extract values; Monitor a website; Test code with unit test integration (PHPUnit, Behat, Codeception, ...); Test code behavior from the outside thanks to the native Blackfire Profiler integration -- aka Unit Tests from the HTTP layer (tm). Blackfire Player executes scenarios written in a special DSL (files should end with .bkf).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8

    Chaturbate-affiliate-api-script

    A minimal PHP-powered Chaturbate aggregator for your own whitelabel

    A minimal PHP-powered Chaturbate aggregator for your own whitelabel or promo site. Grid view, model profiles, and live filters. All data is local/cached. No database required. Demo: https://tinycb.com/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Crawlab

    Crawlab

    Distributed web crawler admin platform for spiders management

    Golang-based distributed web crawler management platform, supporting various languages including Python, NodeJS, Go, Java, PHP and various web crawler frameworks including Scrapy, Puppeteer, Selenium. Please use docker-compose to one-click to start up. By doing so, you don't even have to configure MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS and worker nodes. Master node and worker nodes communicate with each other via gRPC (a RPC framework). Tasks are scheduled by the task scheduler module in the master node, and received by the task handler module in worker nodes, which executes these tasks in task runners. Task runners are actually processes running spider or crawler programs, and can also send data through gRPC (integrated in SDK) to other data sources, e.g. MongoDB.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • 10
    Funnel is a project for use on intranets, or selected sites on the Internet to gather together and index information from several different sources and make it available through a sane, usable interface.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11

    PGBuild

    Compile your mobile web pages into mobile aps via build.phonegap.com

    PGbuild is a Phonegap development system that automates the development process by connecting your CMS/web server with the online service [Phonegap Build](http://build.phonegap.com). PGBuild is essentially a web spider that make off-line versions of web pages. The off-line version is zippped and send to the Phonegap Build service. The spider is controlled by a project file that sets the rules for the spider and the options for the phonebap build service. You may create and manage your phonegap project source files manually on your webserver or use PGBuild to connect to a CMS system to extract content. PGBuild is managed from a small widget that you may use your self or integrate into a CMS system.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    PHPScraper

    PHPScraper

    A universal web-util for PHP

    PHPScraper is a universal web-scraping util for PHP, built with simplicity in mind. The goal is to make xPath Selectors optional and avoid the commonly needed boilerplate code. Just create an instance of PHPScraper, go to a website, and start collecting data. All scraping functionality can be accessed either as a function call or a property call. For example, the title can be accessed in two ways. Many common use cases are covered already. You can find prepared extractors for various HTML tags, including interesting attributes. You can filter and combine these to your needs. In some cases there is an option to get a simple or detailed version. PHPScraper can assist in collecting feeds such as RSS feeds, sitemap.xml-entries and static search indexes. This can be useful when deciding on the next page to crawl or building up a list of pages on a website.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    QueryList

    QueryList

    Progressive PHP web crawler framework with jQuery-like DOM parsing

    QueryList is an extensible PHP web scraping and crawling framework designed to extract and process data from web pages. It provides a simple and expressive API that allows developers to collect structured information from HTML documents using familiar DOM traversal techniques. It is built on top of phpQuery and uses CSS3 selectors similar to those found in jQuery, making it easy for developers to query and manipulate page elements during scraping tasks. QueryList supports common data extraction scenarios such as retrieving lists of titles, links, images, and other page elements from structured or semi-structured content. It also includes a powerful HTTP request system that enables complex operations such as simulated logins, proxy usage, and customized request headers. QueryList is designed with a modular architecture that allows developers to extend its capabilities through plugins for tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    RED HAWK

    RED HAWK

    All-in-one reconnaissance and vulnerability scanning toolkit for sites

    RED HAWK is an open source command-line security tool designed for information gathering, vulnerability scanning, and web reconnaissance tasks. It combines multiple scanning and analysis capabilities into a single toolkit to help security researchers and penetration testers quickly analyze a target website. It can collect a wide range of information about domains, servers, and web applications, including network details, hosting configuration, and content management system detection. It also provides vulnerability scanning features that help identify potential issues such as error-based SQL injection vulnerabilities and sensitive file exposure. RED HAWK includes utilities for performing DNS lookups, port scans, subdomain discovery, and reverse IP analysis, giving users a comprehensive view of a target environment. In addition to vulnerability detection, RED HAWK offers crawling features that gather links and metadata from websites to support deeper reconnaissance.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Roach

    Roach

    The complete web scraping toolkit for PHP

    Roach is a complete web scraping toolkit for PHP. It is a shameless clone heavily inspired by the popular Scrapy package for Python. Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler, but includes an entire pipeline to clean, persist and otherwise process extracted data as well. It’s your all-in-one resource for web scraping in PHP. Roach doesn’t depend on a specific framework. Instead, you can use the core package on its own or install one of the framework-specific adapters. Currently, there’s a first-party adapter available to use Roach in your Laravel projects with more coming. Roach is built from the ground up with extensibility in mind. In fact, most of Roach’s built-in behavior works the exact same way that any custom extensions or middleware works.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Web Scraper Basic
    The Web Scraper Basic application is a PHP and MySQL powered web scraping tool. Web Scraper Basic allows the user to scrape data from websites in a nice easy to use interface.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Web Scraping for Laravel

    Web Scraping for Laravel

    Laravel adapter for Roach, the complete web scraping toolkit for PHP

    This is the Laravel adapter for Roach, the complete web scraping toolkit for PHP. Easily integrate Roach into any Laravel application. The Laravel adapter mostly provides the necessary container bindings for the various services Roach uses, as well as making certain configuration options available via a config file. The Laravel adapter of Roach registers a few Artisan commands to make out development experience as pleasant as possible. Roach ships with an interactive shell (often called Read-Evaluate-Print-Loop, or Repl for short) which makes prototyping our spiders a breeze. We can use the provided roach:shell command to launch a new Repl session.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Larbin is a Web crawler intended to fetch a large number of Web pages, it should be able to fetch more than 100 millions pages on a standard PC with much u/d. This set of PHP and Perl scripts, called webtools4larbin, can handle the output of Larbin and p
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    beanbun

    beanbun

    PHP multiprocess web crawling framework with distributed support

    Beanbun is a web crawling framework written in PHP that is designed to help developers build scalable and customizable web scrapers. Beanbun focuses on simplicity while providing a flexible architecture that supports complex crawling workflows. It supports both normal execution mode and daemon mode, allowing crawlers to run continuously in long-running background processes when needed. Beanbun uses a downloader component based on Guzzle and integrates with Workerman to enable multi-process crawling and efficient concurrency. It supports distributed crawling environments and multiple queue backends, including in-memory queues and Redis-based queues. Developers can customize how pages are downloaded, processed, and discovered through callback hooks and step-based crawling stages. With its extensible plugin architecture and configurable components, Beanbun allows users to implement custom queues, crawling strategies, and data processing logic.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    bee-rain is a web crawler that harvest and index file over the network. You can see result by bee-rain website : http://bee-rain.internetcollaboratif.info/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    crwlr

    crwlr

    Library for Rapid (Web) Crawler and Scraper Development

    This library provides kind of a framework and a lot of ready-to-use, so-called steps, that you can use as building blocks, to build your own crawlers and scrapers with. Before diving into the library, let's have a look at the terms crawling and scraping. For most real-world use cases, those two things go hand in hand, which is why this library helps with and combines both. A (web) crawler is a program that (down)loads documents and follows the links in it to load them as well. A crawler could just load actually all links it is finding (and is allowed to load according to the robots.txt file), then it would just load the whole internet (if the URL(s) it starts with are no dead end). Or it can be restricted to load only links matching certain criteria (on same domain/host, URL path starts with "/foo",...) or only to a certain depth. A depth of 3 means 3 levels deep. Links found on the initial URLs provided to the crawler are level 1 and so on.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    datalus
    PHP web API designed to simplify object handling(loading, saving, querying, displaying, and editing), abstract the data from its display structure, and layout and allow the target data to be delivered to any supported format without special logic.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    diskover-community

    diskover-community

    Open source file indexing & storage analytics powered by Elasticsearch

    Diskover Community Edition is an open source file system indexing and storage analytics platform designed to help organizations understand and manage large volumes of file data. It crawls file systems and indexes metadata using Elasticsearch, enabling fast search, analysis, and organization of files stored across different storage systems. It allows administrators and users to explore file structures, monitor storage usage, and gain insights into how data is distributed across infrastructure. By indexing file metadata from sources such as local file systems, network shares like NFS and SMB, and cloud storage, the tool provides a centralized way to analyze heterogeneous storage environments. Diskover also helps identify outdated or unused files, duplicate data, and inefficient storage usage that can waste resources or increase operational costs. A Python-based indexing engine performs the scanning and indexing tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    studiMaps is a web based application for visualization and analysis of social networks. It consists of two software components: a web-crawler for getting data and the web based application for visualization.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB