Showing 59 open source projects for "web spider"

View related business solutions
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • 1
    Spider

    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 2
    spider_collection

    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages....
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    xhs-spider

    xhs-spider

    Desktop tool for collecting and exporting Xiaohongshu post data

    XHS-Spider is a desktop data collection tool designed to gather content and metadata from the Xiaohongshu platform. It provides a graphical interface that allows users to explore posts, collect information, and download media such as images and videos from individual notes or search results. It was developed primarily as a learning project to demonstrate approaches to building web crawlers and experimenting with technologies such as WebView2 and WPF UI.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    EasySpider

    EasySpider

    A visual no-code/code-free web crawler/spider

    A visual code-free/no-code web crawler/spider, supporting both Chinese and English.
    Downloads: 5 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 5
    FEAPDER

    FEAPDER

    Powerful Python crawler framework for scalable web scraping tasks

    feapder is a Python-based web crawling framework designed to simplify the process of building scalable and efficient web scrapers. It focuses on providing a developer-friendly environment that makes it easier to create, run, and manage crawlers for a variety of data collection tasks. It includes several built-in spider types, such as AirSpider, Spider, TaskSpider, and BatchSpider, which address different crawling scenarios ranging from lightweight scraping to distributed and batch-based jobs. feapder supports features such as breakpoint resume, allowing crawlers to continue from where they stopped without losing progress. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    In Files there is WebCrawlerMySQL.jar which supports MySql Connection Free Web Spider & Crawler. Extracts Information from Web by parsing millions of pages. Store data into Derby Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby and MySQL Database - Written in Java Cross Platform Also See Free email Sender : https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application https://www.microsoft.com/openjdk
    Downloads: 6 This Week
    Last Update:
    See Project
  • 7
    Scrapy-Redis

    Scrapy-Redis

    Redis-based components for Scrapy

    You can start multiple spider instances that share a single redis queue. Best suitable for broad multi-domain crawls. Scraped items gets pushed into a redis queued meaning that you can start as many as needed post-processing processes sharing the items queue. Scheduler + Duplication Filter, Item Pipeline, Base Spiders. Default requests serializer is pickle, but it can be changed to any module with loads and dumps functions. Note that pickle is not compatible between python versions. Version...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Grab Framework Project

    Grab Framework Project

    Web Scraping Framework

    ...The API is built on top of urllib3 and lxml libraries. The Spider API to build asynchronous web crawlers. You write classes that define handlers for each type of network request. Each handler is able to spawn new network requests. Network requests are processed concurrently with a pool of asynchronous web sockets. Grab provides interface called Spider to develop multithreaded web-site scrapers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    DotnetSpider

    DotnetSpider

    Lightweight .NET framework for fast web crawling and data scraping

    DotnetSpider is a web crawling and data extraction framework built on the .NET Standard platform. It is designed to help developers create efficient and scalable crawlers for collecting structured data from websites. It provides a high-level API that simplifies the process of defining spiders, managing requests, and extracting content from web pages. Developers can create custom spiders by extending base classes and configuring pipelines that handle downloading, parsing, and storing...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 10
    Scrapling

    Scrapling

    An adaptive Web Scraping framework

    ...Its powerful spider system supports multi-session crawling, pause and resume functionality, and real-time streaming of scraped data. Scrapling combines high performance, memory efficiency, and extensive async support to deliver blazing-fast scraping workflows. With a developer-friendly API, CLI tools, MCP server integration for AI-assisted extraction, and Docker support, it offers a complete solution for modern web scrapers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    python-fxxk-spider

    python-fxxk-spider

    Collection of 100+ Python web scraping projects and crawler examples

    python-fxxk-spider is a curated collection of Python web scraping and crawler projects gathered in a single repository for reference and learning. It aggregates many independent scraping examples that target a wide range of websites, online services, and public data sources. Instead of being a single crawler tool, it functions as a catalog of ready-made Python spider implementations that demonstrate different scraping techniques. python-fxxk-spider includes scrapers for social media, e-commerce platforms, job listings, music services, video platforms, and various content sites. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12

    ahCrawler

    A PHP search engine for your website and web analytics tool. GNU GPL3

    ...Trigger a rescan whenever you want - you always have under control what data of what time were checked. The spider is a CLI tool and must be added as a cronjob. In a web based backend you can control all data and analyze your data. You can handle multiple websites in the same backend. PHP 7 or 8 + PDO (Mysql/ Sqlite)
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Crawlab

    Crawlab

    Distributed web crawler admin platform for spiders management

    Golang-based distributed web crawler management platform, supporting various languages including Python, NodeJS, Go, Java, PHP and various web crawler frameworks including Scrapy, Puppeteer, Selenium. Please use docker-compose to one-click to start up. By doing so, you don't even have to configure MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS and worker nodes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Easyspider - Distributed Web Crawler

    Easyspider - Distributed Web Crawler

    Easy Spider is a distributed Perl Web Crawler Project from 2006

    Easy Spider is a distributed Perl Web Crawler Project from 2006. It features code from crawling webpages, distributing it to a server and generating xml files from it. The client site can be any computer (Windows or Linux) and the Server stores all data. Websites that use EasySpider Crawling for Article Writing Software: https://www.artikelschreiber.com/en/ https://www.unaique.net/en/ https://www.unaique.com/ https://www.artikelschreiben.com/ https://www.buzzerstar.com/ https://easyperlspider.sourceforge.io/ https://www.sebastianenger.com/ https://www.artikelschreiber.com/opensource/ It is fun to look at some code that is few years ago and to see how one has improved himself. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    crawly

    crawly

    High-level web crawling and scraping framework for Elixir apps

    Crawly is a high-level application framework for crawling websites and extracting structured data using the Elixir programming language. It provides a complete environment for building web crawlers that systematically visit pages, collect information, and transform that data into structured formats for further processing. Crawly is designed for tasks such as data mining, information processing, and building historical archives of web content. Crawly follows the Elixir and OTP architecture...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    In Files there is WebCrawlerMySQL.jar which supports MySql Connection Please follow this link to get latest version https://sourceforge.net/projects/web-spider-web-crawler-extract/ Free Web Spider & Crawler. Extracts Information from Web by parsing millions of pages. Store data into Derby OR MySQL Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby Database - Written in Java Cross Platform See also Free Email Sender in this link: https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application https://www.microsoft.com/openjdk
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17

    rubywebcrawler

    web spider software written in ruby

    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    GoSpider

    GoSpider

    Gospider - Fast web spider written in Go

    GoSpider - Fast web spider written in Go. Fast web crawling. Brute force and parse sitemap.xml. Parse robots.txt. Generate and verify link from JavaScript files. Link Finder. Find AWS-S3 from response source. Find subdomains from the response source. Get URLs from Wayback Machine, Common Crawl, Virus Total, Alien Vault. Format output easy to Grep. Support Burp input.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 19
    ruia

    ruia

    Async Python framework for fast and flexible web scraping spiders

    Ruia is an asynchronous web scraping micro-framework built for Python that focuses on simplicity, speed, and flexibility when creating web crawlers. Ruia is powered by Python’s asyncio library along with aiohttp, enabling developers to perform concurrent network requests efficiently and scrape data from websites with minimal overhead. Ruia follows a “write less, run faster” philosophy, emphasizing concise code and streamlined spider development.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 20
    GOPA

    GOPA

    GOPA, a spider written in Golang, for Elasticsearch

    GOPA, a spider written in Golang, for Elasticsearch. Lightweight, low footprint, memory requirement should, be 100MB. Easy to deploy, no runtime or dependency required. Easy to use, no programming or script ability needed, out-of-box features. First of all, get it, two opinions: download the pre-built package or compile it yourself. Besides Elasticsearch, Gopa doesn't require any other dependencies, just simply run ./gopa to start the program. It's safety to press ctrl+c to stop the current...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    pyspider

    pyspider

    A powerful Spider(Web Crawler) system in Python

    pyspider is a powerful Spider(Web Crawler) system in Python. Components are connected by message queue. Every component, including message queue, is running in their own process/thread, and replaceable. That means, when process is slow, you can have many instances of processor and make full use of multiple CPUs, or deploy to multiple machines. This architecture makes pyspider really fast. benchmarking.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    gain

    gain

    Asyncio-based Python framework for building fast web crawling spiders

    Gain is a Python web crawling framework designed to simplify the process of building efficient and scalable web scrapers. It is built on top of asynchronous technologies such as asyncio, aiohttp, and uvloop to support high-performance crawling with concurrent network requests. It provides a structured framework for creating spiders that can navigate websites, extract structured data, and process the collected results. Developers define crawlers using components such as spiders, parsers, and...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    OpenWebSpider
    OpenWebSpider is an Open Source multi-threaded Web Spider (robot, crawler) and search engine with a lot of interesting features!
    Downloads: 5 This Week
    Last Update:
    See Project
  • 24
    Site monitoring

    Site monitoring

    Monitoring of websites with spider and email notifications

    Free website monitoring software, easy to set up and use for monitoring web sites. It is a web application programmed in Java programming language. You can monitor HTML pages, JSON and XML, pages in sitemap and even your whole web site using spider. Naturally you can check multiple websites. You can check HTTP result codes and even contents of the checked pages. Website checking is done periodically using build-in cron mechanism.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25

    sitecheck

    Modular web site spider for web developers.

    More than just a link checker, sitecheck is a website spider (also known as a crawler) which can assist with SEO by testing an entire site plus both inbound links from search engines and outbound links to other sites for the following issues: looping redirects (HTTP 301/302), broken links (HTTP 404), server errors (HTTP 500), spelling mistakes, low readability scores (using the Flesch Reading Ease test), missing/empty/duplicate meta tags, duplicate content, slow page speed, W3C validation...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
MongoDB Logo MongoDB