Showing 104 open source projects for "internet dump spider"

  • 1
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large...
    Downloads: 1 This Week
  • 2
    xhs-spider

    Desktop tool for collecting and exporting Xiaohongshu post data

    XHS-Spider is a desktop data collection tool designed to gather content and metadata from the Xiaohongshu platform. It provides a graphical interface that allows users to explore posts, collect information, and download media such as images and videos from individual notes or search results. It was developed primarily as a learning project to demonstrate approaches to building web crawlers and experimenting with technologies such as WebView2 and WPF UI. It supports multiple ways to locate...
    Downloads: 2 This Week
  • 3
    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages....
    Downloads: 1 This Week
  • 4
    EasySpider

    A visual no-code/code-free web crawler/spider

    A visual code-free/no-code web crawler/spider, supporting both Chinese and English.
    Downloads: 12 This Week
  • 5
    FEAPDER

    Powerful Python crawler framework for scalable web scraping tasks

    feapder is a Python-based web crawling framework designed to simplify the process of building scalable and efficient web scrapers. It focuses on providing a developer-friendly environment that makes it easier to create, run, and manage crawlers for a variety of data collection tasks. It includes several built-in spider types, such as AirSpider, Spider, TaskSpider, and BatchSpider, which address different crawling scenarios ranging from lightweight scraping to distributed and batch-based...
    Downloads: 0 This Week
  • 6
    Scrapy-Redis

    Redis-based components for Scrapy

    You can start multiple spider instances that share a single Redis queue. Best suited for broad multi-domain crawls. Scraped items get pushed into a Redis queue, meaning that you can start as many post-processing processes as needed, all sharing the items queue. Components: Scheduler + Duplication Filter, Item Pipeline, Base Spiders. The default request serializer is pickle, but it can be changed to any module with loads and dumps functions. Note that pickle is not compatible between Python versions. Version...
    Downloads: 0 This Week
  • 7
    DB Browser for SQLite

    The DB Browser for SQLite

    DB Browser for SQLite (DB4S) is a high quality, visual, open source tool to create, design, and edit database files compatible with SQLite. DB4S is for users and developers who want to create, search, and edit databases. DB4S uses a familiar spreadsheet-like interface, and complicated SQL commands do not have to be learned. This program is not a visual shell for the sqlite command line tool, and does not require familiarity with SQL commands. It is a tool to be used by both developers and...
    Downloads: 89 This Week
  • 8
    Grab Framework Project

    Web Scraping Framework

    Grab is a Python framework for building web scrapers. With Grab you can build web scrapers of various complexity, from simple 5-line scripts to complex asynchronous website crawlers processing millions of web pages. Grab provides an API for performing network requests and for handling the received content, e.g. interacting with the DOM tree of the HTML document. The single request/response API allows you to build a network request, perform it, and work with the received content. The API is...
    Downloads: 0 This Week
  • 9
    Scrapling

    An adaptive Web Scraping framework

    Scrapling is an adaptive web scraping framework designed to handle everything from a single HTTP request to large-scale, concurrent crawls. Built for modern websites, it intelligently adapts to structural changes by automatically relocating elements when page layouts update. The framework includes advanced fetchers capable of bypassing anti-bot protections such as Cloudflare Turnstile using stealth and browser automation techniques. Its powerful spider system supports multi-session crawling,...
    Downloads: 2 This Week
  • 10
    DotnetSpider

    Lightweight .NET framework for fast web crawling and data scraping

    DotnetSpider is a web crawling and data extraction framework built on the .NET Standard platform. It is designed to help developers create efficient and scalable crawlers for collecting structured data from websites. It provides a high-level API that simplifies the process of defining spiders, managing requests, and extracting content from web pages. Developers can create custom spiders by extending base classes and configuring pipelines that handle downloading, parsing, and storing...
    Downloads: 0 This Week
  • 11
    GitHub Actions for Firebase

    GitHub Action for interacting with Firebase

    This Action for firebase-tools enables arbitrary actions with the firebase command-line client. Starting with version v2.1.2, each version release will point to a versioned Docker image, allowing for hardening our pipeline (so things don't break when I do something dumb). On top of this, you can also point to a master version if you would like to test out what might not be deployed into a release yet. If you want to add a message to a deployment (e.g. the Git commit message) you need to take...
    Downloads: 1 This Week
  • 12
    req

    Simple Go HTTP client with Black Magic

    Simple and easy to use, providing rich client-level and request-level settings, all of which are intuitive and chainable methods. Provides powerful and convenient debug utilities, including debug logs, performance traces, and even dumps of the complete request and response content. API testing can be done with minimal code, with no need to explicitly create any Request or Client, or even to handle errors. Detects and decodes to UTF-8 automatically if possible to avoid garbled characters (See Auto...
    Downloads: 0 This Week
  • 13
    python-fxxk-spider

    Collection of 100+ Python web scraping projects and crawler examples

    python-fxxk-spider is a curated collection of Python web scraping and crawler projects gathered in a single repository for reference and learning. It aggregates many independent scraping examples that target a wide range of websites, online services, and public data sources. Instead of being a single crawler tool, it functions as a catalog of ready-made Python spider implementations that demonstrate different scraping techniques. python-fxxk-spider includes scrapers for social media,...
    Downloads: 3 This Week
  • 14
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phones, and custom text from the web using Java regex

    In Files there is WebCrawlerMySQL.jar, which supports a MySQL connection. Free Web Spider & Crawler. Extracts information from the web by parsing millions of pages. Stores data in a Derby database, and data is not lost after force-closing the spider. - Free Web Spider, Parser, Extractor, Crawler - Extraction of emails, phones, and custom text from the web - Export to Excel file - Data saved into Derby and MySQL databases - Written in Java, cross-platform. Also see Free email Sender :...
    Downloads: 2 This Week
  • 15
    AutoWikiBrowser is a semi-automated Wikipedia editor, designed to make tedious, repetitive tasks quicker and easier. For more information, see the project homepage at http://en.wikipedia.org/wiki/Wikipedia:AutoWikiBrowser.
    Downloads: 79 This Week
  • 16

    ahCrawler

    A PHP search engine for your website and web analytics tool. GNU GPL3

    ahCrawler is a toolset for implementing your own search on your website and an analyzer for your web content. It can be used on shared hosting. It consists of * a crawler (spider) and indexer * search for your website(s) * search statistics * a website analyzer (HTTP headers, short titles and keywords, link checker, ...) You need to install it on your own server, so all crawled data stays in your environment. You never know when an external web spider updated your content. Trigger a rescan...
    Downloads: 1 This Week
  • 17
    溫度日記 Hearty Journal

    A healing mood diary app

    Hearty Journal is a beautiful diary and personal journal application with a focus on privacy. Securely record your thoughts, feelings, ideas and private moments with the ease of writing on a pad of paper. Its aesthetic looks like a piece of notebook paper with handwritten words on it. Also, beautiful themes, lovely journal stickers and luxury fonts are available in the app. Hearty Journal works on both your computer and phone (Windows, macOS, iOS and Android are supported). Moreover, to keep...
    Downloads: 0 This Week
  • 18
    Crawlab

    Distributed web crawler admin platform for spiders management

    A Golang-based distributed web crawler management platform, supporting various languages including Python, NodeJS, Go, Java, and PHP, and various web crawler frameworks including Scrapy, Puppeteer, and Selenium. Use docker-compose for a one-click startup; that way, you don't even have to configure the MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS and worker nodes. Master node and worker nodes communicate...
    Downloads: 0 This Week
  • 19
    Spider-Search

    Search multiple engines for a specific string
    Downloads: 0 This Week
  • 20
    Easyspider - Distributed Web Crawler

    Easy Spider is a distributed Perl Web Crawler Project from 2006

    Easy Spider is a distributed Perl Web Crawler Project from 2006. It features code for crawling webpages, distributing the results to a server, and generating XML files from them. The client side can be any computer (Windows or Linux), and the server stores all data. Websites that use EasySpider Crawling for Article Writing...
    Downloads: 0 This Week
  • 21
    crawly

    High-level web crawling and scraping framework for Elixir apps

    Crawly is a high-level application framework for crawling websites and extracting structured data using the Elixir programming language. It provides a complete environment for building web crawlers that systematically visit pages, collect information, and transform that data into structured formats for further processing. Crawly is designed for tasks such as data mining, information processing, and building historical archives of web content. Crawly follows the Elixir and OTP architecture...
    Downloads: 0 This Week
  • 22
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phones, and custom text from the web using Java regex

    In Files there is WebCrawlerMySQL.jar, which supports a MySQL connection. Please follow this link to get the latest version: https://sourceforge.net/projects/web-spider-web-crawler-extract/ Free Web Spider & Crawler. Extracts information from the web by parsing millions of pages. Stores data in a Derby or MySQL database, and data is not lost after force-closing the spider. - Free Web Spider, Parser, Extractor, Crawler - Extraction of emails, phones, and custom text from the web - Export...
    Downloads: 1 This Week
  • 23
    Orao Basket

    Programming tools for an emulator of the eight-bit ORAO computer

    Smederevo, 5 August 2018. A long time ago, around 1986, I became the proud owner of an eight-bit ORAO computer based on the MOS 6502 processor. It was the first and, for me, the best home computer at that time. My whole knowledge of computer programming began with that computer. Recently, for some unknown reason, I have become interested in old eight-bit computers again. After a short search on the Internet, I found an emulator of my favorite computer. It literally emulates every piece of hardware...
    Downloads: 0 This Week
  • 24
    sposkpat2

    sposkpat2, Single Purpose Operating System Kpat Live Distro

    ...Please give it a try. 12 card games are included: Aces Up, Forty & Eight, Freecell, Golf, Grandfather, Grandfather's Clock, Gypsy, Klondike, Mod3, Simple Simon, Spider, Yuko. A safe and silent way to play a card game: blocked from all networks, including the internet. Discs are spun down for quietness and energy saving. No distractions, no nags, never. Open source. Now for displays up to 4K. Made possible by Debian (made on buster for bullseye) and KDE's kpat. Boots from CD/DVD, USB stick, and inside virtual machines such as QEMU. ...
    Downloads: 0 This Week
  • 25

    rubywebcrawler

    web spider software written in ruby

    Downloads: 0 This Week