Showing 47 open source projects for "data scraper"

View related business solutions
  • Catch Bugs Before Your Customers Do Icon
    Catch Bugs Before Your Customers Do

    Real-time error alerts, performance insights, and anomaly detection across your full stack. Free 30-day trial.

    Move from alert to fix before users notice. AppSignal monitors errors, performance bottlenecks, host health, and uptime—all from one dashboard. Instant notifications on deployments, anomaly triggers for memory spikes or error surges, and seamless log management. Works out of the box with Rails, Django, Express, Phoenix, Next.js, and dozens more. Starts at $23/month with no hidden fees.
    Try AppSignal Free
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    Linkedin Scraper

    Linkedin Scraper

    A library that scrapes Linkedin for user data

    Linkedin Scraper is a library that scrapes Linkedin for user data. Version 2.0.0 and before is called linkedin_user_scraper and can be installed via pip3 install --user linkedin_user_scraper. The reason is that LinkedIn has recently blocked people from viewing certain profiles without having previously signed in. So by setting scrape=False, it doesn't automatically scrape the profile, but Chrome will open the linkedin page anyways.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    dude uncomplicated data extraction

    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, was to easily build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha. Please expect breaking changes. You can run your scraper from terminal/shell/command-line by supplying URLs, the output filename of your choice and the paths to your python scripts to dude scrape command.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Colly

    Colly

    Elegant Scraper and Crawler Framework for Golang

    Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving. Clean API. Fast (>1k request/sec on a single core) Manages request delays and maximum concurrency per domain. Automatic cookie and session handling. Sync/async/parallel scraping.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 4
    CyberScraper 2077

    CyberScraper 2077

    A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama

    CyberScraper 2077 is not just another web scraping tool – it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini and LocalLLM Models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 5
    Spider

    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 6
    Crawl4AI

    Crawl4AI

    Open-source LLM Friendly Web Crawler & Scraper

    Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Email Scraper and Validator
    This is a simple desktop application built with Python and Tkinter that allows users to scrape email addresses from websites and validate them using an external API. It also provides features to save the scraped emails to a database, and export the data to various file formats. 1. Enter a list of website URLs or emails in the input field. 2. Click the Scrape button to scrape email addresses from the provided websites. 3. Click the Validate button to validate the scraped email...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    MDCx

    MDCx

    Movie metadata scraper and organizer for media libraries and NFO

    MDCx is an open source media metadata scraping and organization tool designed to automate the process of collecting detailed information for movie files. It retrieves metadata from multiple online sources and applies it to local media collections, helping users maintain structured and well-organized libraries. MDCx can download information such as titles, cast data, artwork, and other metadata, then generate standardized NFO files compatible with media management systems. It also supports...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 9
    Google Maps Extractor

    Google Maps Extractor

    Free Google Map Extractor(With Email) | Google Maps Scraper

    A free Google Map extractor for business leads—fast & efficient! This Google Maps scraper extracts phone numbers, emails, locations, and social media profiles, then exports to CSV. Visit: https://gmplus.io/
    Downloads: 7 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 10
    ai-scrapper
    🚀 Discover AI Web Scraper! 🚀 Tired of copying and pasting data from websites? I developed a desktop application with Electron and Gemini AI to extract structured data easily and efficiently! 🤖✨
    Downloads: 8 This Week
    Last Update:
    See Project
  • 11
    linkedin2username

    linkedin2username

    Generate probable usernames from LinkedIn company employee lists

    ...This process helps security researchers, penetration testers, and investigators perform reconnaissance by building potential username lists for further security testing or OSINT analysis. Unlike tools that rely on official APIs, linkedin2username operates as a pure web scraper and therefore does not require API keys. The script uses Selenium to automate browser interactions and perform searches within LinkedIn to gather employee data.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 12
    scraper-with-chatgpt
    It is a powerful data scraping tool that helps you extract information from various online sources. Easily collect data from Google SERP, Maps, Shopify, Zillow, and more. With a user-friendly interface, you can scrape and save data in JSON or Excel formats. Unlock insights from the web effortlessly with scrape-it.cloud API.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    URS (Universal Reddit Scraper)

    URS (Universal Reddit Scraper)

    A comprehensive Reddit scraping command-line tool written in Python

    Universal Reddit Scraper, a comprehensive Reddit scraping command-line tool written in Python. Whether you are using URS for enterprise or personal use, I am very interested in hearing about your use case and how it has helped you achieve a goal. This is a comprehensive Reddit scraping tool that integrates multiple features. All files except for those generated by the wordcloud tool are exported to JSON by default. Wordcloud files are exported to PNG by default. All exported files are saved...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Goutte

    Goutte

    Goutte, a simple PHP Web Scraper

    Goutte is a screen scraping and web crawling library for PHP. Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses. Goutte depends on PHP 7.1+. Add fabpot/goutte as a require dependency in your composer.json file. Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser). Make requests with the request() method. The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler). To use your own HTTP settings, you may...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    NSFW Data Scraper

    NSFW Data Scraper

    Collection of scripts to aggregate image data

    NSFW Data Scraper is an open-source project that provides scripts for automatically collecting large datasets of images intended for training NSFW image classification systems. The repository focuses on aggregating image data from various online sources so that developers can build datasets suitable for training content moderation models. These datasets typically contain images categorized into different classes associated with adult or explicit content, which can then be used to train neural networks that detect unsafe or inappropriate material. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 16
    AutoScraper

    AutoScraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    This project is made for automatic web scraping to make scraping easy. It gets a URL or the HTML content of a web page and a list of sample data that we want to scrape from that page. This data can be text, URL or any HTML tag value of that page. It learns the scraping rules and returns similar elements. Then you can use this learned object with new URLs to get similar content or the exact same element of those new pages.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    mlscraper

    mlscraper

    ML-based HTML scraper that learns extraction rules from examples

    ...Once trained, the generated scraper can process new pages and return the extracted data in structured formats such as dictionaries or lists. This approach simplifies web scraping tasks by shifting the focus from rule-writing to example-based training. Internally, the project processes HTML documents, identifies relevant elements in the DOM, and builds extraction logic based on statistical or heuristic analysis of the training samples.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 18

    Vanga

    Compiler-like generic data scraper and GUI automation tool.

    A Java-based visual compiler for GUI recognition and automation. The screens are described in an XML file which contains the definitions of lexemes and the tokens that comprise them. Upon a successful match of a screen, user-defined code is executed. Within the scope of this code, the user is capable of extracting data from the screen, interpreting it, and driving the GUI accordingly. The demonstration example reads the value of a calculator, displays it for the user, and enables him to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    NYT Vote Scraper

    NYT Vote Scraper

    Scrapes the NYT Votes Remaining Page JSON

    NYT Vote Scraper is a small but clever project that periodically fetches JSON data from the “Votes Remaining” page of The New York Times during the 2020 U.S. presidential election and commits the results into the repository, effectively using Git as a time-series database. The idea is to create a historical record — including diffs — of how vote counts and “votes remaining” estimates changed over time.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    JonDoFox Advanced Privacy Browser

    JonDoFox Advanced Privacy Browser

    Browser with fingerprinting- and psychological profiling protection

    In addition to fingerprinting, ad networks are collecting psychological data of the users. This data is primarily based on mouse movement and scroll (we can't block clicks. reasonably). It leaks and is being used for anything from spam to blackmailing. Our addons block only those javascript functions, thus leaving the Internet intact (unlike noscript, which makes FB being unusable). If a page is broken, hit ctrl+shift+p and retry (private browsing mode) We do reach a "nearly unique" rating at the EFF fingerprinting test page. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    X-RAY

    X-RAY

    The next web scraper, see through the <html> noise

    Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    django-dynamic-scraper

    django-dynamic-scraper

    Creating Scrapy scrapers via the Django admin interface

    ...Since it simplifies things DDS is not usable for all kinds of scrapers, but it is well suited for the relatively common case of regularly scraping a website with a list of updated items (e.g. news, events, etc.) and then dig into the detail page to scrape some more infos for each item. Django Dynamic Scraper tries to keep its data structure in the database as separated as possible from the models in your app, so it comes with its own Django model classes for defining scrapers, runtime information related to your scraper runs and classes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    google-play-scraper

    google-play-scraper

    Node.js scraper to get data from Google Play

    Node.js module to scrape application data from the Google Play store. Retrieves the full detail of an application. Retrieves a list of applications from one of the collections at Google Play. Retrieves a list of apps that results of searching by the given term. Returns the list of applications by the given developer name. Given a string returns up to five suggestions to complete a search query term. Retrieves a page of reviews for a specific application. Returns a list of similar apps to the...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24

    WebExtractServer

    WebExtractServer use with WebExtractLte for use with web browsers

    Browse data, fetched by WebExtractLte directly in your browser. Designed to be used with Webscraper (webscraper.io) - third party web scraper tool, available as plugin for Chrome and Firefox.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    YellowPages Australia Scraper

    YellowPages Australia Scraper

    Gather Publicly Available Business Contact Data

    Do you need contact information of businesses in Australia? You can get it easily by using YellowPages Australia Scraper. Program automatically navigates in YellowPages AU and gathers the data you needed.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB