Showing 52 open source projects for "web scraper"

  • 1
    Net Kazı ve Kazı Görünümü
    Downloads: 0 This Week
  • 2
    Ulixee Hero

    The web browser built for scraping

    Hero is the first modern headless browser designed specifically for scraping instead of just automated testing. Hero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching. We've recreated a fully compliant DOM directly in NodeJS, allowing you to bypass the headaches of previous scraper tools. The powerful Chrome engine sits under the hood, allowing for lightning-fast rendering. Emulators make it easy to disguise your...
    Downloads: 3 This Week
  • 3
    CyberScraper 2077
    CyberScraper 2077

    A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama

    CyberScraper 2077 is not just another web scraping tool – it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini and LocalLLM Models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.
    Downloads: 1 This Week
  • 4
    crwlr

    Library for Rapid (Web) Crawler and Scraper Development

    This library provides a framework and many ready-to-use building blocks, called steps, for building your own crawlers and scrapers. Before diving into the library, let's have a look at the terms crawling and scraping. For most real-world use cases, the two go hand in hand, which is why this library supports and combines both. A (web) crawler is a program that (down)loads documents and follows the links in them to load those documents as well. A crawler could...
    Downloads: 4 This Week
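The distinction between crawling (following links) and scraping (extracting data) in the crwlr description can be illustrated with a minimal Python sketch. This is not crwlr's own API: the `PAGES` dictionary, the `crawl` function, and its `fetch` parameter are hypothetical stand-ins chosen so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Collects href values of <a> tags and the text of the <title> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical in-memory "website" standing in for real HTTP responses.
PAGES = {
    "http://example.test/": '<title>Home</title><a href="/a">A</a><a href="/b">B</a>',
    "http://example.test/a": '<title>Page A</title><a href="/b">B</a>',
    "http://example.test/b": '<title>Page B</title><a href="/">Home</a>',
}


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl: load a page, scrape its title, queue its links."""
    seen, queue, titles = {start_url}, deque([start_url]), {}
    while queue and len(titles) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndTitleParser()
        parser.feed(html)
        titles[url] = parser.title      # scraping: extract data from the page
        for href in parser.links:       # crawling: follow links to new pages
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return titles


result = crawl("http://example.test/", PAGES.get)
```

Passing `fetch` as a parameter keeps the loop independent of how pages are loaded, so a real HTTP client could be swapped in for `PAGES.get`.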
  • 5
    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, aims to let you build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha, so expect breaking changes. You can run your scraper from the command line by supplying URLs, an output filename of your choice, and the paths to your Python scripts to the dude scrape command.
    Downloads: 0 This Week
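The Flask-inspired decorator style mentioned in the Dude description can be sketched as follows. This is an illustration of the general pattern only, not Dude's actual API: the names `select`, `run`, and `RULES` are hypothetical, and regular expressions stand in for real HTML selectors to keep the example dependency-free.

```python
import re

# Hypothetical registry: pairs of (compiled pattern, handler function).
RULES = []


def select(pattern):
    """Decorator that registers a handler for every match of `pattern`."""
    def register(func):
        RULES.append((re.compile(pattern), func))
        return func
    return register


@select(r'<a href="([^"]+)"')
def link(match):
    return {"url": match.group(1)}


@select(r"<h1>([^<]+)</h1>")
def heading(match):
    return {"heading": match.group(1).strip()}


def run(html):
    """Apply every registered rule to the HTML and collect the results."""
    return [func(m) for regex, func in RULES for m in regex.finditer(html)]


items = run('<h1> Projects </h1><a href="/one">1</a><a href="/two">2</a>')
```

Each decorated function only describes how to turn one match into a record; the registry and the `run` loop take care of the rest, which is what keeps scrapers in this style short.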
  • 6
    ScrapeGraphAI

    Python scraper based on AI

    Extracts content from websites and local documents using LLMs. ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
    Downloads: 2 This Week
  • 7
    scraper-with-chatgpt
    It is a powerful data scraping tool that helps you extract information from various online sources. Easily collect data from Google SERP, Maps, Shopify, Zillow, and more. With a user-friendly interface, you can scrape and save data in JSON or Excel formats. Unlock insights from the web effortlessly with scrape-it.cloud API.
    Downloads: 0 This Week
  • 8
    Crawl4AI

    Open-source LLM Friendly Web Crawler & Scraper

    Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.
    Downloads: 1 This Week
  • 9
    html-metadata

    MetaData html scraper and parser for Node.js (supports Promises

    The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. Currently, it supports Schema.org microdata using a third-party library, a native BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS implementation, and some general metadata that doesn't belong to a particular standard (for instance, the content of the title tag, or meta description tags). Planned is support for RDFa, AGLS, and other yet unheard-of...
    Downloads: 0 This Week
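As a rough illustration of the kind of extraction html-metadata describes, the sketch below pulls Open Graph properties, named meta tags, and the title out of an HTML string using only Python's standard library. It is not the library's implementation (html-metadata targets Node.js); the `MetadataParser` class, the `SAMPLE` page, and the shape of the resulting dictionary are assumptions made for this example.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects Open Graph properties, named meta tags, and the page title."""

    def __init__(self):
        super().__init__()
        self.metadata = {"openGraph": {}, "general": {}}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "content" in attrs:
            prop = attrs.get("property") or ""
            if prop.startswith("og:"):
                # Open Graph: <meta property="og:title" content="...">
                self.metadata["openGraph"][prop[3:]] = attrs["content"]
            elif "name" in attrs:
                # General metadata: <meta name="description" content="...">
                self.metadata["general"][attrs["name"]] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["general"]["title"] = data.strip()


# Hypothetical sample page used only for this illustration.
SAMPLE = """
<html><head>
<title>Example article</title>
<meta name="description" content="A short summary.">
<meta property="og:title" content="Example article">
<meta property="og:type" content="article">
</head><body></body></html>
"""

parser = MetadataParser()
parser.feed(SAMPLE)
metadata = parser.metadata
```

Grouping results by standard keeps fields such as the Open Graph title and the title tag from overwriting each other.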
  • 10
    JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    ... a job website you'd like to write a scraper for, you are welcome to implement it. Review the Base Scraper for implementation details. JobFunnel supports scraping jobs from the same job website across locales and domains. If you are interested in adding support, you may only need to define session headers and domain strings. Review the Base Scraper for further implementation details.
    Downloads: 0 This Week
  • 11
    YouTube video web scraper 2 [ISA]

    YouTube video web scraper 2 [Improved.Simplified.Alternative]

    'YouTube video web scraper 2' is a desktop application developed using Python 3.11.4 and add-on libraries. It finds YouTube videos based on a user request, displays them as a table, and can export the table to Excel. It is compatible only with Windows.
    Downloads: 0 This Week
  • 12
    Goutte

    Goutte, a simple PHP Web Scraper

    Goutte is a screen scraping and web crawling library for PHP. Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses. Goutte depends on PHP 7.1+. Add fabpot/goutte as a require dependency in your composer.json file. Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser). Make requests with the request() method. The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler). To use your own HTTP settings, you may...
    Downloads: 5 This Week
  • 13
    ai-scrapper
    🚀 Discover AI Web Scraper! 🚀 Tired of copying and pasting data from websites? I developed a desktop application with Electron and Gemini AI to extract structured data easily and efficiently! 🤖✨
    Downloads: 0 This Week
  • 14
    ConsoleWebScraper

    It allows you to input a URL and it will scrape the HTML content...

    ..., the results will be saved on your desktop in a folder named "WebScrapperProject". Note: This is a basic web scraper and may not work with all websites, especially those that rely heavily on JavaScript for rendering content or have measures in place to prevent scraping. Author: Bohdan Harabadzhyu. License: This project is licensed under the terms of the GNU General Public License v3.0 (GPL-3.0); see the LICENSE file for details.
    Downloads: 0 This Week
  • 15
    FungiRegEx

    This tool is a web-based search engine for regular expressions in proteomes. All information is obtained from the JGI (Joint Genome Institute) database through a scraper covering all available species, so the tool considers only fungal organisms. This version uses React JS on the front end and NodeJS + Express on the back end. Full documentation is available at: https://victormiguelterronmacias.slite.page/p/J7BJU3hXhd72EJ/FungiRegEx-Software-documentation If you want...
    Downloads: 0 This Week
  • 16
    AutoScraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    This project is made for automatic web scraping to make scraping easy. It gets a URL or the HTML content of a web page and a list of sample data that we want to scrape from that page. This data can be text, URL or any HTML tag value of that page. It learns the scraping rules and returns similar elements. Then you can use this learned object with new URLs to get similar content or the exact same element of those new pages.
    Downloads: 2 This Week
  • 17
    SecretAgent

    The web scraper that's nearly impossible to block

    SecretAgent is a headless browser that’s nearly impossible to detect. It achieves this by emulating real users. It also has powerful auto-replay functionality that lets you create and debug scripts in record-setting time.
    Downloads: 4 This Week
  • 18
    soup

    Web Scraper in Go, similar to BeautifulSoup

    soup is a small web scraper package for Go, with an interface highly similar to that of BeautifulSoup. The fields it exposes include Pointer, a pointer to the current HTML node; NodeValue, the current HTML node's value (the tag name for an ElementNode, or the text for a TextNode); and Error, which holds an error struct if one occurs and is nil otherwise. A detailed text explanation of the error can be accessed using the Error() function. A field...
    Downloads: 0 This Week
  • 19
    Till

    DataHen Till is a companion tool to your existing web scraper

    DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and harder to block, with minimal code changes to your scraper. It integrates with any scraper in 5 minutes. Web scraping is usually easy to get started with, especially on a small scale. However, as you try to scale it up, it gets exponentially more difficult. Scraping 10,000 records can easily be done with simple web scraper scripts in any programming language, but as you try to scrape millions...
    Downloads: 3 This Week
  • 20
    yabasta

    Yet Another BAsic Scraper and Text Analysis

    YA BASTA! is a Python/R application for lyrics web scraping and text analysis. Web scraping is implemented in Python; text analysis is implemented in R and run as Python subprocesses. YA BASTA! has only been tested on Windows. To run it, type the following at the Windows command prompt: python.exe yabasta.py
    Downloads: 0 This Week
  • 21
    django-dynamic-scraper

    Creating Scrapy scrapers via the Django admin interface

    Django Dynamic Scraper (DDS) is an app for Django built on top of the scraping framework Scrapy. While preserving many of the features of Scrapy, it lets you dynamically create and manage spiders via the Django admin interface. With DDS you can define your Scrapy scrapers dynamically via the Django admin interface and save your scraped items in the database you defined for your Django project. Since it simplifies things, DDS is not usable for all kinds of scrapers...
    Downloads: 0 This Week
  • 22
    google-play-scraper

    Node.js scraper to get data from Google Play

    ... to the one specified. Returns the list of permissions an app has access to. Retrieve a full list of categories present from the dropdown menu on Google Play. Since every library call performs one or multiple requests to a Google Play API or web page, sometimes it can be useful to cache the results to avoid requesting the same data twice.
    Downloads: 0 This Week
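The idea of caching results to avoid requesting the same data twice can be sketched in Python with `functools.lru_cache`. This illustrates the general technique only and says nothing about how google-play-scraper itself caches: `fetch_app`, `cached_fetch_app`, and the `CALLS` log are hypothetical, and the "request" is simulated rather than sent over the network.

```python
from functools import lru_cache

CALLS = []  # records every simulated network request


def fetch_app(app_id):
    """Hypothetical stand-in for a network request; returns fake app details."""
    CALLS.append(app_id)
    return {"appId": app_id, "title": app_id.split(".")[-1].title()}


@lru_cache(maxsize=128)
def cached_fetch_app(app_id):
    """Memoized wrapper: repeated calls with the same id reuse the first result."""
    return fetch_app(app_id)


first = cached_fetch_app("com.example.notes")
second = cached_fetch_app("com.example.notes")  # served from the cache
other = cached_fetch_app("com.example.maps")    # new id, triggers a new request
```

Note that the cache hands back the same dictionary object on every hit, so callers that mutate the result would affect later callers; copying the result before returning it is one way around that.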
  • 23
    X-RAY

    The next web scraper, see through the <html> noise

    Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't...
    Downloads: 1 This Week
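Pagination with a request delay and a page limit, as mentioned in the X-ray description, can be sketched with a Python generator. This is not X-ray's API (X-ray is a separate library with its own interface): the `SITE` dictionary, the `paginate` function, and its `limit` and `delay` parameters are hypothetical names chosen for this example.

```python
import time

# Hypothetical paginated site: each page lists items and names the next page.
SITE = {
    "/page/1": {"items": ["a", "b"], "next": "/page/2"},
    "/page/2": {"items": ["c"], "next": "/page/3"},
    "/page/3": {"items": ["d", "e"], "next": None},
}


def paginate(start, fetch, limit=None, delay=0.0):
    """Yield each page's items, following `next` links for up to `limit` pages."""
    url, pages = start, 0
    while url is not None and (limit is None or pages < limit):
        if pages and delay:
            time.sleep(delay)  # pause between requests, not before the first
        page = fetch(url)
        yield from page["items"]
        url, pages = page["next"], pages + 1


limited = list(paginate("/page/1", SITE.__getitem__, limit=2, delay=0.01))
everything = list(paginate("/page/1", SITE.__getitem__))
```

Because `paginate` is a generator, items become available as soon as their page has been fetched, so results gathered before a failing page are not lost.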
  • 24
    WebExtractServer

    WebExtractServer, used with WebExtractLte, for web browsers

    Browse data fetched by WebExtractLte directly in your browser. Designed to be used with Webscraper (webscraper.io), a third-party web scraper tool available as a plugin for Chrome and Firefox.
    Downloads: 0 This Week
  • 25
    JAWS - Just Another Web Scraper

    A simple Web Scraper using Regular Expression or Html Agility

    JAWS, or Just Another Web Scraper, is part of the data scraping software developed by SVbook, alongside JATI (Image to Text) and JAVT (Video to Text). JAWS offers an easy interface for scraping data from websites using regular expressions, text preprocessing, or HTML Agility Pack.
    Downloads: 2 This Week