Showing 6169 open source projects for "web site scraper"

  • 1
    shot-scraper

    A command-line utility for taking automated screenshots of websites

    shot-scraper is a command-line utility for taking automated screenshots of web pages using a headless browser engine. After installation, a single command can capture a full-page screenshot of a URL and save it to a file, making it ideal for documentation, monitoring, and visual regression tasks. Under the hood it uses a modern browser (installed via a one-time shot-scraper install step) and exposes options for viewport size, full-page versus clipped screenshots, and device emulation. ...
    Downloads: 0 This Week
  • 2
    TEAMMATES Developer Web Site

    This is the project website for the TEAMMATES feedback management tool

    TEAMMATES is a free online tool for managing peer evaluations and other feedback paths of your students. It is provided as a cloud-based service for educators/students and is currently used by hundreds of universities across the world.
    Downloads: 0 This Week
  • 3
    Web-Check

    All-in-one OSINT tool for analysing any website

    Comprehensive, on-demand open source intelligence for any website. Get an insight into the inner-workings of a given website: uncover potential attack vectors, analyse server architecture, view security configurations, and learn what technologies a site is using. Currently the dashboard will show: IP info, SSL chain, DNS records, cookies, headers, domain info, search crawl rules, page map, server location, redirect ledger, open ports, traceroute, DNS security extensions, site performance,...
    Downloads: 3 This Week
  • 4
    Markdown Site

    An open-source publishing framework built for AI agents and developers

    Markdown Site is an open-source publishing framework built to help developers and AI agents quickly ship content-driven websites, blogs, or documentation directly from Markdown files with a seamless sync workflow. It is built on modern web technologies such as React, Convex, and Vite, and integrates real-time syncing so that changes to Markdown content locally instantly propagate to live views without the need to rebuild or redeploy.
    Downloads: 0 This Week
  • 5
    Site Kit for WordPress

    Site Kit is a one-stop solution for WordPress users

    Site Kit is a first-party WordPress plugin that brings key Google services into a single dashboard so site owners can see how their content performs and fix issues without leaving wp-admin. After a guided setup and verification flow, it connects properties to Search Console, Analytics, AdSense, PageSpeed Insights, and other services, surfacing the most relevant metrics per page and per site. The plugin focuses on clarity: traffic sources, search queries, top pages, and monetization signals...
    Downloads: 1 This Week
  • 6
    CyberScraper 2077

    A powerful web scraper powered by LLM | OpenAI, Gemini & Ollama

    CyberScraper 2077 is not just another web scraping tool – it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini and LocalLLM Models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.
    Downloads: 4 This Week
  • 7
    CommunityScrapers

    This is a public repository containing scrapers

    Stash Community Scrapers is a large open-source collection of metadata extraction tools designed to work with the Stash media management platform, enabling automated scraping of content information from various online sources. The repository contains hundreds of scraper definitions written primarily in YAML and Python, each tailored to extract structured metadata such as titles, performers, tags, and media details from specific websites. These scrapers integrate directly into Stash, allowing...
    Downloads: 0 This Week
  • 8
    Scraper of Death
    Scraper of Death is a web scraper with multiple scraping methods: Requests + BeautifulSoup (fast, lightweight) and Selenium (JavaScript support, dynamic content).
    Downloads: 3 This Week
  • 9
    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, aims to make it easy to build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha. Please expect breaking changes. You can run your scraper from a terminal by supplying URLs, the output filename of your choice, and the paths to your Python scripts to the dude scrape command.
    Downloads: 1 This Week
  • 10
    JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    Scrape job websites into a single spreadsheet with no duplicates. Automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command arguments. By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can be easily automated to run nightly with crontab. If you...
    Downloads: 0 This Week
  • 11
    MeshCentral

    A complete web-based remote monitoring and management web site

    The open source, multi-platform, self-hosted, feature-packed web site for remote device management. MeshCentral is a full computer management web site. With MeshCentral, you can run your own web server to remotely manage and control computers on a local network or anywhere on the internet. Once you get the server started, create a device group, then download and install an agent on each computer you want to manage.
    Downloads: 129 This Week
  • 12
    html-metadata

    MetaData html scraper and parser for Node.js (supports Promises

    The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. Currently, it supports Schema.org microdata using a third-party library, a native BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS implementation, and some general metadata that doesn't belong to a particular standard (for instance, the content of the title tag, or meta description tags). Planned is support for RDFa, AGLS, and other yet unheard-of...
    Downloads: 0 This Week
  • 13
    Free ChatGPT Site List

    It collects and organizes a wide variety of ChatGPT resources

    Free ChatGPT Site List is an open-source aggregation project that collects and organizes a wide variety of ChatGPT and AI web resources into a single navigable directory. The repository functions primarily as a curated navigation hub where users can discover free AI tools, websites, and services in one place. It was designed to reduce friction for users trying to locate working AI endpoints or utilities across the rapidly changing ecosystem.
    Downloads: 0 This Week
  • 14
    Fluent UI Web

    Collection of utilities and components for building web applications

    A collection of UX frameworks for creating beautiful, cross-platform apps that share code, design, and interaction behavior. Build for one platform or for all. Everything you need is here. Build your own apps using the same open source components we do, with accessibility, internationalization, and performance included. From tutorials to a fun collection of API references, find what you need to design and develop your own Fluent experience. From Word and Excel to PowerBI and Teams, many...
    Downloads: 1 This Week
  • 15
    ScrapeGraphAI

    Python scraper based on AI

    Extracting content from websites and local documents using an LLM. ScrapeGraphAI is a web scraping Python library that uses an LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
    Downloads: 1 This Week
  • 16
    Ulixee Hero

    The web browser built for scraping

    It's the first modern headless browser designed specifically for scraping instead of just automated testing. Hero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching. We've recreated a fully compliant DOM directly in NodeJS, allowing you to bypass the headaches of previous scraper tools. The powerful Chrome engine sits under the hood, allowing for lightning fast rendering. Emulators make it easy to disguise...
    Downloads: 2 This Week
  • 17
    Crawl4AI

    Open-source LLM Friendly Web Crawler & Scraper

    Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.
    Downloads: 0 This Week
  • 18
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large...
    Downloads: 3 This Week
  • 19
    MDCx

    Movie metadata scraper and organizer for media libraries and NFO

    MDCx is an open source media metadata scraping and organization tool designed to automate the process of collecting detailed information for movie files. It retrieves metadata from multiple online sources and applies it to local media collections, helping users maintain structured and well-organized libraries. MDCx can download information such as titles, cast data, artwork, and other metadata, then generate standardized NFO files compatible with media management systems. It also supports...
    Downloads: 3 This Week
  • 20
    Heimdall

    An Application dashboard and launcher

    As the name suggests Heimdall Application Dashboard is a dashboard for all your web applications. It doesn't need to be limited to applications though, you can add links to anything you like. Heimdall is an elegant solution to organize all your web applications. It’s dedicated to this purpose so you won’t lose your links in a sea of bookmarks. Why not use it as your browser start page? It even has the ability to include a search bar using either Google, Bing or DuckDuckGo. ...
    Downloads: 48 This Week
  • 21
    blogdown

    Create Blogs and Websites with R Markdown

    blogdown is an R package that enables the creation and maintenance of static websites and blogs using R Markdown and Hugo (or other static-site generators). Developed by Yihui Xie and team, it provides functions to initialize sites, write posts, manage themes, and deploy with minimal fuss. It seamlessly blends R code chunks and web content, ideal for data storytellers and technical bloggers.
    Downloads: 0 This Week
  • 22
    Material e621

    Material e621 is a modern, open source web client for e621.net

    Material e621 is an open-source web client designed as a modern alternative interface for browsing content on the e621 platform, offering improved usability, customization, and performance compared to the original site. It is built with modern frontend technologies such as Vue and TypeScript and follows a Material Design-inspired aesthetic to provide a cleaner and more intuitive user experience.
    Downloads: 13 This Week
  • 23
    eleventy

    A simpler site generator. Transforms a directory of templates

    A static site generator for modern web development, focusing on flexibility and customization.
    Downloads: 0 This Week
  • 24
    WordPress

    Just a mirror of the WordPress subversion repository

    WordPress is one of the world’s most widely used content management systems (CMS), powering blogs, websites, and increasingly web apps. It offers a flexible architecture of themes and plugins, where users can extend functionality or customize layout without touching core code. The administrative dashboard includes post and page editors, media library, user roles, plugin/theme installation, and site settings. Through its REST API and headless mode, WordPress also serves as a backend for decoupled front ends using frameworks like React, Vue, or Gatsby. ...
    Downloads: 20 This Week
  • 25
    Netlify CMS

    A Git-based CMS for static site generators

    Open source content management for your Git workflow. Use Netlify CMS with any static site generator for a faster and more flexible web project. Get the speed, security, and scalability of a static site, while still providing a convenient editing interface for content. Content is stored in your Git repository alongside your code for easier versioning, multi-channel publishing, and the option to handle content updates directly in Git.
    Downloads: 10 This Week