Showing 317 open source projects for "crawl site links"

  • 1
    Laravel Sitemap

    Create and generate sitemaps with ease

    This package can generate a sitemap without you having to add URLs to it manually. It does this by crawling your entire site. The generator can execute JavaScript on each page, so links injected into the DOM by JavaScript are crawled as well. The simplest approach is to crawl a given domain and generate a sitemap containing every link found. The destination of the sitemap is specified by $path. If you don't want a crawled link to appear in the sitemap, don't return it from the callable you pass to hasCrawled. ...
    Downloads: 1 This Week
  • 2
    Spatie Crawler

    An easy to use, powerful crawler implemented in PHP

    Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.
    Downloads: 6 This Week
  • 3
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large datasets in a short period of time. ...
    Downloads: 2 This Week
  • 4
    news-please

    Python tool for crawling and extracting structured data from news sites

    ...It provides an integrated pipeline that crawls news sites, retrieves article pages, and extracts structured information such as headlines, authors, publication dates, and article text. news-please can recursively follow internal links and read RSS feeds to gather both recent and archived articles from a news outlet when given only the root URL of a site. It combines several established technologies and libraries to perform web crawling and content extraction, enabling reliable processing across a wide range of news sources. Developers can use the software either as a standalone command line application or integrate it into their own Python applications through its library interface. ...
    Downloads: 5 This Week
  • 5
    notfoundbot

    fix & archive outgoing links on your website

    notfoundbot is a GitHub Action that helps you automatically maintain the correctness of your website's outgoing links. It finds links that need fixing and opens pull requests that fix them. This action is intended for websites and blogs powered by static site generators. By using post dates derived from filenames, notfoundbot searches for Wayback Machine archives of linked resources that are contemporary to the post itself: broken links in a 2011 blog post will be linked to archives from around that era.
    Downloads: 0 This Week
  • 6
    GPT Crawler

    Crawl a site to generate knowledge files to create your own custom GPT

    GPT Crawler is an open-source tool designed to automatically crawl websites and generate structured knowledge that can be used to build AI assistants and retrieval systems. It focuses on extracting high-quality textual content from web pages and preparing it in formats suitable for embedding, indexing, or fine-tuning workflows. The project is especially useful for teams that want to turn documentation sites or knowledge bases into conversational AI backends without building custom scrapers...
    Downloads: 1 This Week
  • 7
    Heimdall

    An application dashboard and launcher

    As the name suggests, Heimdall Application Dashboard is a dashboard for all your web applications. It isn't limited to applications, though; you can add links to anything you like. Heimdall is an elegant solution for organizing all your web applications. Because it's dedicated to this purpose, you won't lose your links in a sea of bookmarks. Why not use it as your browser start page? It can even include a search bar using Google, Bing or DuckDuckGo. You can use the app to link to any site or application, but Foundation apps will auto-fill the icon for the app and supply a default color for the tile. ...
    Downloads: 65 This Week
  • 8
    crwlr

    Library for Rapid (Web) Crawler and Scraper Development

    This library provides a framework of sorts and many ready-to-use, so-called steps that you can use as building blocks for your own crawlers and scrapers. Before diving into the library, let's look at the terms crawling and scraping. For most real-world use cases, the two go hand in hand, which is why this library supports and combines both. A (web) crawler is a program that (down)loads documents and follows the links in them to load those as well. A crawler...
    Downloads: 0 This Week
  • 9
    watercrawl

    AI-ready web crawler that extracts and structures website content

    WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website. WaterCrawl supports customizable extraction rules so users can focus only on relevant elements while ignoring unnecessary page components. ...
    Downloads: 0 This Week
  • 10
    Nextra

    Simple, powerful and flexible site generation framework

    Simple, powerful, and flexible site generation framework with everything you love from Next.js. Nextra automatically converts Markdown links and images to use Next.js Link and Next.js Image when possible. No slow navigation or layout shift.
    Downloads: 0 This Week
  • 11
    Sphinx

    Main repository for the Sphinx documentation builder

    ...It was originally created for the Python documentation, and it has excellent facilities for documenting software projects in a range of languages. The Sphinx website itself is built from reStructuredText sources using Sphinx. Output formats include HTML (including Windows HTML Help), LaTeX (for printable PDF versions), ePub, Texinfo, manual pages and plain text. It offers semantic markup with automatic links for functions, classes, citations, glossary terms and similar pieces of information, and easy definition of a document tree with automatic links to siblings, parents and children. ...
    Downloads: 14 This Week
  • 12
    1337x Proxy List

    1337x Proxy List for 2026

    1337x Proxy List is an open-source repository that curates a collection of proxy and mirror sites designed to provide access to the 1337x torrent platform when it is blocked or restricted in certain regions. The project aggregates and maintains a list of working proxy URLs that allow users to bypass ISP or government restrictions by routing traffic through alternative domains. These proxy sites replicate the original 1337x interface and functionality, enabling users to browse, search, and...
    Downloads: 235 This Week
  • 13
    Web-Check

    All-in-one OSINT tool for analysing any website

    ...Get insight into the inner workings of a given website: uncover potential attack vectors, analyse server architecture, view security configurations, and learn what technologies a site is using. Currently, the dashboard shows: IP info, SSL chain, DNS records, cookies, headers, domain info, search crawl rules, page map, server location, redirect ledger, open ports, traceroute, DNS security extensions, site performance, trackers, associated hostnames and carbon footprint. Stay tuned, as I'll add more soon. The aim is to help you easily understand, optimize and secure your website.
    Downloads: 1 This Week
  • 14
    Django jazzmin

    Jazzy theme for Django

    ...Select2 drop-downs. Bootstrap 4 & AdminLTE UI components. You can add links to the user menu on the top right of the screen using the "usermenu_links" settings key; the format of these links is the same as for the top menu, though submenus via "app" are not currently supported and will not be rendered. The side menu gets a list of all installed apps and their models that have admin classes, and creates a tree of apps and links to model admin pages.
    Downloads: 2 This Week
  • 15
    LinkChecker

    Check links in web documents or full websites

    LinkChecker is a free, GPL-licensed website validator. LinkChecker checks links in web documents or full websites. It runs on Python 3 systems, requiring Python 3.8 or later. The version on PyPI may be outdated; to find out how to get the latest code, plus platform-specific information and other advice, see doc/install.txt in the source code archive. If you do not want to install any additional libraries/dependencies, you can use the Docker image, which is published on GitHub...
    Downloads: 0 This Week
  • 16
    Whoogle Search

    A self-hosted, ad-free, privacy-respecting metasearch engine

    Get Google search results, but without any ads, JavaScript, AMP links, cookies, or IP address tracking. Easily deployable in one click as a Docker app, and customizable with a single config file. Quick and simple to implement as a primary search engine replacement on both desktop and mobile. Autocomplete/search suggestions. POST request search and suggestion queries (when possible). View images at full resolution without site redirect (currently mobile only).
    Downloads: 11 This Week
  • 17
    fullPage.js

    Create beautiful fullscreen scrolling websites fast and easy

    fullPage.js is an easy-to-use library for creating beautiful fullscreen scrolling websites (also called one-page or single-page websites), complete with all the features you need. With fullPage.js you can add landscape sliders and links to sections of your site, create smaller or bigger sections, use extensions and more. fullPage.js is compatible with all modern browsers and even some old ones like IE9 and Opera 12. It also provides touch support for mobile devices and touch-screen computers. fullPage.js is well supported by its community, has good documentation and offers plenty of examples to get you started quickly. ...
    Downloads: 2 This Week
  • 18
    goclone

    Fast CLI tool for cloning entire websites for local browsing offline

    goclone is a command-line utility designed to download and mirror complete websites to a local directory for offline access. It retrieves HTML pages, stylesheets, JavaScript files, images, and other assets from a target site and stores them on the user’s computer. It preserves the original site’s structure by maintaining relative links between pages, allowing the mirrored copy to function similarly to the live version when opened locally. Once a site has been cloned, users can browse the pages offline and navigate between them as if they were viewing the site online. goclone is written in Go and uses goroutines to perform downloads concurrently. goclone can also optionally start a local web server to serve the mirrored files for a more realistic browsing experience. ...
    Downloads: 2 This Week
  • 19
    PHPScraper

    A universal web-util for PHP

    PHPScraper is a universal web-scraping util for PHP, built with simplicity in mind. The goal is to make XPath selectors optional and avoid the commonly needed boilerplate code. Just create an instance of PHPScraper, go to a website, and start collecting data. All scraping functionality can be accessed either as a function call or a property call. For example, the title can be accessed in two ways. Many common use cases are covered already. You can find prepared extractors for various HTML...
    Downloads: 0 This Week
  • 20
    INS

    Inspiration database for Internet practitioners with no ads

    INS is described as a kind of “inspiration database for internet workers”: a repository that collects and curates interesting websites, tools, links, or resources that might inspire developers, designers, or any knowledge workers. It aims to operate without ads, focusing purely on content and resource quality, and uses automation (e.g. GitHub Actions) to check link validity or site load speed, ensuring that listed resources remain accessible over time. For people in tech who constantly seek new tools, articles, or creative inspiration, INS serves as a living catalogue that can be browsed, contributed to, and relied upon for discovering useful or thought-provoking material without the noise of ads or clickbait. ...
    Downloads: 2 This Week
  • 21
    Infosec Reference

    An Information Security Reference That Doesn't Suck

    Infosec Reference is a curated knowledge base and resource repository for information security practitioners. It aggregates cheat sheets, tooling guides, protocol deep dives, incident response playbooks, and threat actor profiles—all organized under accessible categories (network, web, host, cryptography, auditing). The repo is built as a living wiki of sorts: practitioners contribute updates, expand sections, or refine explanations as the threat landscape evolves. Because security spans...
    Downloads: 0 This Week
  • 22
    SiteOne Crawler

    SiteOne Crawler is a website analyzer and exporter

    SiteOne Crawler is a very useful and easy-to-use tool you'll ♥ as a Dev/DevOps, website owner or consultant. It works on all popular platforms: Windows, macOS, and Linux (x64 and arm64 too). It will crawl your entire website in depth, analyze and report problems, show useful statistics and reports, generate an offline version of the website, generate sitemaps, or send reports via email. Watch a detailed video with a sample report for the astro.build website. This crawler can be used as a...
    Downloads: 3 This Week
  • 23
    SQL Explorer

    Easily share data across your company via SQL queries

    SQL Explorer aims to make the flow of data between people fast, simple, and confusion-free. It is a Django-based application that you can add to an existing Django site, or use as a standalone business intelligence tool. Quickly write and share SQL queries in a simple, usable SQL editor, preview the results in the browser, share links, download CSV, JSON, or Excel files (and even expose queries as API endpoints, if desired), and keep the information flowing! Comes with support for multiple connections, to many different SQL database types, a schema explorer, query history (i.e. lightweight version control), a basic security model, in-browser pivot tables, and more. ...
    Downloads: 2 This Week
  • 24
    glsl-sandbox

    Shader editor and gallery

    GLSL Sandbox is an in-browser playground for writing and sharing fragment shaders with instant visual feedback. It provides a minimal editor and a fullscreen WebGL viewport so your shader takes center stage, making it perfect for learning, live-coding, and showcasing visual experiments. The environment injects a small set of uniforms—time, resolution, mouse—so you can animate and interact without boilerplate. A public gallery lets creators browse, fork, and remix shaders, turning the site...
    Downloads: 2 This Week
  • 25
    DSA Bootcamp Java

    This repository consists of the code samples, assignments, and notes

    DSA Bootcamp Java is an open source educational repository created by Kunal Kushwaha to teach Data Structures and Algorithms (DSA) using Java. It is designed as a structured bootcamp, covering fundamental concepts to advanced problem-solving techniques. The project provides explanations, exercises, assignments, and practice problems, making it useful for both beginners and intermediate learners who want to strengthen their Java and algorithmic skills. The repository is organized into...
    Downloads: 0 This Week
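The Laravel Sitemap entry above describes building a sitemap from every crawled link and leaving out any link that the callable passed to `hasCrawled` does not return. That package is written in PHP; as a rough illustration of the filter-then-serialize step only, here is a minimal Python sketch using the standard library. The function `build_sitemap` and its `has_crawled` parameter are hypothetical names chosen to mirror the description, not the package's API, and the output assumes the sitemaps.org `urlset` format with `<loc>` entries only.

```python
from typing import Callable, Iterable, Optional
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(
    urls: Iterable[str],
    has_crawled: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Serialize crawled URLs as a sitemaps.org <urlset> document.

    has_crawled is a hypothetical hook: it is called once per URL, and
    returning None drops the URL, while returning a string keeps that
    (possibly rewritten) URL.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        kept = has_crawled(url) if has_crawled is not None else url
        if kept is None:
            continue  # the callback chose not to return this link
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = kept  # ElementTree escapes '&' etc.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    crawled = ["https://example.com/", "https://example.com/admin", "https://example.com/blog"]
    # Keep everything except admin pages.
    print(build_sitemap(crawled, lambda url: None if "/admin" in url else url))
```

Letting the callback return the URL (rather than a boolean) means the same hook can both exclude links and rewrite them, which matches the "don't return it" phrasing in the entry.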
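Several entries above describe the same core loop: Spatie Crawler mentions controlling crawl depth, crwlr defines a crawler as a program that loads documents and follows the links in them, and news-please recursively follows a site's internal links. A minimal, standard-library Python sketch of that loop is shown below. It is not the API of any of these projects: `crawl`, `extract_links` and the `fetch` callable (URL in, HTML out) are hypothetical names, and `fetch` stands in for a real HTTP client so the example runs offline. Redirects, robots.txt, politeness delays and concurrency (the focus of Spider) are deliberately left out.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url: str, html: str) -> List[str]:
    """Resolve each link against page_url, dropping #fragments and non-HTTP schemes."""
    parser = LinkExtractor()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


def crawl(start_url: str, fetch: Callable[[str], str], max_depth: int = 2) -> Dict[str, int]:
    """Breadth-first crawl restricted to start_url's host.

    Returns every URL discovered within max_depth link hops, mapped to its depth.
    Only pages shallower than max_depth are fetched and scanned for links.
    """
    host = urlparse(start_url).netloc
    depths: Dict[str, int] = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = depths[url]
        if depth >= max_depth:
            continue  # depth limit: record the page, but do not follow its links
        for link in extract_links(url, fetch(url)):
            # Follow internal links only, and never queue a URL twice.
            if urlparse(link).netloc == host and link not in depths:
                depths[link] = depth + 1
                queue.append(link)
    return depths


if __name__ == "__main__":
    # A fake three-page site so the sketch runs offline.
    site = {
        "https://example.com/": '<a href="/about">About</a> <a href="https://other.org/">Out</a>',
        "https://example.com/about": '<a href="/team#top">Team</a> <a href="/">Home</a>',
        "https://example.com/team": '<a href="/deep">Deep</a>',
    }
    print(crawl("https://example.com/", lambda url: site.get(url, ""), max_depth=2))
```

Breadth-first order means the depth recorded for each URL is its shortest link distance from the start page, which is usually what a depth limit is meant to express.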
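The notfoundbot entry above describes using post dates derived from filenames to find Wayback Machine archives contemporary to each post. A minimal Python sketch of those two steps follows, under stated assumptions: filenames are assumed to start with a Jekyll-style `YYYY-MM-DD-` prefix (the entry does not say which convention notfoundbot expects), and the lookup relies on the Wayback Machine's `https://web.archive.org/web/<timestamp>/<url>` address form, which redirects to the capture closest to the timestamp when one exists. `post_date_from_filename` and `wayback_url` are hypothetical helpers, not notfoundbot's code; a real tool would also confirm that a capture exists before rewriting a link.

```python
import re
from datetime import date
from pathlib import PurePosixPath
from typing import Optional

# Assumed convention (not confirmed by the entry): filenames start with YYYY-MM-DD-,
# as in Jekyll's _posts/2011-05-03-my-post.md.
POST_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-")


def post_date_from_filename(path: str) -> Optional[date]:
    """Return the date encoded at the start of the filename, or None if absent or invalid."""
    match = POST_DATE.match(PurePosixPath(path).name)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. a prefix like 2011-13-40
        return None


def wayback_url(link: str, when: date) -> str:
    """Wayback Machine URL that redirects to the capture of `link` nearest to `when`."""
    return f"https://web.archive.org/web/{when:%Y%m%d}/{link}"


if __name__ == "__main__":
    published = post_date_from_filename("_posts/2011-05-03-old-links.md")
    if published is not None:
        print(wayback_url("http://example.com/some-page", published))
```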
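The goclone entry above says the mirrored copy keeps working offline because relative links between pages are preserved. One way to compute such a link is to map each URL to a local file path and then take the target's path relative to the directory of the linking page. goclone is written in Go; this is only a minimal Python sketch of the idea, and the directory layout (`<host>/<path>`, with `index.html` for URLs ending in `/`) is an assumption rather than goclone's documented behavior. The helpers `local_path` and `relative_link` are hypothetical names.

```python
import posixpath
from urllib.parse import urlparse


def local_path(url: str) -> str:
    """Map a page URL to an absolute path inside the mirror directory.

    Assumed layout: /<host>/<url path>, with directory-style URLs saved as
    index.html. Query strings and fragments are ignored in this sketch.
    """
    parsed = urlparse(url)
    path = parsed.path or "/"
    if path.endswith("/"):
        path += "index.html"
    return "/" + parsed.netloc + path


def relative_link(from_url: str, to_url: str) -> str:
    """href to write into the mirrored copy of from_url so that it opens to_url's local file."""
    source_dir = posixpath.dirname(local_path(from_url))
    return posixpath.relpath(local_path(to_url), source_dir)


if __name__ == "__main__":
    print(relative_link("https://example.com/blog/post.html", "https://example.com/"))
    print(relative_link("https://example.com/blog/post.html", "https://example.com/blog/next.html"))
```

Because both paths are absolute within the mirror, `posixpath.relpath` does not depend on the current working directory, and the resulting links stay valid wherever the mirror folder is copied.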