Showing 193 open source projects for "web indexing"

  • 1
    Spotweb

    Decentralized community

    Spotweb is an open-source PHP-based web interface for the Usenet indexing service Spotnet, allowing users to search, browse, and download NZBs from Usenet content feeds. It provides a full-featured, self-hosted Usenet indexing system that supports user accounts, moderation, comments, and custom filtering. Spotweb makes it easy for users to run their own NZB indexing server and integrates with download clients like SABnzbd or NZBGet.
    Downloads: 7 This Week
  • 2
    diskover-community

    Open source file indexing & storage analytics powered by Elasticsearch

    Diskover Community Edition is an open source file system indexing and storage analytics platform designed to help organizations understand and manage large volumes of file data. It crawls file systems and indexes metadata using Elasticsearch, enabling fast search, analysis, and organization of files stored across different storage systems. It allows administrators and users to explore file structures, monitor storage usage, and gain insights into how data is distributed across...
    Downloads: 4 This Week
  • 3
    fess

    Open source enterprise search server for websites, files, and data

    ...It also provides a web-based administrative interface that allows administrators to configure crawling targets, manage indexing tasks, and adjust search settings from a graphical dashboard.
    Downloads: 0 This Week
  • 4
    Search-Index

    A persistent, network resilient, full text search library

    Search-Index is a lightweight and fast JavaScript-based search engine that enables full-text search indexing and retrieval for web applications.
    Downloads: 0 This Week
  • 5
    Anna’s Archive

    Comprehensive search engine for books, papers, comics, magazines

    Anna’s Archive is a large-scale open-source search engine and data aggregation platform designed to index and provide access to a vast collection of books, academic papers, comics, magazines, and other digital texts through a unified interface. The project includes all the infrastructure required to run a full instance locally or in production, combining web servers, databases, and search indexing systems into a scalable architecture. It relies heavily on technologies such as Elasticsearch for search functionality and MariaDB for structured data storage, enabling fast and efficient querying across massive datasets. The system is designed with redundancy and replication in mind, allowing distributed deployments and mirrored environments to handle high traffic and large data volumes. ...
    Downloads: 35 This Week
  • 6
    Memvid

    Video-based AI memory library. Store millions of text chunks in MP4

    Memvid encodes text chunks as QR codes within MP4 frames to build a portable “video memory” for AI systems. This innovative approach uses standard video containers and offers millisecond-level semantic search across large corpora with dramatically less storage than vector DBs. It's self-contained—no DB needed—and supports features like PDF indexing, chat integration, and cloud dashboards.
    Downloads: 256 This Week
  • 7
    Text Search Engine

    A text search engine that supports mixed Chinese and English search

    Text-Search-Engine is a JavaScript-based lightweight search engine that enables full-text search functionality. It allows developers to implement fast search indexing and retrieval in web applications.
    Downloads: 0 This Week
  • 8
    ROMM

    A beautiful, powerful, self-hosted rom manager and player

    ...It reimagines the home screen with adaptive layouts, predictive app recommendations, and dynamic organization so that frequently used tools are always within reach. The launcher includes a powerful universal search that combs through installed apps, contacts, messages, and web results to deliver quick answers without switching contexts. Romm also supports widgets, customization options, and theme choices so users can tailor the visual experience to their preferences while maintaining performance and responsiveness. Privacy is a highlight, with local indexing and search functions that operate without sending data to external servers unless explicitly permitted.
    Downloads: 1 This Week
  • 9
    Bazarr

    Bazarr is a companion application to Sonarr and Radarr

    ...Once you configure the languages and quality rules for your media library, Bazarr continuously monitors new episodes or releases indexed by those applications and automatically finds and fetches matching subtitle files, including external upgrades when better options appear. It offers both automatic and manual search capabilities through a modern web interface, letting you fine-tune what gets downloaded and stored alongside your video files. Bazarr supports a wide array of subtitle providers around the world and can track download history, manage multiple languages, and perform post-download cleanup if needed. Because it doesn’t itself scan your disk libraries but instead relies on Sonarr or Radarr indexing, it fits cleanly into automated media stacks on NAS devices and servers.
    Downloads: 4 This Week
  • 10
    RavenDB

    ACID Document Database

    A NoSQL document database designed for high-performance, real-time applications with built-in distributed capabilities.
    Downloads: 2 This Week
  • 11
    Midarr Server

    Midarr, the minimal lightweight media server

    Midarr is a minimal, lightweight media server built to complement tools like Radarr or Sonarr. Instead of reinventing the media management stack, it leverages existing setups and metadata providers to serve media files "fresh off the metal" without re-indexing or transcoding by default. It offers a sleek web interface with authentication, user profiles, real-time statuses, and experimental support for remuxing/transcoding and Chromecast compatibility.
    Downloads: 2 This Week
  • 12
    mgrep

    A calm, CLI-native way to semantically grep everything, like code

    ...Built with a focus on calm CLI experiences, it lets you index and query your local files with semantic understanding, delivering results that are relevant to your intent rather than simple pattern matches, which is especially powerful in large or diverse projects. It also includes features such as background indexing to keep your search index up to date without interrupting your workflow and web search integration to expand the scope of queries beyond local files. Designed for both programmers and agents, it integrates naturally into development and research workflows while offering thoughtful defaults that keep output clean and informative.
    Downloads: 0 This Week
  • 13
    BestBlogs

    A collection of top programming

    BestBlogs is an open-source project designed to aggregate, organize, and surface high-quality blog content from across the web, helping users discover valuable articles in a structured and accessible way. The platform focuses on curating content based on relevance, quality, and usefulness rather than simply indexing large volumes of information, making it particularly useful for developers, researchers, and knowledge seekers. It typically integrates automated data collection and filtering mechanisms to gather blog posts from multiple sources, then categorizes and ranks them to improve discoverability. ...
    Downloads: 4 This Week
  • 14
    Just the Class

    A modern, highly customizable, responsive Jekyll template

    A modern, highly customizable, responsive Jekyll template for course websites. Just the Class is a GitHub Pages template developed for the purpose of quickly deploying course websites. In addition to serving plain web pages and files, it provides a boilerplate for announcements, course calendar, etc. Just the Class is a template that extends the popular Just the Docs theme, which provides a robust and thoroughly-tested foundation for your website.
    Downloads: 0 This Week
  • 15
    Dozzle

    Realtime log viewer for containers. Supports Docker, Swarm and K8s

    Dozzle is a lightweight, self-hosted web application for real-time viewing and monitoring of container logs, focused on speed and simplicity rather than building a full log storage pipeline. Instead of indexing or storing logs, it connects to your container runtime and streams live output so you can diagnose issues as they happen. The interface includes practical quality-of-life features like fuzzy searching for containers, regex log search, split-screen viewing for multiple logs, and live stats such as CPU and memory usage. ...
    Downloads: 12 This Week
  • 16
    LangChain-ChatGLM-Webui

    Automatic question answering for local knowledge bases based on LLM

    LangChain-ChatGLM-Webui is an open-source web interface that integrates the ChatGLM large language model with the LangChain framework to create an interactive conversational AI platform. The project provides a graphical interface that allows users to interact with language models through chat sessions while also connecting those models to external knowledge sources. It supports retrieval-augmented generation workflows that enable the system to answer questions based on local documents or...
    Downloads: 0 This Week
  • 17
    WeKnora

    LLM framework for document understanding and semantic retrieval

    WeKnora is an open source framework developed for deep document understanding and semantic information retrieval using large language models. It focuses on analyzing complex and heterogeneous documents by combining multiple processing stages such as multimodal document parsing, vector indexing, and intelligent retrieval. It follows the Retrieval-Augmented Generation (RAG) paradigm, where relevant document segments are retrieved and used by language models to generate accurate, context-aware...
    Downloads: 3 This Week
  • 18
    Brokk

    Brokk brings code intelligence to AI

    Brokk is a code intelligence assistant framework designed to let large language models (LLMs) understand code semantically (not just as raw text) so that they can work effectively on large codebases that don’t fit wholly in a prompt context. It helps bridge the gap between LLMs and real-world engineering code by offering tooling to index, analyze, query, and augment code context, so that AI can meaningfully reason about existing code, suggest edits, and navigate across projects. Modular...
    Downloads: 1 This Week
  • 19
    Scribe.js

    JavaScript OCR and text extraction for images and PDFs

    ...The library can take image files (such as PNG or JPEG) and recognize the text they contain, and it can also extract text from PDF files that either already contain text or are image-based scans, using modern web standards and WebAssembly under the hood. In addition to simple text extraction, Scribe.js supports writing or injecting a high-quality invisible text layer back into PDFs, effectively making them searchable and improving usability for indexing or accessibility. It is written in modern ECMAScript Modules (ESM), so it can be imported in both browser and Node.js environments without a build step, though browser usage requires same-origin hosting of the files.
    Downloads: 1 This Week
  • 20
    Forge Code

    AI enabled pair programmer for Claude, GPT, O Series, Grok, Deepseek

    Forge is a modern, open-source tool that brings AI-powered code assistance directly into your terminal workflow, effectively turning your shell into a “pair programmer”, without ever leaving your development environment. Written in Rust (with a command-line interface), Forge integrates with your existing shell (bash, zsh, fish, etc.) or IDE-agnostic workflows, allowing you to interact with your codebase, command-line tools, and version control as usual, but with the added support of large...
    Downloads: 6 This Week
  • 21
    GPT Crawler

    Crawl a site to generate knowledge files to create your own custom GPT

    GPT Crawler is an open-source tool designed to automatically crawl websites and generate structured knowledge that can be used to build AI assistants and retrieval systems. It focuses on extracting high-quality textual content from web pages and preparing it in formats suitable for embedding, indexing, or fine-tuning workflows. The project is especially useful for teams that want to turn documentation sites or knowledge bases into conversational AI backends without building custom scrapers from scratch. It includes configurable crawling logic, content filtering, and output pipelines that streamline the process of preparing data for large language models. ...
    Downloads: 0 This Week
  • 22
    OpenArchiver

    An open-source platform for legally compliant email archiving

    ...It’s designed for scenarios where reliable, tamper-proof archiving and full-text search across both emails and attachments are essential for legal discovery, compliance, or long-term records retention. The platform combines a modern web UI with powerful backend services, including fast indexing, deduplication, encryption at rest, and asynchronous ingestion workflows, making it suitable for both small teams and enterprise deployments. Beyond simply capturing email, it emphasizes security and auditability with features like secure storage formats, file integrity verification, and detailed audit trails of user interactions.
    Downloads: 0 This Week
  • 23
    GooFuzz

    OSINT fuzzing tool using Google dorks to find exposed resources

    ...It is written in Bash and automates the use of Google Dorking queries to discover publicly accessible information related to a target domain. Instead of directly sending requests to the target server, GooFuzz gathers results through search engine indexing, allowing enumeration without leaving traces in the target’s server logs. This method enables the discovery of potentially sensitive files, directories, subdomains, and parameters that are already exposed on the web. By combining wordlists, search operators, and file extension filters, the tool helps security professionals locate misconfigured or unintentionally exposed resources. ...
    Downloads: 0 This Week
  • 24
    ink-kit

    Onchain-focused SDK with ready-to-use templates and themes

    ink-kit is a developer toolkit for building applications on the INK blockchain ecosystem, bundling the pieces you typically need to go from a blank repo to a working dapp. It provides contract templates, deployment scripts, and client SDKs so you can iterate on on-chain logic and a frontend without stitching together disparate tools. The kit standardizes project layout and environment configuration, making local development, testing, and staging deploys predictable. Utilities for wallet...
    Downloads: 0 This Week
  • 25
    RBush

    High-performance JavaScript R-tree-based 2D spatial index

    RBush is a high-performance JavaScript library for 2D spatial indexing of points and rectangles. It's based on an optimized R-tree data structure with bulk insertion support. A spatial index is a special data structure for points and rectangles that allows you to perform queries like "all items within this bounding box" very efficiently (e.g. hundreds of times faster than looping over all items). It's most commonly used in maps and data visualizations.
    Downloads: 0 This Week
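
    To make the "all items within this bounding box" query concrete, here is a minimal Python sketch. It is not RBush and it does not implement an R-tree: it assumes a much simpler uniform grid as a stand-in spatial index, and every name in it (Box, GridIndex, brute_force, cell_size) is hypothetical. The point is only to show the idea of looking at nearby candidates instead of looping over all items.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Box(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a: Box, b: Box) -> bool:
    """True when the two boxes overlap or touch."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


class GridIndex:
    """Hypothetical uniform-grid index; a simpler stand-in for an R-tree."""

    def __init__(self, cell_size: float = 10.0) -> None:
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (cell_x, cell_y) -> boxes in that cell

    def _cells_for(self, box: Box):
        # Yield every grid cell that the box overlaps.
        x0 = int(box.min_x // self.cell_size)
        x1 = int(box.max_x // self.cell_size)
        y0 = int(box.min_y // self.cell_size)
        y1 = int(box.max_y // self.cell_size)
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                yield cx, cy

    def insert(self, box: Box) -> None:
        for cell in self._cells_for(box):
            self.cells[cell].append(box)

    def search(self, query: Box) -> list[Box]:
        # Inspect only items in cells the query touches, then filter exactly.
        found = set()
        for cell in self._cells_for(query):
            for box in self.cells.get(cell, ()):
                if intersects(box, query):
                    found.add(box)
        return sorted(found)


def brute_force(items: list[Box], query: Box) -> list[Box]:
    # Baseline: loop over all items.
    return sorted(b for b in items if intersects(b, query))


# 400 unit squares laid out on a 5-unit lattice.
items = [Box(x, y, x + 1, y + 1) for x in range(0, 100, 5) for y in range(0, 100, 5)]
index = GridIndex(cell_size=10.0)
for item in items:
    index.insert(item)

query = Box(12, 12, 21, 21)
hits = index.search(query)
```

    Here the indexed search returns the same boxes as brute_force(items, query) while examining only the few cells the query overlaps. An R-tree like the one RBush is based on reaches a similar effect with a hierarchy of bounding rectangles rather than fixed cells.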