Search Results for "extract website content"

Showing 955 open source projects for "extract website content"

  • 1
    Website Stalker

    Track changes on websites via git

    This tool checks all the websites listed in its config. When a change is detected, the new site is added to a git commit. It can then be inspected via normal git tooling. The config describes a list of sites. Each site has a URL. Additionally, each site can have editors which are used before saving the file. Each editor manipulates the content of the URL.
    Downloads: 1 This Week
  • 2
    watercrawl

    AI-ready web crawler that extracts and structures website content

    WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website. ...
    Downloads: 3 This Week
  • 3
    ScrapeGraphAI

    Python scraper based on AI

    Extracting content from websites and local documents using LLM. ScrapeGraphAI is a web scraping python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
    Downloads: 5 This Week
  • 4
    PdfPig

    Read and extract text and other content from PDFs in C#

    This project allows users to read and extract text and other content from PDF files. In addition the library can be used to create simple PDF documents containing text and geometrical shapes.
    Downloads: 4 This Week
  • 5
    Java Design Patterns Website

    Next generation website for Java Design Patterns

    This project is the VuePress-powered web front end for the well-known “java-design-patterns” project, which documents classic and modern design patterns in Java. Its purpose is to present that large body of content as a browsable, fast, static website with organized navigation, search, and pattern categorization. Instead of reading patterns only in GitHub markdown, users can consume them in a more pleasant documentation format with sections, sidebars, and themed pages. The site structure makes it easier to discover related patterns, see intent and applicability, and jump between creational, structural, and behavioral groups. ...
    Downloads: 1 This Week
  • 6
    Article Extractor

    Extract the main article from a given URL with Node.js

    A Node.js library for extracting main content from web articles, removing unnecessary clutter like ads and navigation elements.
    Downloads: 0 This Week
  • 7
    Epublifier

    Converts some webnovels to epub format

    A tool to convert website-based books or lists of pages to ePub format to read on your eReader/Kindle/etc.
    Downloads: 2 This Week
  • 8
    Microweber

    Drag and Drop Website Builder and CMS with E-commerce

    ...Its revolutionary Real-Time Text Writing & Editing feature alongside its Drag and Drop feature means the user experience is significantly improved, and users are able to achieve a visually appealing website and easier content management with a lot less time and effort.
    Downloads: 11 This Week
  • 9
    TikTok MCP

    Model Context Protocol (MCP) with TikTok integration

    The TikTok MCP integrates TikTok access into AI applications like Claude AI via TikNeuron. It enables analysis and interaction with TikTok content to determine virality factors and extract video content.
    Downloads: 2 This Week
  • 10
    Skyvern

    Automate browser-based workflows with LLMs and Computer Vision

    Skyvern uses a combination of computer vision and AI to understand content on a webpage, making it adaptable to any website. Skyvern takes instructions in natural language, allowing it to execute complex objectives with simple commands. Skyvern is an API-first product. Workflows execute in the cloud, allowing it to run hundreds of workflows at the same time. Skyvern's AI decisions come with built-in explanations, providing clear summaries and justifications for every action. ...
    Downloads: 7 This Week
  • 11
    Reader LLM

    Convert any URL to an LLM-friendly input with a simple prefix

    ...In addition to converting individual pages, the service can perform web searches and return relevant content that can be ingested directly by AI systems. The tool relies on specialized models and parsing techniques to handle complex HTML structures and extract meaningful content while preserving important context.
    Downloads: 0 This Week
  • 12

    ldif-extract

    Extract selected entries from LDIF files, like grep

    ldif-extract is a small grep-like tool for extracting and converting data from LDIF files. It can be used standalone or in a pipe with other tools such as ldapsearch.
    Downloads: 1 This Week
  • 13
    LLM Scraper

    Extract structured data from webpages using LLM-powered scraping

    LLM Scraper is a TypeScript library designed to extract structured data from webpages using large language models. Instead of relying on fragile HTML selectors or manual parsing rules, the tool interprets webpage content with language models and converts it into structured data according to a defined schema. Developers can specify the data structure using tools such as Zod or JSON Schema, enabling the model to extract relevant information directly into typed objects. ...
    Downloads: 3 This Week
  • 14
    Dendrite

    Tools to build web AI agents that can authenticate

    Dendrite Python SDK is a toolkit for building web AI agents that can authenticate, interact with, and extract data from any website, facilitating web automation tasks.
    Downloads: 1 This Week
  • 15
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    ...It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large datasets in a short period of time. Spider also provides mechanisms for subscribing to crawl events so developers can process page data such as URLs, status codes, or HTML content as it is discovered. ...
    Downloads: 2 This Week
  • 16
    Astro

    The web framework for content-driven websites

    Astro powers the world's fastest marketing sites, blogs, e-commerce websites, and more. Astro improves website performance by rendering components on the server, sending lightweight HTML to the browser with zero unnecessary JavaScript overhead. Astro was designed to work with your content, no matter where it lives. Load data from your file system, external API, or your favorite CMS. Extend Astro with your favorite tools. Bring your own JavaScript UI components, CSS libraries, themes, integrations, and more. ...
    Downloads: 7 This Week
  • 17
    PyPDF

    A pure-python PDF library capable of splitting, merging, cropping

    pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s an actively maintained fork of PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.
    Downloads: 3 This Week
  • 18
    Chandra

    OCR model for complex documents with layout-aware structured outputs

    Chandra is an advanced OCR model designed to extract and structure information from complex documents such as tables, forms, handwritten notes, and mathematical content. It focuses on preserving full document layout, meaning that extracted text is accompanied by positional metadata like bounding boxes for each element. Chandra supports multiple output formats including Markdown, HTML, and JSON, making it suitable for downstream processing and integration into data pipelines. ...
    Downloads: 11 This Week
  • 19
    Laravel Sharp

    Laravel 10+ Content management framework

    Sharp is a content management framework: a toolset that helps build a CMS section in a website, with some rules in mind. The public website should not have any knowledge of the CMS; the CMS is a part of the system, not the center of it. In fact, removing the CMS should not have any effect on the project. Content administrators should work with their data and terminology, not CMS terms.
    Downloads: 2 This Week
  • 20
    Hugo

    The world’s fastest framework for building websites

    Hugo is a popular, fast and flexible open source static site generator written in Go. It’s designed for speed and flexibility, while also being very easy to use. Hugo has the amazing ability to render a typical, moderately-sized website in just a fraction of a second. It takes Hugo around 1 millisecond to render each piece of content, making it the fastest tool of its kind. Hugo supports unlimited content types, and ships with pre-made templates to make SEO, analytics and many other functions quick and easy to achieve. It’s got a robust theming system, capable of producing even the most complex websites. ...
    Downloads: 5 This Week
  • 21
    Jekyll

    A simple, blog-aware static site generator written in Ruby

    Jekyll is a simple, blog-aware, static site generator that’s ideal for creating personal, project, or organization sites. Jekyll is incredibly simple-- it just takes your content, renders Markdown and Liquid templates, and spits out a complete, static website ready for deployment. No configurations, databases, pesky updates and other needless complexities. Jekyll lets you focus on what really matters: your content. Jekyll is easy to install and run. You can have your own website or blog up and running in no time at all!
    Downloads: 3 This Week
  • 22

    sm-website

    Content management system. Goal is to make it possible to rapidly finn

    Downloads: 0 This Week
  • 23
    Planet

    Build and host decentralized blogs and websites on your Mac

    ...Did you know that you can use an Ethereum Name (ENS) to set up a website? It's true! You can use the Content Hash field, just like you would use an A or CNAME record for a traditional domain name. The standard for this is EIP-1577, and the Content Hash field can accept a few different values.
    Downloads: 0 This Week
  • 24
    Link-Preview-JS

    Extract web links information: title, description, images, videos, etc

    link-preview-js is a lightweight TypeScript library that extracts metadata from URLs or HTML content to generate rich link previews. By parsing Open Graph tags and other metadata, it retrieves information such as titles, descriptions, images, and videos. Designed primarily for Node.js and mobile environments, it facilitates the creation of link previews similar to those found on social media platforms.
    Downloads: 0 This Week
  • 25
    S-cart

    Free Laravel ecommerce for business: shopping cart, cms content

    S-Cart is a free, multifunction e-commerce website project for individuals and businesses, built on top of the Laravel Framework and the latest technologies. Its stated goal is "Efficient and friendly for everyone".
    Downloads: 4 This Week