Showing 204 open source projects for "html parser"

  • 1
    html-react-parser

    HTML to React parser

    HTML to React parser that works on both the server (Node.js) and the client (browser). The parser converts an HTML string to one or more React elements. Available as part of the Tidelift Subscription. For TypeScript projects, you may need to check that domNode is an instance of domhandler's Element. Make sure to render parsed adjacent elements under a parent element.
    Downloads: 2 This Week
  • 2
    html-to-markdown

    Convert HTML to Markdown. Even works with entire websites

    Convert HTML into Markdown with Go. It uses an HTML parser to avoid regular expressions as much as possible. That should prevent some odd edge cases and allows it to be used where the input is completely unknown.
    Downloads: 7 This Week
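The parser-instead-of-regexp approach described for html-to-markdown can be illustrated with a minimal sketch. This is not the Go library's API: it uses Python's standard `html.parser` module as a stand-in, and the `MiniMarkdown` class and `to_markdown` function are hypothetical names that handle only a handful of tags.

```python
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Hypothetical converter covering only h1, p, strong, em, and a."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []      # collected Markdown fragments
        self.href = None   # target of the link currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.out.append("# ")
        elif tag == "strong":
            self.out.append("**")
        elif tag == "em":
            self.out.append("*")
        elif tag == "a":
            # A missing or valueless href falls back to an empty target.
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("h1", "p"):
            self.out.append("\n\n")  # blank line between blocks
        elif tag == "strong":
            self.out.append("**")
        elif tag == "em":
            self.out.append("*")
        elif tag == "a":
            self.out.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        self.out.append(data)


def to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()
```

Because the tokenizer, not a pattern match, decides where tags begin and end, attribute values and nested inline elements do not confuse the conversion. A real converter would also need rules for lists, code, escaping, and whitespace.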
  • 3
    html-loader

    HTML Loader

    ... and attributes. By default, the parser in html-loader interprets content inside noscript tags as #text, so processing of content inside this tag will be ignored. A very common scenario is exporting HTML into its own .html file so that it can be served directly instead of being injected with JavaScript.
    Downloads: 4 This Week
  • 4
    html-metadata

    MetaData html scraper and parser for Node.js (supports Promises

    The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. Currently, it supports Schema.org microdata using a third-party library, a native BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS implementation, and some general metadata that doesn't belong to a particular standard (for instance, the content of the title tag, or meta description tags). Planned is support for RDFa, AGLS, and other yet unheard...
    Downloads: 0 This Week
  • 5
    LOL HTML

    Low output latency streaming HTML parser/rewriter with CSS API

    Low Output Latency streaming HTML rewriter/parser with CSS-selector based API. It is designed to modify HTML on the fly with minimal buffering. It can quickly handle very large documents, and operate in environments with limited memory resources. The crate serves as a back-end for the HTML rewriting functionality of Cloudflare Workers, but can be used as a standalone library with a convenient API for a wide variety of HTML rewriting/analysis tasks. The parser switches back to the tag scanner...
    Downloads: 3 This Week
  • 6
    fast-xml-parser

    Validate XML, Parse XML and Build XML rapidly

    Validate XML, Parse XML to JS Object, or Build XML from JS Object without C/C++ based libraries and no callback.
    Downloads: 2 This Week
  • 7
    PostHTML

    PostHTML is a tool to transform HTML/XML with JS plugins

    PostHTML is a tool for transforming HTML/XML with JS plugins. PostHTML itself is very small. It includes only an HTML parser, an HTML node tree API, and a node tree stringifier. All HTML transformations are made by plugins. These plugins are just small plain JS functions, which receive an HTML node tree, transform it, and return a modified tree.
    Downloads: 3 This Week
  • 8
    htmlparser2

    The fast & forgiving HTML and XML parser

    The fast & forgiving HTML and XML parser. htmlparser2 is the fastest HTML parser, and takes some shortcuts to get there. If you need strict HTML spec compliance, have a look at parse5. htmlparser2 itself provides a callback interface that allows the consumption of documents with minimal allocations. While the Parser interface closely resembles Node.js streams, it’s not a 100% match. Use the WritableStream interface to process a streaming input.
    Downloads: 3 This Week
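The callback style mentioned for htmlparser2 can be sketched in a few lines. This is not htmlparser2's API: it uses Python's standard `html.parser` module as a stand-in, and `TagCounter` and `count_tags` are hypothetical names. The point it illustrates is that handlers fire as input arrives, so no document tree has to be allocated.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags through callbacks, without building a tree."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag as soon as it is fully received.
        self.counts[tag] += 1


def count_tags(chunks):
    parser = TagCounter()
    for chunk in chunks:
        # Input may be split anywhere, even in the middle of a tag;
        # the parser buffers the incomplete part until the next feed.
        parser.feed(chunk)
    parser.close()
    return parser.counts
```

Feeding the input in chunks mirrors how a streaming source such as a network response would be consumed: only the unparsed tail of the input is held in memory at any time.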
  • 9
    League CommonMark

    Highly-extensible PHP Markdown parser

    CommonMark is a PHP library that implements the CommonMark Markdown specification, allowing developers to convert Markdown into HTML. It provides a robust and extensible parser with support for additional syntax and extensions. The library is designed to be efficient and standards-compliant, making it ideal for applications that require consistent and reliable Markdown processing.
    Downloads: 2 This Week
  • 10
    RDP-Parser

    RDP-Parser extracts RDP activities from Microsoft Windows Event Logs.

    This tool has been designed for any investigation involving exploitation of RDP service. It supports Evt and Evtx formats.
    Downloads: 5 This Week
  • 11
    Opal

    Opal is a Ruby to JavaScript source-to-source compiler

    Opal is a Ruby to JavaScript source-to-source compiler. It comes packed with the Ruby corelib you know and love. It is both fast as a runtime and small in its footprint. The lib directory holds the Opal parser/compiler used to compile Ruby into JavaScript. It is also built ready for the browser into opal-parser.js to allow compilation in any JavaScript environment. This directory holds the Opal runtime and corelib implemented in Ruby and JavaScript. opal-parser allows you to eval Ruby code...
    Downloads: 3 This Week
  • 12
    Sanitize

    Ruby HTML and CSS sanitizer

    ... that you don't explicitly allow will be removed. Sanitize is based on the Nokogiri HTML5 parser, which parses HTML the same way modern browsers do, and Crass, which parses CSS the same way modern browsers do. As long as your allowlist config only allows safe markup and CSS, even the most malformed or malicious input will be transformed into safe output.
    Downloads: 2 This Week
  • 13
    parse5

    HTML parsing/serialization toolset for Node.js.

    HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant. parse5 provides nearly everything you may need when dealing with HTML. It's the fastest spec-compliant HTML parser for Node to date. It parses HTML the way the latest version of your browser does. It has proven itself reliable in such projects as jsdom, Angular, Lit, Cheerio, rehype and many more.
    Downloads: 0 This Week
  • 14
    goquery

    A little like that j-thing, only in Go

    goquery brings a syntax and a set of features similar to jQuery to the Go language. It is based on Go's net/html package and the CSS Selector library Cascadia. Since the net/html parser returns nodes, and not a full-featured DOM tree, jQuery's stateful manipulation functions (like height(), css(), and detach()) have been left off. Also, because the net/html parser requires UTF-8 encoding, so does goquery: it is the caller's responsibility to ensure that the source document provides UTF-8...
    Downloads: 2 This Week
  • 15
    JSDoc

    An API documentation generator for JavaScript

    ... comments should generally be placed immediately before the code being documented. Each comment must start with a /** sequence in order to be recognized by the JSDoc parser. Adding a description is simple: just type the description you want in the documentation comment. Once your code is commented, you can use the JSDoc 3 tool to generate an HTML website from your source files.
    Downloads: 3 This Week
  • 16
    crawley

    The unix-way web crawler

    Crawls web pages and prints any link it can find. Fast HTML SAX parser (powered by golang.org/x/net/html). Small (below 1500 SLOC), idiomatic, 100% test-covered codebase. Grabs most useful resource URLs (pictures, videos, audio, forms, etc.). Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted). Scan depth (limited by starting host and path; 0 by default) can be configured. Can crawl rules and sitemaps from robots.txt. Brute mode: scan HTML comments for URLs...
    Downloads: 2 This Week
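The link-collection behaviour described for crawley (SAX-style parsing, unique URLs, fragments omitted) can be sketched as follows. This is not crawley's code: it uses Python's standard library as a stand-in, and `URL_ATTRS`, `LinkCollector`, and `extract_links` are hypothetical names. The set of URL-bearing attributes is an illustrative subset, not crawley's actual list.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Attributes assumed to carry URLs (an illustrative subset).
URL_ATTRS = {"href", "src", "action"}


class LinkCollector(HTMLParser):
    """Collects unique absolute URLs from start tags, dropping fragments."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.seen = set()   # URLs already reported
        self.links = []     # unique URLs in document order

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRS and value:
                # Resolve relative references, then strip "#fragment".
                url, _fragment = urldefrag(urljoin(self.base_url, value))
                if url not in self.seen:
                    self.seen.add(url)
                    self.links.append(url)


def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Stripping the fragment before the uniqueness check is what makes `/a#top` and `/a#bottom` count as one URL. A streaming variant could print each URL inside `handle_starttag` instead of collecting a list.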
  • 17
    jsoup

    Java library for working with real-world HTML

    jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree. The parser will make every...
    Downloads: 1 This Week
  • 18
    Epublifier

    Converts some webnovels to epub format

    A tool to convert website-based books or lists of pages to ePub format to read on your eReader/Kindle/etc.
    Downloads: 2 This Week
  • 19
    Cheerio

    Implementation of core jQuery designed for the server

    Fast, flexible & lean implementation of core jQuery designed specifically for the server. Cheerio implements a subset of core jQuery. Cheerio removes all the DOM inconsistencies and browser cruft from the jQuery library, revealing its truly gorgeous API. Cheerio works with a very simple, consistent DOM model. As a result, parsing, manipulating, and rendering are incredibly efficient. Cheerio wraps around the parse5 parser and can optionally use @FB55's forgiving htmlparser2. Cheerio can parse nearly...
    Downloads: 3 This Week
  • 20
    Nokogiri

    Tool to work with XML and HTML from Ruby

    Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant by relying on native parsers like libxml2 (C) and xerces (Java). Be secure-by-default by treating all documents as untrusted by default. Be a thin-as-reasonable layer on top of the underlying parsers, and don't attempt to fix behavioral differences between the parsers. "Native gems...
    Downloads: 1 This Week
  • 21
    Markdig

    A fast, powerful, CommonMark compliant, extensible Markdown processor

    A fast, powerful, CommonMark compliant, extensible Markdown processor for .NET. Very fast parser and HTML renderer (no-regexp), very lightweight in terms of GC pressure. Abstract Syntax Tree with precise source code location for syntax tree, useful when building a Markdown editor. Check out MarkdownEditor for Visual Studio powered by Markdig! Even the core Markdown/CommonMark parsing is pluggable, so you can disable built-in Markdown/CommonMark parsing (e.g. disable HTML parsing) or change...
    Downloads: 2 This Week
  • 22
    mdBook

    Create books from markdown files

    ... documentation and a fine example of what mdBook produces. mdBook includes built in support for both preprocessing your Markdown and alternative renderers for producing formats other than HTML. These facilities also enable other functionality such as validation. Searching Rust's crates.io is a great way to discover more extensions.
    Downloads: 3 This Week
  • 23
    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured and semi-structured to structured data (APIs, PDFs, documents, SQL, etc.). Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL database providers.
    Downloads: 2 This Week
  • 24
    LaRecipe

    Write gorgeous documentation for your products using Markdown

    ... in order to match your needs. LaRecipe uses a Markdown-to-HTML parser out of the box, covering typography, images, links, and more. Because it compiles the Markdown documentation to HTML on the back end, LaRecipe can provide a set of good-looking Vue-based UI components. If your documentation is very large, a search function helps your users find what they need quickly.
    Downloads: 2 This Week
  • 25
    Floki

    Floki is a simple HTML parser that enables search for nodes using CSS

    Floki is a simple HTML parser that enables search for nodes using CSS selectors. Floki needs the :leex module in order to compile. Normally this module is installed with Erlang in a complete installation. By default, Floki uses a patched version of mochiweb_html for parsing fragments due to its ease of installation (it's written in Erlang and has no outside dependencies). fast_html is generally faster, according to the benchmarks conducted by its developers.
    Downloads: 0 This Week