Showing 27 open source projects for "pdf parser"

  • 1
    py-pdf-parser

    A Python tool to help extract information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.
    Downloads: 1 This Week
  • 2
    OpenDataLoader PDF

    PDF Parser for AI-ready data. Automate PDF accessibility

    OpenDataLoader PDF is an open-source document processing system designed to convert complex PDF files into structured, AI-ready formats such as Markdown, JSON, and HTML while preserving layout, hierarchy, and semantic meaning. It focuses on enabling downstream use cases like retrieval-augmented generation (RAG), knowledge extraction, and document intelligence pipelines by maintaining accurate reading order and spatial metadata through bounding boxes. The tool combines deterministic parsing...
    Downloads: 15 This Week
  • 3
    Jupyter Notebook Tools for Sphinx

    Sphinx source parser for Jupyter notebooks

    nbsphinx is a Sphinx extension that provides a source parser for *.ipynb files. Custom Sphinx directives are used to show Jupyter Notebook code cells (and, of course, their results) in both HTML and LaTeX output. Unevaluated notebooks (i.e. notebooks without stored output cells) are automatically executed during the Sphinx build process.
    Downloads: 0 This Week
  • 4
    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load data from 160+ sources and formats, ranging from unstructured and semi-structured to structured data (APIs, PDFs, documents, SQL, etc.). Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL database providers.
    Downloads: 0 This Week
  • 5
    Delphi : VRCalc++ OOSL (Script) and more

    Delphi : VRCalc++ OOSL & + (Paged List, TextEditor, VRAstroVision ...)

    Vincent Radio {Adrix.NT} Sources Library & Applications : Delphi C++ Java VRCalc++ C# VRCalc++ Object Oriented Scripting Language - Engine Source Pascal Code - Delphi Packages Build Prjs - VRCalc++ Scripted System Std RT Library - Guides & Docs (CHM, PDF, DOCX) - VCL & FMX (FireMonkey) Support - Script Test Code (Lang RTL VCL FMX) - Visual Stage Project : VCL & FMX Paged Lists & Iterators : Delphi C++ Java C# Multi-Dim Arrays & Direct Graph Classes : Delphi C++ Java VRCalc++...
    Downloads: 9 This Week
  • 6
    pdf-extractor

    Node.js module for rendering PDF pages to images, SVGs, and HTML files

    pdf-extractor is a wrapper around pdf.js that generates images, SVGs, HTML files, text files, and JSON files from a PDF in Node.js. A DOM Canvas is used to render and export the graphical layer of the PDF. The canvas exports *.png by default but can be extended to export other file types such as .jpg. PDF objects are converted to SVG using the SVGGraphics parser of pdf.js.
    Downloads: 0 This Week
  • 7
    pdf-editor

    Edit your PDFs without needing a subscription or creating accounts

    Edit your PDFs without needing a subscription or creating accounts. Add a GUI or turn it into a web application. Add a command-line parser to run multiple commands at once, e.g. merge (cut pdf1) pdf2. Tested with Python 3.8.5. Install virtualenv (py -3.8 -m pip install virtualenv). PDF and Word documents are binary files, which makes them much more complex than plaintext files. In addition to text, they store lots of font, color, and layout information.
    Downloads: 6 This Week
  • 8
    Publish.jl

    A universal document authoring package for Julia

    A universal document authoring package for Julia. It provides a general framework for writing prose; technical documentation is its focus, though it is general enough to be applied to any kind of written document.
    Downloads: 0 This Week
  • 9
    Swagger2Markup

    Swagger to AsciiDoc or Markdown converter

    The primary goal of this project is to simplify the generation of up-to-date RESTful API documentation by combining hand-written documentation with auto-generated API documentation produced by Swagger. The result is intended to be an up-to-date, easy-to-read, online and offline user guide, comparable to GitHub’s API documentation. The output of Swagger2Markup can be used as an alternative to Swagger UI and can be served as static content. Swagger2Markup converts a Swagger JSON or...
    Downloads: 0 This Week
  • 10
    Estimate

    Web based Cost Estimation, Material Takeoff and Reconciliation Tool

    "Estimate" is an Open Source web based Construction Cost Estimating Software designed for medium and large Civil Construction and EPC (Engineering Procurement and Construction) companies. Features include Management of Schedule of Rates, Analysis of Rates, Project Estimation (Definitive and Control), Tender Evaluation, Cost Sheet preparation, BOQ Generation, Audit and Projection. Estimate is suitable for a wide variety of trades and businesses, including but not limited to:...
    Downloads: 37 This Week
  • 11
    XL-Parser

    XL-Parser is a tool for data extraction and analysis.

    XL-Parser provides a set of functions for data extraction and analysis. It also provides web log analysis features, such as a tool for detecting suspicious activity. More details and screenshots are available at http://le-tools.com.
    Downloads: 5 This Week
  • 12
    PyResParser

    A simple resume parser used for extracting information from resumes

    PyResParser is a simple resume parser that extracts information from resumes, aiding in the automation of resume-processing tasks.
    Downloads: 0 This Week
  • 13

    pdfsummary

    Summarize PDF file contents by page.

    Uses a modified version of Didier Stevens' PDF parser to get object descriptions by page and then summarizes them.
    Downloads: 0 This Week
  • 14
    CaLi2CoPi is a multiplatform PDF parser library written in PostScript. It uses several specialized switches to verify, add, extract, or change any PDF content. It also supports online execution through a web-based user interface via Ghostscript.
    Downloads: 0 This Week
  • 15
    HoneyDrive

    Honeypots in a box! HoneyDrive is the premier honeypot bundle distro.

    HoneyDrive is the premier honeypot Linux distro. It is a virtual appliance (OVA) with Xubuntu Desktop 12.04.4 LTS edition installed. It contains over 10 pre-installed and pre-configured honeypot software packages such as Kippo SSH honeypot, Dionaea and Amun malware honeypots, Honeyd low-interaction honeypot, Glastopf web honeypot and Wordpot, Conpot SCADA/ICS honeypot, Thug and PhoneyC honeyclients and more. Additionally it includes many useful pre-configured scripts and utilities to...
    Downloads: 53 This Week
  • 16
    phpShare&Search

    Group file share with advanced text parsing capability for easy search

    Originally created as a church resource-sharing system, phpShare&Search allows users to create accounts, share documents, search documents, and like or report documents. phpShare&Search's power comes from its advanced document parser, which extracts text from .PDF, .TXT, .DOC, and .DOCX files, and from its community features for liking resources and reporting them as inappropriate or spam. Users can also subscribe to weekly updates of new content. Users may choose to download and host/install/configure/modify/manage this code themselves, or contract the code's author to do these tasks for them. ...
    Downloads: 0 This Week
  • 17

    QueLang

    QueLang is a design tool for questionnaires.

    ...Includes: full documentation, a GUI, a CLI, and a survey manager. TODO: write a decent parser (I have to study for that); add some more macros; support answer images (instead of text only). You can always tell me what you want me to implement, and I will include it (if possible) in the next update!
    Downloads: 0 This Week
  • 18

    Andoffline

    A toolkit for some Android sms/call Apps, base64 encoder, vcf parser a

    MOVED TO: https://github.com/fulvio999/Andoffline
    Features: a browser for SMS, call, and contact data exported from an Android phone; saving exported SMS, calls, and contacts to a PDF file; a VCF parser; support tools for http://android.riteshsahu.com/apps/sms-backup-restore and http://android.riteshsahu.com/apps/call-logs-backup-restore; an image base64 encoder/decoder; execution of jobs/scripts triggered by an SMS sent from a remote phone, without an internet connection (connect the phone to the PC via USB, install the Android app, create a job in the desktop application, then start the PC listener and the phone receiver; see the PDF guide for more details); a web application to manage jobs/scripts stored in a relational DB (MySQL or SQLite). ...
    Downloads: 0 This Week
  • 19
    Spider PCB

    Hierarchical Schematic and PCB

    This project is in a pre-alpha stage and is intended to give a rough idea of the final program. It does not do much more than draw pretty pictures. Hierarchical circuit layout is commonplace among IC designers, but Spider PCB brings hierarchical layout to the PCB industry. Not only is the schematic hierarchical, but so is the layout. Ever wanted to lay out a 16-band equaliser with 5 sound channels? That means lots of copying and pasting on the PCB side. Just imagine if you could lay out one...
    Downloads: 0 This Week
  • 20

    cextools

    Command line helpers for Conexp files.

    Some small command-line programs and a file parser for Concept Explorer (conexp), written in C++. Current features include converters from Concept Explorer into PDF, PostScript, SVG, and POV-Ray, and a modified 3D Freese layout.
    Downloads: 0 This Week
  • 21

    ScientificPdfParser

    Parses scientific articles from PDF and marks the meta data.

    ...The project contains three runnable classes that can process given PDFs in batch mode using threads: a) BatchHeuristic: a parser that uses defined heuristics and rules, especially applicable to articles with a broad set of layouts (e.g. PeDocs, http://www.pedocs.de/). b) BatchHybrid: a parser that uses machine learning (Naive Bayes) to find the correct element; useful for e.g. ACL. c) ModelGenerator: generates a training model, used by BatchHybrid, from given PDF and XML files.
    Downloads: 0 This Week
  • 22
    A full LR(1) parser generator system with many advanced features.
    Downloads: 1 This Week
  • 23
    ** Guys, I have built a much more powerful, fully featured CMS at: https://github.com/MacdonaldRobinson/FlexDotnetCMS
    Macs CMS is a flat-file (XML and SQLite) based AJAX content management system. It focuses mainly on the edit-in-place concept. It comes with a built-in blog with moderation support, a user manager section, a roles manager section, SEO / SEF URL
    Downloads: 0 This Week
  • 24
    QuickDoc is a Java document parser that reads documents from plain text files written in a simple language and exports them to other formats such as PDF, HTML, JavaHelp, and XML.
    Downloads: 0 This Week
  • 25
    Automatic generation of documentation for Delphi projects from source code. Distinctive features are exact parsing that gathers lots of information, and a separation between the parser and configurable generators (HTML, WinHelp and HTML Help, PDF, LaTeX, XMI export).
    Downloads: 0 This Week
  • You're on page 1 of 2