Showing 1753 open source projects for "extract"

  • 1
    LangChain Extract

    Did you say you like data?

    LangChain Extract is an open-source reference application designed to demonstrate how large language models can be used to extract structured data from unstructured text and document files. The project implements a lightweight web service that allows developers to define extraction schemas and apply them to various sources such as plain text, HTML, or PDF documents.
    Downloads: 0 This Week
  • 2
    text-extract-api

    Document (PDF, Word, PPTX ...) extraction and parsing API

    text-extract-api is an open-source service designed to extract readable text from a wide variety of document formats through a simple API interface. The project focuses on converting complex files such as PDFs, images, scanned documents, and office files into structured plain text that can be processed by downstream applications or language models.
    Downloads: 0 This Week
  • 3
    Extract TOTP/HOTP secrets

    Extract one-time password (OTP) secrets from QR codes

    The Python script extract_otp_secrets.py extracts one-time password (OTP) secrets from QR codes exported by two-factor authentication (2FA) apps such as "Google Authenticator".
    Downloads: 2 This Week
  • 4

    lessmsi

    Tool to view and extract contents of a Windows Installer (.msi) file

    lessmsi (formerly known as Less Msiérables) is a free utility with a graphical user interface and a command line interface used for viewing and extracting the contents of a Windows Installer (.msi) file.
    Downloads: 41 This Week
  • 5
    EMV NFC Paycard Enrollment

    A Java library used to read and extract data from NFC EMV credit cards

    Java library used to read and extract public data from NFC EMV credit cards.
    Downloads: 28 This Week
  • 6

    ldif-extract

    Extract selected entries from LDIF files, like grep

    ldif-extract is a small grep-like tool to extract and convert data from LDIF files. It can be used standalone or in a pipe together with other tools such as ldapsearch.
    Downloads: 0 This Week
  • 7
    PDFsam

    PDFsam, a desktop application to split, merge, mix, rotate PDF files

    PDFsam Basic is our free and open-source desktop application to split, merge, extract pages, rotate and mix PDF files. PDFsam Visual is a powerful tool to visually compose PDF files, reorder pages, delete pages, split, merge, rotate, encrypt, decrypt, extract text, convert to grayscale, and crop PDF files. PDFsam Basic is written using JavaFX. Since version 4 it is released as a self-contained application and bundles a jlinked JDK, while version 3 requires a Java Runtime Environment 8 with JavaFX installed in order to run.
    Downloads: 54 This Week
  • 8
    Volatility

    An advanced memory forensics framework

    Volatility is a widely used open-source framework for analyzing memory captures (RAM dumps) from Windows, Linux, and macOS systems. It enables investigators and malware analysts to extract process lists, network connections, DLLs, strings, artifacts, and more. Volatility supports many plugins for detecting hidden processes, malware, rootkits, and event tracing. It’s essential in digital forensics and incident response workflows.
    Downloads: 114 This Week
  • 9
    PdfPig

    Read and extract text and other content from PDFs in C#

    This project allows users to read and extract text and other content from PDF files. In addition the library can be used to create simple PDF documents containing text and geometrical shapes.
    Downloads: 6 This Week
  • 10
    Autopsy

    Autopsy® is a digital forensics platform and graphical interface

    Autopsy® is a digital forensics platform and graphical interface to The Sleuth Kit® and other digital forensics tools. It can be used by law enforcement, military, and corporate examiners to investigate what happened on a computer. You can even use it to recover photos from your camera's memory card. Autopsy was designed to be intuitive out of the box. Installation is easy and wizards guide you through every step. All results are found in a single tree. See the intuitive page for more...
    Downloads: 75 This Week
  • 11
    Allure Report

    Flexible, lightweight multi-language test reporting tool

    Allure Report is a flexible, lightweight multi-language test reporting tool. It provides clear graphical reports and lets everyone involved in the development process extract the maximum of information from everyday testing, with a detailed representation of what has been tested. Allure Report can build unified reports for dozens of testing tools across eleven programming languages on several CI/CD systems.
    Downloads: 18 This Week
  • 12
    PyPDF

    A pure-python PDF library capable of splitting, merging, cropping

    pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s an actively maintained fork of PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.
    Downloads: 6 This Week
  • 13
    Documind

    Open-source platform for extracting structured data from documents

    Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.
    Downloads: 0 This Week
  • 14
    AIO-Switch-Updater

    Update your CFW, cheat codes, firmwares from your Nintendo Switch

    ...AIO-Switch-Updater uses a custom RCM payload to finalise the install as it can't be performed while HOS is running. Download and update Hekate, as well as a selection of RCM payloads. Download and extract daily-updated cheat codes. The program will only extract cheat codes for the games you own. By default, this homebrew will overwrite the existing cheats. If you have your own cheat files that you'd like to keep as is, you can turn off cheat updates for specific titles.
    Downloads: 35 This Week
  • 15
    Toutatis

    Extract public Instagram account information from usernames

    Toutatis is an open source command-line tool designed to extract publicly available information from Instagram accounts. It helps users gather various data points from a target profile by querying Instagram using a username or account ID. The tool can retrieve details such as profile metadata, follower counts, biography information, and other publicly accessible account attributes. In addition to basic profile data, Toutatis can also reveal contact details that may be publicly exposed, including email addresses and phone numbers associated with the account. ...
    Downloads: 13 This Week
  • 16
    PHP Font Lib

    A library to read, parse, export and make subsets of different fonts

    This library can be used to read TrueType, OpenType (with TrueType glyphs), and WOFF font files. It can extract basic info (name, style, etc.) and advanced info (horizontal metrics, glyph names, glyph shapes, etc.), and make an Adobe Font Metrics (AFM) file from a font. A demo GUI is available. This project was initiated by the need to read font files in the DOMPDF project.
    Downloads: 0 This Week
  • 17
    Spatie Crawler

    An easy to use, powerful crawler implemented in PHP

    Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.
    Downloads: 7 This Week
  • 18
    ScrapeGraphAI

    Python scraper based on AI

    ...ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
    Downloads: 5 This Week
  • 19
    LLM Scraper

    Extract structured data from webpages using LLM-powered scraping

    LLM Scraper is a TypeScript library designed to extract structured data from webpages using large language models. Instead of relying on fragile HTML selectors or manual parsing rules, the tool interprets webpage content with language models and converts it into structured data according to a defined schema. Developers can specify the data structure using tools such as Zod or JSON Schema, enabling the model to extract relevant information directly into typed objects.
    Downloads: 4 This Week
  • 20
    Certificate Ripper

    A CLI tool to extract server certificates

    A CLI tool to extract server certificates. No OpenSSL is required, and it runs on any operating system. It can be used with or without Java; native executables are available in the releases. It extracts all the sub-fields of the certificate, and certificates can be formatted as PEM. Bulk extraction from multiple URLs with a single command is possible. Extracted certificates can be stored automatically in a p12 trust store.
    Downloads: 2 This Week
  • 21
    FFsubsync

    Automagically synchronize subtitles with video

    ...In this case, you can use the correctly synchronized srt file directly as a reference for synchronization, instead of using the video as the reference. ffsubsync uses the file extension to decide whether to perform voice activity detection on the audio or to directly extract speech from an srt file. ffsubsync usually finishes in 20 to 30 seconds, depending on the length of the video.
    Downloads: 15 This Week
  • 22
    Chandra

    OCR model for complex documents with layout-aware structured outputs

    Chandra is an advanced OCR model designed to extract and structure information from complex documents such as tables, forms, handwritten notes, and mathematical content. It focuses on preserving full document layout, meaning that extracted text is accompanied by positional metadata like bounding boxes for each element. Chandra supports multiple output formats including Markdown, HTML, and JSON, making it suitable for downstream processing and integration into data pipelines.
    Downloads: 10 This Week
  • 23
    py-pdf-parser

    A Python tool to help extract information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.
    Downloads: 1 This Week
  • 24
    LosslessCut

    The Swiss Army knife of lossless video/audio editing

    ...The main feature is lossless trimming and cutting of video and audio files, which is great for saving space by rough-cutting your large video files taken from a video camera, GoPro, drone, etc. It lets you quickly extract the good parts from your videos and discard many gigabytes of data without doing a slow re-encode and thereby losing quality. Or you can add a music or subtitle track to your video without needing to encode. Everything is extremely fast because it does an almost direct data copy, fueled by the awesome FFmpeg which does all the grunt work. ...
    Downloads: 655 This Week
  • 25
    Beets

    Open-source music library management system

    Beets catalogs your music collection with a variety of tools for manipulating and accessing music.
    Downloads: 6 This Week