Showing 539 open source projects for "extract data"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • MongoDB 8.0 on Atlas | Run anywhere Icon
    MongoDB 8.0 on Atlas | Run anywhere

    Now available in even more cloud regions across AWS, Azure, and Google Cloud.

    MongoDB 8.0 brings enhanced performance and flexibility to Atlas—with expanded availability across 125+ regions globally. Build modern apps anywhere your users are, with the power of a modern database behind you.
    Learn More
  • 1
    Extract TOTP/HOTP secrets

    Extract TOTP/HOTP secrets

    Extract one time password (OTP) secrets from QR codes

    The Python script extract_otp_secrets.py extracts one-time password (OTP) secrets from QR codes exported by two-factor authentication (2FA) apps such as "Google Authenticator".
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    Rust Data Analysis

    Rust Data Analysis

    Rust for data analysis encyclopedia (WIP)

    Welcome to the Rust Data Analysis repository! This collection of Jupyter notebooks provides a comprehensive exploration of data analysis using Rust. Powered by a Rust kernel, these notebooks allow you to dive deep into the realm of data analysis, leveraging the capabilities of the Rust programming language. With the help of various Rust libraries, such as ndarray, plotters, and more, you'll be able to extract valuable insights from different datasets with ease.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    LosslessCut

    LosslessCut

    The swiss army knife of lossless video/audio editing

    LosslessCut aims to be the ultimate cross platform FFmpeg GUI for extremely fast and lossless operations on video, audio, subtitle and other related media files. The main feature is lossless trimming and cutting of video and audio files, which is great for saving space by rough-cutting your large video files taken from a video camera, GoPro, drone, etc. It lets you quickly extract the good parts from your videos and discard many gigabytes of data without doing a slow re-encode and thereby...
    Downloads: 322 This Week
    Last Update:
    See Project
  • 4
    mtail

    mtail

    Extract internal monitoring data from application logs

    Extract internal monitoring data from application logs for collection in a time-series database. mtail is a tool for extracting metrics from application logs to be exported into a timeseries database or timeseries calculator for alerting and dashboarding. It fills a monitoring niche by being the glue between applications that do not export their own internal state (other than via logs) and existing monitoring systems, such that system operators do not need to patch those applications...
    Downloads: 36 This Week
    Last Update:
    See Project
  • Picsart Enterprise Background Removal API for Stunning eCommerce Visuals Icon
    Picsart Enterprise Background Removal API for Stunning eCommerce Visuals

    Instantly remove the background from your images in just one click.

    With our Remove Background API tool, you can access the transformative capabilities of automation , which will allow you to turn any photo asset into compelling product imagery. With elevated visuals quality on your digital platforms, you can captivate your audience, and therefore achieve higher engagement and sales.
    Learn More
  • 5
    ImHex

    ImHex

    A Hex Editor for Reverse Engineers, Programmers

    ImHex is a Hex Editor, a tool to display, decode and analyze binary data to reverse engineer their format, extract informations or patch values in them. What makes ImHex special is that it has many advanced features that can often only be found in paid applications. Such features are a completely custom binary template and pattern language to decode and highlight structures in the data, a graphical node-based data processor to pre-process values before they're displayed, a disassembler, diffing...
    Downloads: 26 This Week
    Last Update:
    See Project
  • 6
    Scrapy

    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 26 This Week
    Last Update:
    See Project
  • 7
    .NET for Apache Spark

    .NET for Apache Spark

    A free, open-source, and cross-platform big data analytics framework

    .NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular Dataframe and SparkSQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write .NET...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 8
    Addax

    Addax

    Addax is a versatile open-source ETL tool

    Addax is a data integration and ETL (Extract, Transform, Load) tool designed for high-performance data migration tasks. It simplifies the process of moving data between different systems and formats.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 9
    py-pdf-parser

    py-pdf-parser

    A Python tool to help extracting information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents. ​
    Downloads: 9 This Week
    Last Update:
    See Project
  • Your top-rated shield against malware and online scams | Avast Free Antivirus Icon
    Your top-rated shield against malware and online scams | Avast Free Antivirus

    Browse and email in peace, supported by clever AI

    Our antivirus software scans for security and performance issues and helps you to fix them instantly. It also protects you in real time by analyzing unknown files before they reach your desktop PC or laptop — all for free.
    Free Download
  • 10
    go-i18n

    go-i18n

    Translate your Go program into multiple languages

    go-i18n is a Go package and a command that helps you translate Go programs into multiple languages. Supports pluralized strings for all 200+ languages in the Unicode Common Locale Data Repository (CLDR). Code and tests are automatically generated from CLDR data. Supports strings with named variables using text/template syntax. Supports message files of any format (e.g. JSON, TOML, YAML). Use goi18n extract to extract all i18n.Message struct literals in Go source files to a message file...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 11
    GraphRAG

    GraphRAG

    A modular graph-based Retrieval-Augmented Generation (RAG) system

    The GraphRAG project is a data pipeline and transformation suite that is designed to extract meaningful, structured data from unstructured text using the power of LLMs.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 12
    Point Cloud Library

    Point Cloud Library

    A standalone, large scale, open project for 2D/3D image processing

    The Point Cloud Library (PCL) is a standalone, large scale, open project for 2D/3D image and point cloud processing. PCL is released under the terms of the BSD license, and thus free for commercial and research use. Whether you’ve just discovered PCL or you’re a long time veteran, this page contains links to a set of resources that will help consolidate your knowledge on PCL and 3D processing. An additional Wiki resource for developers is available too. To simplify both usage and...
    Downloads: 18 This Week
    Last Update:
    See Project
  • 13
    Dendrite

    Dendrite

    Tools to build web AI agents that can authenticate

    Dendrite Python SDK is a toolkit for building web AI agents that can authenticate, interact with, and extract data from any website, facilitating web automation tasks.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 14
    pdfly

    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 15
    esquisse

    esquisse

    RStudio add-in to make plots interactively with ggplot2

    The purpose of this add-in is to let you explore your data quickly to extract the information they hold. You can create visualization with {ggplot2}, filter data with {dplyr} and retrieve generated code. This addin allows you to interactively explore your data by visualizing it with the ggplot2 package. It allows you to draw bar plots, curves, scatter plots, histograms, boxplot and sf objects, then export the graph or retrieve the code to reproduce the graph. This addin allows you...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 16
    Documind

    Documind

    Open-source platform for extracting structured data from documents

    Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    pmd

    pmd

    An extensible multilanguage static code analyzer

    PMD is a source code analyzer. It finds common programming flaws like unused variables, empty catch blocks, unnecessary object creation, and so forth. It supports Java, JavaScript, Salesforce.com Apex and Visualforce, PLSQL, Apache Velocity, XML, and XSL. Additionally, it includes CPD, the copy-paste-detector. CPD finds duplicated code in Java, C, C++, C#, Groovy, PHP, Ruby, Fortran, JavaScript, PLSQL, Apache Velocity, Scala, Objective C, Matlab, Python, Go, Swift and Salesforce.com Apex,...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 18
    PDFIO.jl

    PDFIO.jl

    PDF Reader Library for Native Julia.

    PDFIO is a native Julia implementation for reading PDF files. It's a 100% Julia implementation of the PDF specification. Other than a few well-established algorithms like flate decode (zlib library) or cryptographic operations (OpenSSL library) almost all of the APIs are written in native Julia. PDF files are in existence for over three decades. Implementations of the PDF writers are not always to the specification or they may even vary significantly from vendor to vendor. Every time, you...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 19
    jsoup

    jsoup

    Java library for working with real-world HTML

    jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree. The parser will make every...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 20
    Apache DevLake

    Apache DevLake

    Apache DevLake is an open-source dev data platform

    Apache DevLake is an open-source dev data platform that ingests, analyzes, and visualizes the fragmented data from DevOps tools to extract insights for engineering excellence, developer experience, and community growth. Apache DevLake is designed for developer teams looking to make better sense of their development process and to bring a more data-driven approach to their own practices. You can ask Apache DevLake many questions regarding your development process. Just connect and query. Your...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 21
    Instructor

    Instructor

    Structured outputs for llms

    Instructor is a tool that enables developers to extract structured data from natural language using Large Language Models (LLMs). Integrating with Python's Pydantic library allows users to define desired output structures through type hints, facilitating schema validation and seamless integration with IDEs. Instructor supports various LLM providers, including OpenAI, Anthropic, Litellm, and Cohere, offering flexibility in implementation. Its customizable nature permits the definition...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 22
    Skyvern

    Skyvern

    Automate browser-based workflows with LLMs and Computer Vision

    ..., with support for country, state, or even precise zip-code level targeting. Skyvern understands how to solve CAPTCHAs to complete complicated workflows. Support for authenticating into user accounts, including support for 2FA/TOTP. Extract data from workflows in any schema of your choice including CSV or JSON. Automate procurement pipelines, breeze through government forms, and complete workflows in any language.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 23
    Article Extractor

    Article Extractor

    To extract main article from given URL with Node.js

    A Node.js library for extracting main content from web articles, removing unnecessary clutter like ads and navigation elements.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 24
    OpenAPI.NET

    OpenAPI.NET

    Object model for OpenAPI documents in .NET

    The OpenAPI.NET SDK contains a useful object model for OpenAPI documents in .NET along with common serializers to extract raw OpenAPI JSON and YAML documents from the model. The OpenAPI.NET project holds the base object model for representing OpenAPI documents as .NET objects. Some developers have found the need to write processors that convert other data formats into this OpenAPI.NET object model. We'd like to curate that list of processors in this section of the readme. Converts standard .NET...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 25
    Jbuilder

    Jbuilder

    Generate JSON objects with a Builder-style DSL

    Jbuilder gives you a simple DSL for declaring JSON structures that beats manipulating giant hash structures. This is particularly helpful when the generation process is fraught with conditionals and loops. You can either use Jbuilder stand-alone or directly as an ActionView template language. When required in Rails, you can create views à la show.json.jbuilder (the json is already yielded). Fragment caching is supported, it uses Rails.cache and works like caching in HTML templates. If your...
    Downloads: 8 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.