Showing 269 open source projects for "data transformation"

  • 1
    Data Formulator

    Create rich visualizations with AI

    To create rich visualizations, data analysts often need to iterate between data processing and chart specification. This requires not only proficiency in data transformation and visualization tools but also effort to manage a branching history made up of many versions of data and charts. Recent LLM-powered AI systems have greatly improved the visualization authoring experience, for example by using LLMs' code generation ability to lower manual data transformation barriers. ...
    Downloads: 5 This Week
  • 2
    Data-Juicer

    Data processing for and with foundation models

    Data-Juicer is an open-source data processing and augmentation framework designed to enhance the quality and diversity of datasets for machine learning tasks. It includes a modular pipeline for scalable data transformation.
    Downloads: 0 This Week
  • 3
    The Data Engineering Handbook

    Links to everything you'd ever want to learn about data engineering

    ...It includes beginner and intermediate boot camps, interview guides, data cleaning and transformation resources, and curated lists of newsletters and industry communities, making it useful both for self-study and technical interview preparation. The repository is actively maintained and widely starred, reflecting its role as a go-to reference for newcomers and experienced practitioners alike.
    Downloads: 0 This Week
  • 4
    Polyhedra

    Polyhedral Computation Interface

    Polyhedra provides a unified interface for polyhedral computation libraries such as CDDLib.jl. Supported manipulations notably include transforming between the inequality representation of a polyhedron and its generator representation (convex hull of points + conic hull of rays), and projecting or eliminating a variable with, e.g., Fourier-Motzkin elimination.
    Downloads: 0 This Week
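The variable elimination mentioned in the Polyhedra entry can be illustrated with a small sketch of Fourier-Motzkin elimination. This is plain Python written for illustration only, not Polyhedra's Julia API; the function name `eliminate` and the row layout `(coeffs, bound)` are assumptions made for this example.

```python
from fractions import Fraction


def eliminate(rows, k):
    """Eliminate variable k from a system of inequalities.

    Each row is (coeffs, bound), meaning sum(coeffs[i] * x[i]) <= bound.
    The returned rows all have a zero coefficient for x[k].
    (Hypothetical helper for illustration; not part of any library.)
    """
    pos, neg, zero = [], [], []
    for coeffs, bound in rows:
        coeffs = [Fraction(c) for c in coeffs]
        bound = Fraction(bound)
        if coeffs[k] > 0:
            pos.append((coeffs, bound))   # gives an upper bound on x[k]
        elif coeffs[k] < 0:
            neg.append((coeffs, bound))   # gives a lower bound on x[k]
        else:
            zero.append((coeffs, bound))  # does not involve x[k]

    result = list(zero)
    for pc, pb in pos:
        for nc, nb in neg:
            # Scale both rows so the x[k] coefficients are +1 and -1,
            # then add them so that x[k] cancels.
            s, t = 1 / pc[k], 1 / -nc[k]
            combined = [s * p + t * n for p, n in zip(pc, nc)]
            result.append((combined, s * pb + t * nb))
    return result


# Triangle: x >= 0, y >= 0, x + y <= 1, written as a.x <= b rows.
triangle = [([-1, 0], 0), ([0, -1], 0), ([1, 1], 1)]

# Eliminating y (index 1) leaves -x <= 0 and x <= 1, i.e. 0 <= x <= 1.
projection = eliminate(triangle, 1)
```

Exact rational arithmetic via `Fraction` avoids rounding issues in this toy version. The sketch does not remove redundant inequalities, which a practical implementation would need to do because the number of rows can grow quickly.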
  • 5
    collapse

    Advanced and Fast Data Transformation in R

    collapse is a high-performance R package designed for fast and efficient data transformation, aggregation, reshaping, and statistical computation. Built to offer a more performant alternative to dplyr and data.table, it is particularly well-suited for large datasets and econometric applications. It operates on base R data structures like data frames and vectors and uses highly optimized C and C++ code under the hood to deliver significant speed improvements. collapse also includes tools for grouped operations, weighted statistics, and time series manipulation, making it a compact yet powerful utility for data scientists and researchers working in R.
    Downloads: 0 This Week
  • 6
    Chain.jl

    A Julia package for piping a value through transformation expressions

    A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.
    Downloads: 0 This Week
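The idea behind the Chain.jl entry, piping a value through a series of transformation steps, can be sketched in a few lines. This is a Python illustration of the general pattern only; it does not reproduce Chain.jl's macro syntax, and the helper name `chain` is an assumption made for this example.

```python
from functools import reduce


def chain(value, *steps):
    """Pass `value` through each function in `steps`, left to right.

    (Hypothetical helper for illustration; not Chain.jl's API.)
    """
    return reduce(lambda acc, step: step(acc), steps, value)


# Sort a list, scale each element, then sum the result.
result = chain(
    [3, 1, 2],
    sorted,                           # [1, 2, 3]
    lambda xs: [x * 10 for x in xs],  # [10, 20, 30]
    sum,                              # 60
)
```

Writing the steps top to bottom keeps the order of operations the same as the reading order, which is the main convenience that piping syntax offers over nested function calls.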
  • 7
    Pixeltable

    Data Infrastructure providing an approach to multimodal AI workloads

    Pixeltable is an open-source Python data infrastructure framework designed to support the development of multimodal AI applications. The system provides a declarative interface for managing the entire lifecycle of AI data pipelines, including storage, transformation, indexing, retrieval, and orchestration of datasets. Unlike traditional architectures that require multiple tools such as databases, vector stores, and workflow orchestrators, Pixeltable unifies these functions within a table-based abstraction. ...
    Downloads: 2 This Week
  • 8
    PeerDB

    A fast, simple, and cost-effective tool to replicate data from Postgres

    PeerDB is an open-source platform for real-time replication and transformation of data from PostgreSQL to analytical warehouses like BigQuery and Snowflake. It supports Change Data Capture (CDC) and provides seamless syncing and transformation logic with low latency. PeerDB is ideal for teams building real-time data pipelines without relying on expensive proprietary solutions.
    Downloads: 0 This Week
  • 9
    AI Data Science Team

    An AI-powered data science team of agents

    AI Data Science Team is a Python library and agent ecosystem designed to accelerate and automate common data science workflows by modeling them as specialized AI “agents” that can be orchestrated to perform tasks like data cleaning, transformation, analysis, visualization, and machine learning. It provides a modular agent framework where each agent focuses on a step in the typical data science pipeline — for example, loading data from CSV/Excel files, cleaning and wrangling messy datasets, engineering predictive features, building models with AutoML, connecting to SQL databases, and producing visual outputs — all driven by natural language or programmatic instructions. ...
    Downloads: 0 This Week
  • 10
    Malli

    High-performance data-driven data specification library

    Malli is a powerful, data-driven schema library for Clojure and ClojureScript, offering rich support for specification, validation, parsing, error reporting, and generative testing. Designed for performance, Malli leverages efficient runtime representations and code generation, seamlessly integrating with Clojure’s data-oriented architecture. It supports function schemas, JSON transformation, and OpenAPI generation for strong API contracts.
    Downloads: 0 This Week
  • 11
    Blue Whale Configuration Platform

    Blue Whale smart cloud configuration platform

    The platform draws on experience supporting hundreds of Tencent businesses and is compatible with a variety of complex system architectures. It grew out of operations and maintenance (O&M) work and is built for it. From configuration management to job execution, task scheduling, and monitoring with self-healing, through to O&M big data analysis that assists operational decision-making, it covers the full cycle of business operations assurance. The open PaaS offers a powerful development framework and scheduling engine, along with a complete O&M development training system, which helps teams transform and upgrade their operations and maintenance quickly. ...
    Downloads: 2 This Week
  • 12
    Addax

    Addax is a versatile open-source ETL tool

    Addax is a data integration and ETL (Extract, Transform, Load) tool designed for high-performance data migration tasks. It simplifies the process of moving data between different systems and formats.
    Downloads: 0 This Week
  • 13
    GraphRAG

    A modular graph-based Retrieval-Augmented Generation (RAG) system

    The GraphRAG project is a data pipeline and transformation suite that is designed to extract meaningful, structured data from unstructured text using the power of LLMs.
    Downloads: 6 This Week
  • 14
    SQL Notebook

    Casual data exploration in SQL

    SQL Notebook is a free Windows application for querying and analyzing data across multiple sources, including SQLite, PostgreSQL, Excel, and CSV files. It combines a SQL editor with a notebook interface, allowing for data exploration, transformation, and visualization in one place. SQL Notebook is ideal for analysts and data enthusiasts.
    Downloads: 2 This Week
  • 15
    Typia

    Super-fast/easy runtime validations and serializations

    Super-fast/easy runtime validations and serializations through transformation.
    Downloads: 0 This Week
  • 16
    Datacap

    DataCap is integrated software for data transformation

    Datacap is an open-source data catalog and governance tool that helps organizations manage and document their data assets. It provides metadata management, lineage tracking, and collaboration features to ensure data transparency and quality. Datacap is designed for teams that need a lightweight, self-hosted solution to organize and govern their data ecosystems.
    Downloads: 0 This Week
  • 17
    Greenmask

    PostgreSQL database anonymization and synthetic data generation tool

    Greenmask is a powerful open-source utility designed for logical database backup dumping, obfuscation, and restoration. It offers extensive functionality for backup, anonymization, and data masking. Greenmask is written in pure Go and includes ported PostgreSQL libraries that allow for platform independence. The tool is stateless and does not require any changes to your database schema. It is designed to be highly customizable and backward-compatible with existing PostgreSQL...
    Downloads: 0 This Week
  • 18
    Kapacitor

    Open source framework for processing, monitoring, and alerting

    Open source framework for processing, monitoring, and alerting on time series data. Kapacitor is a real-time data processing engine for monitoring and alerting, specifically designed to work with time-series data from InfluxDB.
    Downloads: 0 This Week
  • 19
    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 0 This Week
  • 20
    NeuralOperators.jl

    DeepONets, Neural Operators, Physics-Informed Neural Ops in Julia

    The neural operator is a novel deep learning architecture. It learns an operator, that is, a mapping between infinite-dimensional function spaces, and can be used to solve partial differential equations (PDEs). Instead of applying the finite element method, a PDE problem can be solved by training a neural network to learn an operator mapping from the infinite-dimensional space (u, t) to the infinite-dimensional space f(u, t). A neural operator learns a continuous function between two continuous function...
    Downloads: 0 This Week
  • 21
    Tidier.jl

    Meta-package for data analysis in Julia, modeled after the R tidyverse

    Tidier.jl is a Julia package that brings tidyverse-style data manipulation and analysis to Julia, inspired by R's dplyr and tidyverse. It allows users to write expressive and concise data transformation code using chaining (|>) and intuitive syntax. Built on top of DataFrames.jl, Tidier.jl aims to make data wrangling more accessible to users familiar with R or looking for cleaner data pipelines in Julia.
    Downloads: 0 This Week
  • 22
    sttr

    Cross-platform CLI app to perform various operations on strings

    sttr is command-line software that lets you quickly run various transformation operations on strings.
    Downloads: 0 This Week
  • 23
    Hacks

    A collection of hacks and one-off scripts

    Hacks is a collection of experimental scripts, utilities, and one-off tools created to solve specific problems in security research, data processing, and automation. Rather than being a single cohesive application, it serves as a repository of practical command-line tools that can be used independently or combined into workflows. The scripts cover a wide range of tasks, including URL manipulation, parameter replacement, data extraction, and reconnaissance automation. Many of the tools in the...
    Downloads: 7 This Week
  • 24
    Lantern Database

    PostgreSQL vector database extension for building AI applications

    Lantern is a real-time data transformation engine that enables data engineers to build, run, and monitor streaming data pipelines with SQL. It’s designed to process events in motion, offering low-latency stream transformations, aggregations, and enrichment in a declarative way. Lantern is especially suited for modern data infrastructure and analytics platforms.
    Downloads: 0 This Week
  • 25
    DataFramesMeta.jl

    Metaprogramming tools for DataFrames

    Metaprogramming tools for DataFrames.jl objects to provide more convenient syntax. DataFrames.jl has the functions select, transform, and combine, as well as the in-place select! and transform! for manipulating data frames. DataFramesMeta.jl provides the macros @select, @transform, @combine, @select!, and @transform! to mirror these functions with more convenient syntax. Inspired by dplyr in R and LINQ in C#.
    Downloads: 0 This Week
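The three operations named in the DataFramesMeta.jl entry (select, transform, combine) can be sketched roughly on a list of dictionaries. This is a loose Python analogy meant only to show what each operation does to the data; it is not the DataFrames.jl or DataFramesMeta.jl API, and the function signatures here are assumptions made for this example.

```python
def select(rows, *cols):
    """Keep only the named columns in every row."""
    return [{c: r[c] for c in cols} for r in rows]


def transform(rows, **new_cols):
    """Keep existing columns and add new ones computed per row."""
    return [
        {**r, **{name: fn(r) for name, fn in new_cols.items()}}
        for r in rows
    ]


def combine(rows, **aggs):
    """Collapse all rows into one summary computed over the whole table."""
    return {name: fn(rows) for name, fn in aggs.items()}


table = [
    {"name": "a", "x": 1, "y": 2},
    {"name": "b", "x": 3, "y": 4},
]

picked = select(table, "x", "y")
added = transform(picked, total=lambda r: r["x"] + r["y"])
summary = combine(
    added,
    mean_total=lambda rs: sum(r["total"] for r in rs) / len(rs),
)
```

These helpers return new lists and leave the input untouched, which corresponds to the non-mutating variants; the in-place `select!` and `transform!` mentioned in the entry modify the data frame instead.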