Showing 69 open source projects for "dataflow"

  • 1
    Scio

    A Scala API for Apache Beam and Google Cloud Dataflow

    Scio is a Scala API developed by Spotify that builds on Apache Beam to enable expressive batch and streaming data pipelines, optimized for running on Google Cloud Dataflow. Inspired by Spark and Scalding, it provides scalable, type‑safe, and production-grade data processing, with built-in support for BigQuery, Pub/Sub, Cassandra, Elasticsearch, Redis, TensorFlow IO, and more.
  • 2
    Arroyo

    Distributed stream processing engine in Rust

    Arroyo is a distributed stream processing engine written in Rust, designed to efficiently perform stateful computations on streams of data. Unlike traditional batch processing, streaming engines can operate on both bounded and unbounded sources, emitting results as soon as they are available.
  • 3
    XGBoost

    Scalable and Flexible Gradient Boosting

    XGBoost is an optimized distributed gradient boosting library, designed to be scalable, flexible, portable, and highly efficient. It supports regression, classification, ranking, and user-defined objectives, and runs on all major operating systems and cloud platforms. XGBoost implements machine learning algorithms under the gradient boosting framework and provides parallel tree boosting (also known as GBDT, GBRT, or GBM) that can solve many data science problems quickly and accurately...
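To make the gradient boosting idea concrete, here is a toy sketch in plain Python (not XGBoost's code or API). Each round fits a one-split regression tree (a "stump") to the current residuals, which are the negative gradient of squared-error loss, and adds a shrunken copy of it to the model. XGBoost itself adds regularized objectives, second-order gradient statistics, deeper trees, and parallel split finding; the function names below are hypothetical.

```python
from typing import List, Tuple

# A depth-1 regression tree ("stump"): (threshold, value if x <= threshold, value otherwise)
Stump = Tuple[float, float, float]


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Pick the threshold on one feature that minimizes squared error on the residuals."""
    mean = sum(residuals) / len(residuals)
    best: Stump = (float("inf"), mean, mean)  # fallback: no useful split
    best_err = sum((r - mean) ** 2 for r in residuals)
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if err < best_err:
            best_err, best = err, (threshold, lv, rv)
    return best


def predict_stump(stump: Stump, x: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def gradient_boost(xs: List[float], ys: List[float], rounds: int = 50, learning_rate: float = 0.3):
    """Fit an additive model F(x) = base + learning_rate * sum(stumps) under squared-error loss."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps: List[Stump] = []
    for _ in range(rounds):
        # For loss (y - F)^2 / 2 the negative gradient is simply the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x: float) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


model = gradient_boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
```

With 50 rounds and a learning rate of 0.3, the residuals shrink by a factor of 0.7 per round, so `predict(model, 2)` ends up very close to 1 and `predict(model, 5)` very close to 5.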
  • 4
    TensorFlow

    TensorFlow is an open source library for machine learning

    ...The platform can be easily deployed on multiple CPUs, GPUs, and Google's proprietary chip, the Tensor Processing Unit (TPU). TensorFlow expresses its computations as dataflow graphs, in which each node represents an operation. Nodes take tensors (multidimensional arrays) as input and produce tensors as output. The framework allows these algorithms to run in C++ for better performance, while its multiple API levels let users choose how high or low a level of abstraction their models use. ...
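The dataflow-graph model described above (operations as nodes, tensors flowing along edges) can be illustrated with a toy evaluator that runs each node as soon as all of its inputs exist. This is not TensorFlow's API: `Node`, `run_graph`, and the 1-D list "tensors" are hypothetical simplifications.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Tensor = List[float]  # 1-D stand-in for a real multidimensional tensor


@dataclass
class Node:
    name: str
    op: Callable[..., Tensor]
    inputs: List[str] = field(default_factory=list)


def run_graph(nodes: List[Node], feeds: Dict[str, Tensor]) -> Dict[str, Tensor]:
    """Run each node once all of its inputs exist (Kahn's topological ordering)."""
    values: Dict[str, Tensor] = dict(feeds)
    by_name = {n.name: n for n in nodes}
    # Number of inputs each node is still waiting for.
    pending = {n.name: sum(1 for i in n.inputs if i not in values) for n in nodes}
    consumers: Dict[str, List[str]] = {}
    for n in nodes:
        for i in n.inputs:
            consumers.setdefault(i, []).append(n.name)
    ready = deque(name for name, count in pending.items() if count == 0)
    while ready:
        node = by_name[ready.popleft()]
        values[node.name] = node.op(*(values[i] for i in node.inputs))
        for consumer in consumers.get(node.name, []):
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)
    if any(n.name not in values for n in nodes):
        raise ValueError("graph contains a cycle or an unfed input")
    return values


def add(a: Tensor, b: Tensor) -> Tensor:
    return [x + y for x, y in zip(a, b)]


def mul(a: Tensor, b: Tensor) -> Tensor:
    return [x * y for x, y in zip(a, b)]


def relu(a: Tensor) -> Tensor:
    return [max(0.0, x) for x in a]


# y = relu(x * w + b); nodes are listed out of order on purpose.
graph = [
    Node("y", relu, ["z"]),
    Node("z", add, ["xw", "b"]),
    Node("xw", mul, ["x", "w"]),
]
result = run_graph(graph, {"x": [1.0, -2.0, 3.0], "w": [2.0, 2.0, 2.0], "b": [-1.0, 1.0, 0.5]})
```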
  • 5
    Apache Beam

    Unified programming model for Batch and Streaming

    ...These pipelines are executed on one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Beam is especially useful for embarrassingly parallel data processing tasks, and it caters to the differing needs and backgrounds of end users, SDK writers, and runner writers.
  • 6
    ComfyUI-LTXVideo

    LTX-Video Support for ComfyUI

    ...Instead of writing code to apply effects, transitions, edits, and data flows, users can assemble nodes that represent video inputs, transformations, and outputs, letting them prototype and automate video production pipelines visually. This integration empowers non-programmers and rapid-iteration teams to harness the performance of LTX-Video while maintaining the clarity and flexibility of a dataflow graph model. It supports nodes for common video operations like trimming, layering, color grading, and generative augmentations, making it suitable for everything from simple clip edits to complex sequences with conditional behavior.
  • 7
    WALA

    Libraries for Analysis, with frontends for Java, Android, and JS

    The T. J. Watson Libraries for Analysis (WALA) provide static analysis capabilities for Java bytecode and related languages, and for JavaScript. The system is licensed under the Eclipse Public License, an OSI (Open Source Initiative) approved open source license. The initial WALA infrastructure was independently developed as part of the DOMO research project at the IBM T.J. Watson Research Center. In 2006, IBM donated the software to the community. The...
  • 8
    Hamilton DAGWorks

    Helps scientists define testable, modular, self-documenting dataflows

    Hamilton is a lightweight Python library for directed acyclic graphs (DAGs) of data transformations. Your DAG is portable; it runs anywhere Python runs, whether it's a script, notebook, Airflow pipeline, FastAPI server, etc. Your DAG is expressive; Hamilton has extensive features to define and modify the execution of a DAG (e.g., data validation, experiment tracking, remote execution). To create a DAG, write regular Python functions that specify their dependencies with their parameters. As...
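Hamilton's central convention, in which a function's parameter names declare the upstream functions or inputs it depends on, can be imitated with Python's standard `inspect` module. The resolver below is a minimal sketch of that idea only, not Hamilton's driver API; `execute` and the example functions are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict


def execute(functions: Dict[str, Callable[..., Any]], inputs: Dict[str, Any], target: str) -> Any:
    """Compute `target`, resolving each parameter by name from inputs or other functions."""
    cache: Dict[str, Any] = dict(inputs)
    visiting = set()

    def resolve(name: str) -> Any:
        if name in cache:
            return cache[name]
        if name not in functions:
            raise KeyError(f"no input or function named {name!r}")
        if name in visiting:
            raise ValueError(f"cycle detected at {name!r}")
        visiting.add(name)
        fn = functions[name]
        # Each parameter name is treated as an upstream node in the DAG.
        kwargs = {p: resolve(p) for p in inspect.signature(fn).parameters}
        visiting.discard(name)
        cache[name] = fn(**kwargs)
        return cache[name]

    return resolve(target)


def spend_per_signup(spend: list, signups: list) -> list:
    return [s / n for s, n in zip(spend, signups)]


def avg_spend_per_signup(spend_per_signup: list) -> float:
    return sum(spend_per_signup) / len(spend_per_signup)


funcs = {f.__name__: f for f in (spend_per_signup, avg_spend_per_signup)}
result = execute(funcs, {"spend": [10.0, 20.0, 30.0], "signups": [1, 2, 3]}, "avg_spend_per_signup")
# result == 10.0
```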
  • 9
    Pathway

    Python ETL framework for stream processing, real-time analytics, LLM

    ...Unlike traditional batch processing frameworks, Pathway continuously updates the results of your data logic as new events arrive, functioning more like a database that reacts in real-time. It supports Python, integrates with modern data tools, and offers a deterministic dataflow model to ensure reproducibility and correctness.
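The incremental behavior described above (results that update as events arrive instead of being recomputed) can be sketched as a grouped sum maintained from a stream of insertions and retractions. This is a generic incremental-view-maintenance illustration, not Pathway's API; the `IncrementalSum` class and the `(key, value, diff)` change format are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# One change to the input table: (group key, value, diff), where diff is +1 (insert) or -1 (retract).
Change = Tuple[str, float, int]


class IncrementalSum:
    """Maintains SUM(value) GROUP BY key incrementally instead of recomputing it."""

    def __init__(self) -> None:
        self.sums: Dict[str, float] = defaultdict(float)
        self.counts: Dict[str, int] = defaultdict(int)

    def apply(self, changes: Iterable[Change]) -> Dict[str, Optional[float]]:
        """Apply a batch of changes; return the new sum for every affected key (None if gone)."""
        updated: Dict[str, Optional[float]] = {}
        for key, value, diff in changes:
            self.sums[key] += diff * value
            self.counts[key] += diff
            if self.counts[key] == 0:
                # Every row for this key was retracted, so the group disappears.
                del self.sums[key], self.counts[key]
                updated[key] = None
            else:
                updated[key] = self.sums[key]
        return updated


view = IncrementalSum()
first = view.apply([("eu", 10.0, +1), ("us", 5.0, +1), ("eu", 2.5, +1)])
second = view.apply([("us", 5.0, -1), ("eu", 2.5, -1)])
```

Only the keys touched by a batch are reported, which is what lets downstream consumers react to changes rather than rescanning the whole result.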
  • 10
    Bytewax

    Python Stream Processing

    ...Connect data sources, run stateful transformations, and write to various downstream systems with built-in connectors or existing Python libraries. Bytewax is a Python framework and Rust distributed processing engine that uses a dataflow computational model to provide parallelizable stream processing and event processing capabilities similar to Flink, Spark, and Kafka Streams. You can use Bytewax for a variety of workloads, from moving data in the style of Kafka Connect to advanced online machine learning. Bytewax is not limited to streaming applications; it excels anywhere data can be distributed at the input and output.
  • 11
    ElasticJob

    Distributed scheduled job framework

    ElasticJob is a distributed scheduling solution consisting of two separate projects, ElasticJob-Lite and ElasticJob-Cloud. ElasticJob-Lite is a lightweight, decentralized solution that provides distributed task sharding services. ElasticJob-Cloud uses Mesos to manage and isolate resources. Both projects share a unified job API, so developers write their code once and can deploy it as they choose. ElasticJob supports job sharding and high availability in distributed systems. Scale out for throughput and...
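Job sharding, as described above, splits one job into numbered sharding items and spreads them across the running instances. The function below sketches one plausible even-split allocation; it is an assumption for illustration, not ElasticJob's code (ElasticJob provides its own sharding strategies), and `allocate_shards` is hypothetical.

```python
from typing import Dict, List


def allocate_shards(instances: List[str], total_items: int) -> Dict[str, List[int]]:
    """Assign sharding items 0..total_items-1 to instances in contiguous, near-equal blocks.

    When the items do not divide evenly, the first instances each take one extra item.
    """
    if not instances:
        return {}
    per_instance, remainder = divmod(total_items, len(instances))
    allocation: Dict[str, List[int]] = {}
    start = 0
    for index, instance in enumerate(instances):
        size = per_instance + (1 if index < remainder else 0)
        allocation[instance] = list(range(start, start + size))
        start += size
    return allocation


# Three instances share ten items; if "b" fails, rerunning the allocation
# over the survivors hands its items to "a" and "c".
before = allocate_shards(["a", "b", "c"], 10)
after = allocate_shards(["a", "c"], 10)
```

Recomputing the allocation over whichever instances are alive is one simple way to keep every item covered after a failure.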
  • 12
    OctoSQL

    Join, analyse and transform data from multiple databases

    ...OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL. At the same time it's an easily extensible full-blown dataflow engine, and you can use it to add a SQL interface to your own applications. OctoSQL supports a bunch of file formats out of the box, but you can additionally install plugins to add support for other databases. You can specify the output format using the --output flag. Available values for it are live_table, batch_table, csv and stream_native. ...
  • 13
    Apache Flink

    Stream processing framework with powerful stream

    Apache Flink is a distributed engine for stateful computations over data streams and batches, designed for low-latency processing at scale. Its core runtime executes dataflow graphs with fine-grained backpressure and checkpointing, allowing applications to recover consistently from failures. Flink’s event-time model and watermarks enable accurate out-of-order processing, windowing, and complex time semantics that typical real-time systems struggle with. Developers program against high-level APIs—DataStream and Table/SQL—to express transformations, joins, and stateful patterns, while specialized libraries support CEP, machine learning workflows, and connectors. ...
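The event-time and watermark concepts mentioned above can be illustrated with a single-process sketch of tumbling windows: events carry their own timestamps and may arrive out of order, the watermark trails the largest timestamp seen by a fixed delay, and a window fires once the watermark reaches its end. This is not Flink's DataStream API; the function, the fixed-delay watermark policy, and the drop-late-events behavior are simplifying assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Window = Tuple[int, int, Dict[str, int]]  # (start, end, per-key counts)


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_size: int, max_delay: int
) -> List[Window]:
    """Count keys per tumbling event-time window, firing windows as the watermark advances.

    `events` are (event_time, key) pairs in arrival order, possibly out of order.
    The watermark trails the largest timestamp seen by `max_delay`; events whose
    window has already closed are dropped as late.
    """
    open_windows: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    fired: List[Window] = []
    watermark = float("-inf")
    for event_time, key in events:
        start = event_time - event_time % window_size
        if start + window_size <= watermark:
            continue  # too late: this window has already fired
        open_windows[start][key] += 1
        watermark = max(watermark, event_time - max_delay)
        # Fire every open window whose end the watermark has reached.
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                fired.append((s, s + window_size, dict(open_windows.pop(s))))
    # Bounded input: flush whatever is still open at the end of the stream.
    for s in sorted(open_windows):
        fired.append((s, s + window_size, dict(open_windows[s])))
    return fired


arrivals = [(1, "a"), (4, "b"), (12, "a"), (7, "a"), (18, "b"), (3, "c"), (25, "a")]
windows = tumbling_window_counts(arrivals, window_size=10, max_delay=5)
```

In this run the out-of-order event at time 7 still lands in window [0, 10), while the event at time 3 arrives after that window has fired and is dropped.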
  • 14
    BMDFM

    Binary Modular DataFlow Machine (BMDFM)

    ...The BMDFM dynamic scheduling subsystem performs a symmetric multiprocessing (SMP) emulation of a tagged-token dataflow machine to provide transparent dataflow semantics to applications. No directives for parallel execution are needed. More info: http://www.bmdfm.com
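In a tagged-token dataflow machine, every value travels as a token tagged with its context (for example, a loop iteration) and addressed to an instruction input; an instruction fires once tokens with the same tag have arrived on all of its inputs. The toy interpreter below illustrates that matching rule in a single thread; it is not BMDFM's implementation, and the token layout and instruction table are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# A token: (tag, destination instruction, input port, value)
Token = Tuple[int, str, int, float]

# Instruction table: name -> (arity, operation, list of (destination, port) for the result)
Program = Dict[str, Tuple[int, Callable[..., float], List[Tuple[str, int]]]]


def run(program: Program, initial: List[Token]) -> Dict[Tuple[int, str], float]:
    """Fire an instruction whenever all operands carrying the same tag have arrived."""
    queue = deque(initial)
    waiting: Dict[Tuple[int, str], Dict[int, float]] = {}  # the matching store
    outputs: Dict[Tuple[int, str], float] = {}
    while queue:
        tag, dest, port, value = queue.popleft()
        if dest not in program:
            outputs[(tag, dest)] = value  # token leaves the graph: record it as a result
            continue
        arity, op, targets = program[dest]
        slot = waiting.setdefault((tag, dest), {})
        slot[port] = value
        if len(slot) == arity:
            del waiting[(tag, dest)]
            result = op(*(slot[p] for p in range(arity)))
            # Result tokens keep the same tag, so different iterations never mix.
            for target, target_port in targets:
                queue.append((tag, target, target_port, result))
    return outputs


# (a + b) * c, evaluated for two iterations whose tokens arrive interleaved.
program: Program = {
    "add": (2, lambda x, y: x + y, [("mul", 0)]),
    "mul": (2, lambda x, y: x * y, [("out", 0)]),
}
tokens = [
    (0, "add", 0, 1.0), (1, "add", 0, 10.0), (1, "mul", 1, 3.0),
    (0, "mul", 1, 2.0), (1, "add", 1, 20.0), (0, "add", 1, 4.0),
]
results = run(program, tokens)
```

Even though tokens for tags 0 and 1 are interleaved, each instruction only combines operands with matching tags, yielding (1 + 4) * 2 for tag 0 and (10 + 20) * 3 for tag 1.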
  • 15
    Shapes

    Graphical programming. Includes n-dimensional sorting.

    Write programs as graphical dataflow charts instead of text, and compile them to any programming language you want. The project also includes what its author describes as the most efficient tree-based sorting algorithm possible. It was originally developed on a CTOS Color NGEN, first in Pascal, later ported to C, and finally, 20 years later, ported to Linux. It is still not fully system-independent.
  • 16
    SwiftVis

    Data Exploration for Swift and other N-body simulations

    This project was started in 2001 to support data analysis for a revision of Hal Levison's Swift planetary simulation framework. SwiftVis is a data exploration and visualization package written in Java, with a GUI that supports the creation of visual "dataflow programs" and can be used by non-programmers.
  • 17
    Tributary

    Streaming reactive and dataflow graphs in Python

    Tributary is a library for constructing dataflow graphs in Python. Unlike many other DAG libraries in Python (airflow, luigi, prefect, dagster, dask, kedro, etc), tributary is not designed with data/etl pipelines or scheduling in mind. Instead, tributary is more similar to libraries like mdf, loman, pyungo, streamz, or pyfunctional, in that it is designed to be used as the implementation for a data model.
  • 18
    DRT

    Dataflow Run Time

    This software aims to demonstrate that we can easily provide a very small and powerful runtime for programs that are coded in any programming model but can be *executed* in a DATAFLOW style. The Dataflow Run Time (DRT) provides the runtime support for that. Its first benefit is to allow rapid development of such programs in the context of the TERAFLUX project (http://teraflux.eu). The runtime API has been designed so that, on one side, a good compiler targeting this interface can be developed in the future and, on the other, the API can receive good architectural support: ideally, each function could map to a Thread-Level-Parallelism Instruction Set Extension (TLP ISE).
  • 19
    DSPatch

    The Refreshingly Simple C++ Dataflow Framework

    Website: http://flowbasedprogramming.com

    DSPatch, pronounced "dispatch", is a powerful C++ dataflow framework. DSPatch is not limited to any particular domain or data type: from reactive programming to stream processing, DSPatch's generic, object-oriented API allows you to create virtually any dataflow system imaginable. *See also:* DSPatcher (https://github.com/MarcusTomlinson/DSPatcher), a cross-platform graphical tool for building DSPatch circuits.
  • 20
    DataKit

    Connect processes into powerful data pipelines

    Connect processes into powerful data pipelines with a simple git-like filesystem interface. DataKit is a tool to orchestrate applications using a Git-like dataflow. It revisits the UNIX pipeline concept, with a modern twist: streams of tree-structured data instead of raw text. DataKit allows you to define complex build pipelines over version-controlled data. DataKit is currently used as the coordination layer for HyperKit, the hypervisor component of Docker for Mac and Windows, and for the DataKitCI continuous integration system. src contains the main DataKit service. ...
  • 21
    Sonic Flow

    Sonic Flow is a set of C/C++ libraries for dataflow-oriented audio signal processing. Advanced features include feedback, multirate, and hierarchical networks, along with a library of signal processing blocks. Example code covers high-quality chorus/flanger, compressor/expander, parametric EQ, and more.
  • 22
    Python Taint

    Static Analysis Tool for Detecting Security Vulnerabilities in Python

    Static analysis of Python web applications based on theoretical foundations (control flow graphs, fixed points, dataflow analysis). Detects command injection, SSRF, SQL injection, XSS, directory traversal, and more. Extensive customization is possible. For functions from builtins or libraries, e.g. url_for or os.path.join, use the -m option to specify whether or not they return tainted values given tainted inputs; otherwise, a default file is used.
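The dataflow-analysis approach named above can be sketched as a forward "may be tainted" analysis over a tiny control flow graph: taint facts are propagated along edges with a worklist until nothing changes (a fixed point), then sinks that read a tainted variable are reported. This is not Python Taint's implementation; the statement encoding and the `request_arg`/`run` names in the comments are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# One CFG node: (variable assigned or None, kind, variables read).
# kind: "source" (e.g. reads request input), "assign", "nop", or "sink" (e.g. runs a shell command).
Stmt = Tuple[Optional[str], str, List[str]]


def find_tainted_sinks(stmts: Dict[int, Stmt], succs: Dict[int, List[int]]) -> List[int]:
    """Forward may-taint analysis iterated to a fixed point; returns sinks that read tainted data."""
    preds: Dict[int, List[int]] = {n: [] for n in stmts}
    for node, targets in succs.items():
        for t in targets:
            preds[t].append(node)
    out: Dict[int, Set[str]] = {n: set() for n in stmts}

    def incoming(node: int) -> Set[str]:
        # Join (set union) of the facts flowing out of every predecessor.
        return set().union(*(out[p] for p in preds[node]))

    worklist = list(stmts)
    while worklist:
        node = worklist.pop()
        target, kind, reads = stmts[node]
        facts = incoming(node)
        new_out = set(facts)
        if target is not None:
            new_out.discard(target)  # the assignment overwrites the old value
            if kind == "source" or any(v in facts for v in reads):
                new_out.add(target)
        if new_out != out[node]:
            out[node] = new_out
            worklist.extend(succs.get(node, []))  # successors must be revisited
    return sorted(
        n for n, (_, kind, reads) in stmts.items()
        if kind == "sink" and any(v in incoming(n) for v in reads)
    )


# 1: cmd = "ls"      2: run(cmd)      3: user = request_arg()
# 4: loop head       5: cmd = cmd + user (loops back to 4)      6: run(cmd)
program = {
    1: ("cmd", "assign", []),
    2: (None, "sink", ["cmd"]),
    3: ("user", "source", []),
    4: (None, "nop", []),
    5: ("cmd", "assign", ["cmd", "user"]),
    6: (None, "sink", ["cmd"]),
}
edges = {1: [2], 2: [3], 3: [4], 4: [5, 6], 5: [4]}
flagged = find_tainted_sinks(program, edges)
```

Only sink 6 is flagged: taint reaches `cmd` solely through the loop's back edge from node 5 to node 4, which is why the analysis iterates to a fixed point rather than making a single pass in program order.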
  • 23
    WiredSoft

    C++ library to make software based on dataflow diagrams

  • 24
    Easy Machine Learning

    Easy Machine Learning is a general-purpose dataflow-based system

    ...The key barriers come from not only the implementation of the algorithms themselves but also the processing for applying them to real applications which often involve multiple steps and different algorithms. Our platform Easy Machine Learning presents a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real-world tasks. In the system, a learning task is formulated as a directed acyclic graph (DAG) in which each node represents an operation (e.g. a machine learning algorithm), and each edge represents the flow of the data from one node to its descendants.
  • 25
    Dataflow Java SDK

    Google Cloud Dataflow provides a simple, powerful model

    The Dataflow Java SDK is the open-source Java library that powers Apache Beam pipelines for Google Cloud Dataflow, a serverless and scalable platform for processing large datasets in both batch and stream modes. This SDK allows developers to write Beam-based pipelines in Java and execute them on Dataflow, taking advantage of features like autoscaling, dynamic work rebalancing, and fault-tolerant distributed processing.