Showing 51 open source projects for "c++ parallelism"

  • 1
    Distributed Llama

    Connect home devices into a powerful cluster to accelerate LLM inference

    Distributed Llama is an open-source project that enables users to connect multiple home devices into a powerful cluster to accelerate Large Language Model (LLM) inference. By leveraging tensor parallelism and high-speed synchronization over Ethernet, it allows for faster performance as more devices are added to the cluster. The system supports various operating systems, including Linux, macOS, and Windows, and is optimized for both ARM and x86_64 AVX2 CPUs.
    Downloads: 1 This Week
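Tensor parallelism, as used in the description above, splits the weights of a single layer across devices so that each device computes part of the result. The following is a minimal C++ sketch of the idea, assuming a row-wise split of one matrix-vector product and using threads as stand-ins for networked devices. The function name `tensor_parallel_matvec` is hypothetical; this is not Distributed Llama's code.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: y = W * x for a row-major (rows x x.size()) matrix W,
// with W's rows split evenly across `devices` workers. Threads stand in for
// cluster nodes; no networking is modeled.
std::vector<float> tensor_parallel_matvec(const std::vector<float>& W,
                                          const std::vector<float>& x,
                                          std::size_t rows, std::size_t devices) {
    const std::size_t cols = x.size();
    assert(W.size() == rows * cols);
    assert(devices > 0 && rows % devices == 0);  // even split keeps the sketch simple
    const std::size_t slice = rows / devices;

    std::vector<float> y(rows, 0.0f);
    std::vector<std::thread> workers;
    for (std::size_t d = 0; d < devices; ++d) {
        workers.emplace_back([&, d] {
            // Worker d owns rows [d*slice, (d+1)*slice) and writes only that part
            // of y, so no two workers touch the same output element.
            for (std::size_t r = d * slice; r < (d + 1) * slice; ++r) {
                float acc = 0.0f;
                for (std::size_t c = 0; c < cols; ++c) acc += W[r * cols + c] * x[c];
                y[r] = acc;
            }
        });
    }
    for (auto& t : workers) t.join();  // synchronization point: all slices are done
    return y;
}
```

In a real cluster the final join would instead be an exchange of output slices over the network, which is the synchronization cost that grows as devices are added.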
  • 2
    fairseq2

    FAIR Sequence Modeling Toolkit 2

    fairseq2 is a modern, modular sequence modeling framework developed by Meta AI Research as a complete redesign of the original fairseq library. Built from the ground up for scalability, composability, and research flexibility, fairseq2 supports a broad range of language, speech, and multimodal content generation tasks, including instruction fine-tuning, reinforcement learning from human feedback (RLHF), and large-scale multilingual modeling. Unlike the original fairseq—which evolved into a...
    Downloads: 1 This Week
  • 3
    SuiteSparse

    The official SuiteSparse library: a suite of sparse matrix algorithms

    The official SuiteSparse library: a suite of sparse matrix algorithms authored or co-authored by Tim Davis, Texas A&M University.
    Downloads: 4 This Week
  • 4
    UCCL

    UCCL is an efficient communication library for GPUs

    UCCL is a high-performance GPU communication library designed to support distributed machine learning workloads and large-scale AI systems. The library focuses on enabling efficient data transfer and collective communication between GPUs during training and inference processes. It supports a variety of communication patterns including collective operations such as all-reduce as well as peer-to-peer transfers that are commonly used in modern machine learning architectures. UCCL is designed to...
    Downloads: 3 This Week
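All-reduce leaves every participant holding the element-wise reduction (here, the sum) of all participants' buffers. A common way to implement it is the ring algorithm: a reduce-scatter phase followed by an all-gather phase, each taking P-1 steps for P ranks. The sketch below simulates that algorithm in a single process. It is not UCCL's API and not necessarily the algorithm UCCL uses; the names `ring_allreduce` and `Buffers` are hypothetical, and the buffer length is assumed to be divisible by the number of ranks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Buffers = std::vector<std::vector<int>>;  // hypothetical: one buffer per rank

// Simulated ring all-reduce (sum). In every step each rank sends one chunk to
// its right neighbour. Messages are staged first so that all "sends" in a step
// see the same pre-step state, as concurrent transfers would.
void ring_allreduce(Buffers& bufs) {
    const std::size_t P = bufs.size();
    assert(P > 0);
    const std::size_t L = bufs[0].size();
    assert(L % P == 0);
    const std::size_t chunk = L / P;

    auto step = [&](std::size_t s, bool accumulate) {
        std::vector<std::vector<int>> msgs(P);
        std::vector<std::size_t> ids(P);
        for (std::size_t r = 0; r < P; ++r) {
            // Reduce-scatter sends chunk (r - s); all-gather sends chunk (r + 1 - s).
            const std::size_t c = accumulate ? (r + P - s) % P : (r + 1 + P - s) % P;
            ids[r] = c;
            msgs[r].assign(bufs[r].begin() + c * chunk, bufs[r].begin() + (c + 1) * chunk);
        }
        for (std::size_t r = 0; r < P; ++r) {
            const std::size_t dst = (r + 1) % P;
            for (std::size_t i = 0; i < chunk; ++i) {
                int& slot = bufs[dst][ids[r] * chunk + i];
                slot = accumulate ? slot + msgs[r][i] : msgs[r][i];
            }
        }
    };
    // After reduce-scatter, rank r holds the fully reduced chunk (r + 1) % P.
    for (std::size_t s = 0; s + 1 < P; ++s) step(s, true);
    // All-gather circulates the reduced chunks until every rank has all of them.
    for (std::size_t s = 0; s + 1 < P; ++s) step(s, false);
}
```

Each rank sends only 1/P of its buffer per step, which is why the ring variant is attractive for large messages.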
  • 5
    MyDumper

    MyDumper project

    MyDumper is a MySQL logical backup tool consisting of two programs: mydumper, which exports a consistent backup of MySQL databases, and myloader, which reads the backup produced by mydumper, connects to the destination database, and imports the backup. Both tools are multithreaded. MyDumper is open source and maintained by the community; it is not a Percona, MariaDB or MySQL product. Parallelism (hence, speed) and performance (avoids expensive character set conversion routines, efficient...
    Downloads: 27 This Week
  • 6
    tt-metal

    TT-NN operator library, and TT-Metalium low level kernel programming

    tt-metal, also referred to in its documentation as TT-Metalium, is Tenstorrent’s low-level software development kit for programming applications on Tenstorrent AI accelerators. The project is designed for developers who need direct access to the company’s Tensix processor architecture, exposing a programming model that is closer to hardware control than high-level inference frameworks. Instead of following a traditional GPU model centered on massive thread parallelism, the platform is built...
    Downloads: 4 This Week
  • 7
    OneFlow

    OneFlow is a deep learning framework designed to be user-friendly

    OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient. It offers an extension for targeting third-party compilers such as XLA, TensorRT and OpenVINO. The CUDA runtime is statically linked into OneFlow, so OneFlow works with the minimum supported driver and any later driver. Distributed performance (efficiency) is the core technical difficulty of deep learning frameworks. OneFlow focuses on performance improvement and heterogeneous...
    Downloads: 0 This Week
  • 8
    Puma

    A Ruby/Rack web server built for concurrency

    Unlike other Ruby web servers, Puma was built for speed and parallelism. Puma is a small library that provides a very fast and concurrent HTTP 1.1 server for Ruby web applications. It is designed for running Rack apps only. What makes Puma so fast is the careful use of a Ragel extension to provide fast, accurate HTTP 1.1 protocol parsing. This makes the server scream without too many portability issues. If you are using Bundler, just add Puma to your project's Gemfile. Once you've installed...
    Downloads: 1 This Week
  • 9
    cuDF

    GPU DataFrame Library

    Built on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. cuDF provides a pandas-like API that will be familiar to data engineers and data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming. For additional examples, browse our complete API documentation, or check out our more detailed notebooks. cuDF can be installed...
    Downloads: 0 This Week
  • 10
    Vector Pascal

    Vector Pascal is a language targeted at SIMD multi-core instruction sets such as SSE2, AVX and x86-64-v3. It has a SIMD compiler that supports parallel vector operations, loop unrolling, common subexpression elimination, etc. It is implemented in Java.
    Downloads: 5 This Week
  • 11
    KeyKiller-Cuda

    Solving the Satoshi Puzzle

    KeyKiller-Cuda is a GPU-accelerated version of the KeyKiller project, designed to achieve extreme performance in solving Satoshi Nakamoto's puzzles on modern NVIDIA GPUs. It pushes the limits of cryptographic key-search performance by leveraging CUDA, thread-beam parallelism, and batched EC operations. The command-line version is open source and free to use. For the paid advanced graphical version, please visit: https://gitlab.com/8891689/KeyKiller-Cuda/
    Downloads: 10 This Week
  • 12
    BMDFM

    Binary Modular DataFlow Machine (BMDFM)

    Binary Modular DataFlow Machine (BMDFM) is a software package that runs a single application in parallel on shared-memory symmetric multiprocessing (SMP) computers, using the multiple processors to speed up its execution. BMDFM automatically identifies and exploits parallelism through static and, mainly, dynamic scheduling of the dataflow instruction sequences derived from the originally sequential program. The BMDFM dynamic scheduling subsystem performs a...
    Downloads: 0 This Week
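BMDFM derives its dataflow graph automatically from a sequential program; the sketch below only illustrates the execution model by building a tiny graph by hand with `std::async`, assuming (a + b) * (c - d) as the example program. Nodes whose inputs do not depend on each other may run concurrently, and a node fires once its operands are available. The function name `dataflow_eval` is hypothetical and none of this is BMDFM's interface.

```cpp
#include <cassert>
#include <future>

// Hand-built dataflow graph for (a + b) * (c - d). Each node is a task that
// blocks only on the futures it consumes, so the two independent nodes
// (sum and diff) may execute in parallel.
int dataflow_eval(int a, int b, int c, int d) {
    std::shared_future<int> sum =
        std::async(std::launch::async, [=] { return a + b; }).share();
    std::shared_future<int> diff =
        std::async(std::launch::async, [=] { return c - d; }).share();
    std::future<int> product = std::async(std::launch::async, [sum, diff] {
        return sum.get() * diff.get();  // fires once both operands are available
    });
    return product.get();
}
```

Using `shared_future` lets several downstream nodes consume the same result, which is how a value with more than one consumer would be wired in a larger graph.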
  • 13
    libjmmcg

    This is a basic, low-level library with pretensions to implementing features above and beyond (but not necessarily better than!) those implemented within the Standard C++ Library and the Boost libraries, in particular data-flow-based parallelism and a blindingly fast FIX-to-exchange-protocol message translator. The source code has moved to GitLab: https://gitlab.com/jmmcg/libjmmcg
    Downloads: 0 This Week
  • 14
    CUDA Pathtracer

    GPU Raytracer from scratch in C++/CUDA

    ...It demonstrates the power of modern GPU architectures to handle complex lighting calculations, reflections, shadows, and global illumination in real-time. This project is educational and experimental, providing insight into GPU parallelism and real-time rendering techniques. Its clean and modular C++ structure makes it a great reference for students and graphics developers alike.
    Downloads: 0 This Week
  • 15
    Apache MXNet (incubating)

    A flexible and efficient library for deep learning

    Apache MXNet is an open source deep learning framework designed for efficient and flexible research prototyping and production. It contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations. On top of this is a graph optimization layer, overall making MXNet highly efficient yet still portable, lightweight and scalable.
    Downloads: 2 This Week
  • 16
    MXNet

    Lightweight, Portable, Flexible Distributed/Mobile Deep Learning

    Apache MXNet is a scalable, efficient open-source deep learning framework—offering a flexible hybrid programming model (symbolic + imperative) and supporting a wide array of languages—designed for training and deploying neural networks across heterogeneous systems. Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic...
    Downloads: 0 This Week
  • 17
    DRT

    Dataflow Run Time

    This software aims to demonstrate that a very small yet powerful runtime can easily be provided for programs written in any programming model, as long as they can be *executed* in a DATAFLOW style. The Dataflow Run Time (DRT) provides the runtime support for this. The first benefit of this software is to allow rapid development of such programs in the context of the TERAFLUX project (http://teraflux.eu). The runtime API has been designed in such a way as to allow for a future...
    Downloads: 0 This Week
  • 18
    GHC (Glasgow Haskell Compiler)

    Mirror of the Glasgow Haskell Compiler

    GHC (Glasgow Haskell Compiler) is the leading open-source compiler and interactive environment for the Haskell programming language, supporting the Haskell 2010 standard plus numerous language extensions. It compiles to native machine code (with its own native code generator, or via LLVM or C) and includes the interactive GHCi REPL. For full information on building GHC, see the GHC Building Guide. Here follows a summary; if you get into trouble, the Building Guide has all the answers. For building library documentation, you'll...
    Downloads: 7 This Week
  • 19
    MultiPathNet

    A Torch implementation of the object detection network

    MultiPathNet is a Torch-7 implementation of the “A MultiPath Network for Object Detection” paper (BMVC 2016), developed by Facebook AI Research. It extends the Fast R-CNN framework by introducing multiple network “paths” to enhance feature extraction and object recognition robustness. The MultiPath architecture incorporates skip connections and multi-scale processing to capture both fine-grained details and high-level context within a single detection pipeline. This results in improved...
    Downloads: 0 This Week
  • 20
    X10

    Performance and Productivity at Scale

    X10 is a class-based, strongly-typed, garbage-collected, object-oriented language. To support concurrency and distribution, X10 uses the Asynchronous Partitioned Global Address Space programming model (APGAS). This model introduces two key concepts -- places and asynchronous tasks -- and a few mechanisms for coordination. With these, APGAS can express both regular and irregular parallelism, message-passing-style and active-message-style computations, fork-join and bulk-synchronous...
    Downloads: 15 This Week
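In X10, `async` spawns a task and `finish` waits for every task spawned inside it, which is enough to express fork-join parallelism. The sketch below is a rough single-place C++ analogue of that pairing; the `Finish` class and `block_sum` function are hypothetical names, and places (distribution across nodes) are not modeled at all.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Rough analogue of X10's `finish { async ... }`: every task spawned through a
// Finish object has completed before the Finish object goes out of scope.
class Finish {
public:
    template <class F>
    void async(F f) { tasks_.push_back(std::async(std::launch::async, std::move(f))); }
    ~Finish() { for (auto& t : tasks_) t.wait(); }
private:
    std::vector<std::future<void>> tasks_;
};

// Fork-join sum: one async task per block, joined by the enclosing Finish.
long long block_sum(const std::vector<int>& v, std::size_t blocks) {
    assert(blocks > 0);
    std::vector<long long> partial(blocks, 0);
    const std::size_t n = v.size();
    {
        Finish finish;
        for (std::size_t b = 0; b < blocks; ++b) {
            finish.async([&, b] {
                const std::size_t lo = b * n / blocks, hi = (b + 1) * n / blocks;
                partial[b] = std::accumulate(v.begin() + lo, v.begin() + hi, 0LL);
            });
        }
    }  // all spawned tasks have finished here
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Tying the join to a scope mirrors how `finish` delimits a block in X10: code after the block can safely read every result the tasks produced.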
  • 21
    FastFlow: programming multi-core

    Pattern-based multi/many-core parallel programming framework

    FastFlow is a C/C++ programming framework supporting the development of pattern-based parallel programs on multi/many-core, GPU and distributed platforms. The FastFlow run-time is built upon non-blocking threads and lock-free queues. Thanks to its very efficient CAS-free communication/synchronization support (e.g. a core-to-core latency of a few clock cycles), FastFlow effectively supports the exploitation of fine-grain parallelism, e.g. parallel codes managing very high-frequency streams on commodity multi-core machines. ...
    Downloads: 0 This Week
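FastFlow's run-time is described as built on lock-free queues with CAS-free synchronization. The classic structure with that property is the bounded single-producer/single-consumer (SPSC) ring buffer, in which each index is written by only one thread, so acquire/release loads and stores suffice. The sketch below assumes C++17 and is a generic SPSC queue, not FastFlow's implementation or API; the class name `SpscQueue` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <thread>
#include <vector>

// Bounded single-producer/single-consumer ring buffer. head_ is written only by
// the producer and tail_ only by the consumer, so no compare-and-swap is needed.
template <class T>
class SpscQueue {
public:
    explicit SpscQueue(std::size_t capacity) : buf_(capacity + 1) { assert(capacity > 0); }

    bool push(const T& item) {  // call from the producer thread only
        const std::size_t h = head_.load(std::memory_order_relaxed);
        const std::size_t next = (h + 1) % buf_.size();
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[h] = item;
        head_.store(next, std::memory_order_release);  // publish the filled slot
        return true;
    }

    std::optional<T> pop() {  // call from the consumer thread only
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T item = buf_[t];
        tail_.store((t + 1) % buf_.size(), std::memory_order_release);  // free the slot
        return item;
    }

private:
    std::vector<T> buf_;                // one slot stays empty to tell full from empty
    std::atomic<std::size_t> head_{0};  // next slot to write (producer-owned)
    std::atomic<std::size_t> tail_{0};  // next slot to read (consumer-owned)
};
```

One thread calls only `push` and another only `pop`; with more than one producer or consumer this design is no longer safe and a different queue is needed.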
  • 22
    SPar: Stream Parallelism in Multi-Cores

    An Embedded C++ Domain-Specific Language

    SPar is an internal C++ Domain-Specific Language (DSL) for modeling and implementing classical stream-parallel patterns. The DSL uses standard C++ attributes as annotations that tag the notable components of stream-parallel applications: stream sources and stream processing stages. The latest version can be checked out from the SVN repository with the following command: svn checkout svn://svn.code.sf.net/p/spar-dsl-compiler/svn/ spar
    Downloads: 0 This Week
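SPar turns annotated sequential code into a parallel stream pipeline. The sketch below shows, hand-written in plain C++17 and without SPar's annotations, the kind of source, stage and sink structure such a pipeline has: each stage runs on its own thread and passes items through a blocking channel, with an empty `std::optional` marking the end of the stream. `Channel` and `square_stream_sum` are hypothetical names, not SPar output.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Minimal unbounded blocking channel; an empty optional marks end-of-stream.
template <class T>
class Channel {
public:
    void send(std::optional<T> v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    std::optional<T> receive() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty(); });
        std::optional<T> v = std::move(q_.front());
        q_.pop();
        return v;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::optional<T>> q_;
};

// Three-stage stream: source emits 1..n, a stage squares each item on its own
// thread, and the sink (the calling thread) sums the results.
long long square_stream_sum(int n) {
    Channel<int> to_stage, to_sink;
    std::thread stage([&] {
        while (auto v = to_stage.receive()) to_sink.send(*v * *v);
        to_sink.send(std::nullopt);  // forward end-of-stream downstream
    });
    std::thread source([&] {
        for (int i = 1; i <= n; ++i) to_stage.send(i);
        to_stage.send(std::nullopt);
    });
    long long sum = 0;
    while (auto v = to_sink.receive()) sum += *v;
    source.join();
    stage.join();
    return sum;
}
```

Because each stage only blocks on its own input channel, the source can keep producing while later items are still being squared, which is the overlap a stream-parallel pipeline is meant to exploit.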
  • 23
    NOMAD

    NOMAD is a C++ implementation of the MADS (Mesh Adaptive Direct Search) algorithm for difficult blackbox optimization problems. Such problems occur when the functions to optimize are costly computer simulations that provide no derivatives.
    Downloads: 6 This Week
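MADS belongs to the family of direct-search methods: it evaluates the black box only at trial points on a mesh around the current best point and refines the mesh when no trial point improves. The sketch below is a simplified compass (coordinate) search that shows this poll-and-refine loop. It is not MADS itself, which uses richer sets of poll directions, and it is not NOMAD's interface; `compass_search` and its default parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <vector>

using Point = std::vector<double>;

// Simplified derivative-free pattern search: poll the 2n coordinate directions
// around the incumbent x at step size `delta`, move to the first improving
// point, and halve `delta` (refine the mesh) after an unsuccessful poll.
Point compass_search(const std::function<double(const Point&)>& f, Point x,
                     double delta = 1.0, double min_delta = 1e-6, int max_evals = 10000) {
    assert(delta > 0.0);
    double fx = f(x);
    int evals = 1;
    while (delta > min_delta && evals < max_evals) {
        bool improved = false;
        for (std::size_t i = 0; i < x.size() && !improved; ++i) {
            for (double sign : {+1.0, -1.0}) {
                Point trial = x;
                trial[i] += sign * delta;
                const double ft = f(trial);  // one (possibly costly) black-box call
                ++evals;
                if (ft < fx) { x = trial; fx = ft; improved = true; break; }
            }
        }
        if (!improved) delta /= 2.0;  // unsuccessful poll: refine the mesh
    }
    return x;
}
```

Only function values are compared, never gradients, which is what makes this family usable when the objective is a simulation with no derivatives.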
  • 24
    Extended Memory Semantics (EMS)

    Persistent shared object memory and parallelism for Node.js and Python

    EMS enables persistent shared-memory parallelism between Node.js, Python, and C/C++. Extended Memory Semantics (EMS) unifies synchronization and storage primitives to address several challenges of parallel programming. A modern multi-core server can have 16-32 cores and nearly 1 TB of memory, equivalent to an entire rack of systems from a few years ago. As a consequence, jobs that formerly required a Map-Reduce cluster can now be performed entirely in shared memory on a single server, without distributed programming.
    Downloads: 0 This Week
  • 25
    LightPCC

    Parallel pairwise correlation computation on Intel Xeon Phi clusters

    The first parallel and distributed library for pairwise correlation/dependence computation on Intel Xeon Phi clusters. The library is written as C++ template classes and achieves high speed by exploiting SIMD-instruction-level and thread-level parallelism within each Xeon Phi, as well as accelerator-level parallelism among multiple Xeon Phis. To facilitate balanced workload distribution, we have proposed a general framework for symmetric all-pairs computation by building, for the first time, provably bijective functions between job identifiers and the coordinate space.
    Downloads: 0 This Week
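For a symmetric all-pairs computation over n items, only the pairs (i, j) with i <= j need to be computed, and a bijection between a linear job identifier and those coordinates lets workers claim jobs by number. The sketch below shows one such mapping, assuming row-major order over the upper triangle with the diagonal included; it is an illustration under that assumption, not necessarily the functions LightPCC uses, and `row_start`, `pair_to_job` and `job_to_pair` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>

// Row i of the upper triangle holds jobs for j = i..n-1, so it starts at
// sum_{r<i} (n - r) = i * (2n - i + 1) / 2; there are n * (n + 1) / 2 jobs.
std::uint64_t row_start(std::uint64_t i, std::uint64_t n) { return i * (2 * n - i + 1) / 2; }

std::uint64_t pair_to_job(std::uint64_t i, std::uint64_t j, std::uint64_t n) {
    assert(i <= j && j < n);
    return row_start(i, n) + (j - i);
}

std::pair<std::uint64_t, std::uint64_t> job_to_pair(std::uint64_t k, std::uint64_t n) {
    assert(k < n * (n + 1) / 2);
    // Closed-form guess from solving row_start(i) = k for i, then integer
    // corrections in case floating-point rounding lands one row off.
    const double b = 2.0 * n + 1.0;
    auto i = static_cast<std::uint64_t>((b - std::sqrt(b * b - 8.0 * k)) / 2.0);
    if (i >= n) i = n - 1;
    while (i > 0 && row_start(i, n) > k) --i;
    while (i + 1 < n && row_start(i + 1, n) <= k) ++i;
    return {i, i + (k - row_start(i, n))};
}
```

A worker assigned the job range [lo, hi) can call `job_to_pair` on each k independently, without a shared lookup table, which is what makes an even split of job numbers an even split of work.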