c++ parallelism free download

Showing 51 open source projects for "c++ parallelism"

1

Distributed Llama

Connect home devices into a powerful cluster to accelerate LLM inference

Distributed Llama is an open-source project that enables users to connect multiple home devices into a powerful cluster to accelerate Large Language Model (LLM) inference. By leveraging tensor parallelism and high-speed synchronization over Ethernet, it allows for faster performance as more devices are added to the cluster. The system supports various operating systems, including Linux, macOS, and Windows, and is optimized for both ARM and x86_64 AVX2 CPUs.

Downloads: 1 This Week

Last Update: 2026-02-02
2

fairseq2

FAIR Sequence Modeling Toolkit 2

fairseq2 is a modern, modular sequence modeling framework developed by Meta AI Research as a complete redesign of the original fairseq library. Built from the ground up for scalability, composability, and research flexibility, fairseq2 supports a broad range of language, speech, and multimodal content generation tasks, including instruction fine-tuning, reinforcement learning from human feedback (RLHF), and large-scale multilingual modeling. Unlike the original fairseq—which evolved into a...

Downloads: 1 This Week

Last Update: 2025-11-07
3

SuiteSparse

The official SuiteSparse library: a suite of sparse matrix algorithms

The official SuiteSparse library: a suite of sparse matrix algorithms authored or co-authored by Tim Davis, Texas A&M University.

Downloads: 4 This Week

Last Update: 2026-02-10
4

UCCL

UCCL is an efficient communication library for GPUs

UCCL is a high-performance GPU communication library designed to support distributed machine learning workloads and large-scale AI systems. The library focuses on enabling efficient data transfer and collective communication between GPUs during training and inference processes. It supports a variety of communication patterns including collective operations such as all-reduce as well as peer-to-peer transfers that are commonly used in modern machine learning architectures. UCCL is designed to...

Downloads: 3 This Week

Last Update: 2 days ago
5

MyDumper

MyDumper project

MyDumper is a MySQL logical backup tool. It consists of two tools: mydumper, which exports a consistent backup of MySQL databases, and myloader, which reads the backup from mydumper, connects to the destination database, and imports the backup. Both tools use multithreading capabilities. MyDumper is open source and maintained by the community; it is not a Percona, MariaDB, or MySQL product. Parallelism (hence, speed) and performance (avoids expensive character set conversion routines, efficient...

Downloads: 27 This Week

Last Update: 2026-02-03
6

tt-metal

TT-NN operator library, and TT-Metalium low level kernel programming

tt-metal, also referred to in its documentation as TT-Metalium, is Tenstorrent’s low-level software development kit for programming applications on Tenstorrent AI accelerators. The project is designed for developers who need direct access to the company’s Tensix processor architecture, exposing a programming model that is closer to hardware control than high-level inference frameworks. Instead of following a traditional GPU model centered on massive thread parallelism, the platform is built...

Downloads: 4 This Week

Last Update: 2 days ago
7

OneFlow

OneFlow is a deep learning framework designed to be user-friendly

OneFlow is a deep learning framework designed to be user-friendly, scalable, and efficient. An extension lets OneFlow target third-party compilers such as XLA, TensorRT, and OpenVINO. The CUDA runtime is statically linked into OneFlow. OneFlow will work on a minimum supported driver and any driver beyond. Distributed performance (efficiency) is the core technical difficulty of the deep learning framework. OneFlow focuses on performance improvement and heterogeneous...

Downloads: 0 This Week

Last Update: 2024-03-11
8

Puma

A Ruby/Rack web server built for concurrency

Unlike other Ruby web servers, Puma was built for speed and parallelism. Puma is a small library that provides a very fast and concurrent HTTP 1.1 server for Ruby web applications. It is designed for running Rack apps only. What makes Puma so fast is the careful use of a Ragel extension to provide fast, accurate HTTP 1.1 protocol parsing. This makes the server scream without too many portability issues. If you are using Bundler, just add Puma to your project's Gemfile. Once you've installed...

Downloads: 1 This Week

Last Update: 2026-01-21
9

cuDF

GPU DataFrame Library

Built on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. cuDF provides a pandas-like API that will be familiar to data engineers and data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming. For additional examples, browse our complete API documentation, or check out our more detailed notebooks. cuDF can be installed...

Downloads: 0 This Week

Last Update: 2026-02-05
10

Vector Pascal Compiler

Vector Pascal is a language targeted at SIMD multi-core instruction sets such as AVX, SSE2, or x86-64-v3. It has a SIMD compiler that supports parallel vector operations, loop unrolling, common subexpression elimination, etc. It is implemented in Java.

1 Review

Downloads: 5 This Week

Last Update: 3 days ago
11

KeyKiller-Cuda

Solving the Satoshi Puzzle

KeyKiller CUDA is a GPU-accelerated version of the KeyKiller project, designed to achieve extreme performance in solving Satoshi Nakamoto's puzzles using modern NVIDIA GPUs. It pushes the limits of cryptographic key search performance by leveraging CUDA, thread-beam parallelism, and batch EC operations. The command-line version is open-source and free to use. For the paid advanced graphics version, please visit: https://gitlab.com/8891689/KeyKiller-Cuda/

Downloads: 10 This Week

Last Update: 2025-12-13
12

BMDFM

Binary Modular DataFlow Machine (BMDFM)

Binary Modular DataFlow Machine (BMDFM) is a software package that enables running an application in parallel on shared memory symmetric multiprocessing (SMP) computers using the multiple processors to speed up the execution of single applications. BMDFM automatically identifies and exploits parallelism due to the static and mainly dynamic scheduling of the dataflow instruction sequences derived from the formerly sequential program. The BMDFM dynamic scheduling subsystem performs a...

Downloads: 0 This Week

Last Update: 2025-06-13
13

JMMcG Core C++ Library.

This is a basic, low-level library with pretensions to implementing features above and beyond (but not necessarily better than!) those implemented within the Standard C++ Library and the Boost Library. In particular, it offers data-flow-based parallelism and a FIX-to-exchange-protocol message translator that is blindingly fast! The source code has moved to GitLab: https://gitlab.com/jmmcg/libjmmcg

Downloads: 0 This Week

Last Update: 2025-03-26
14

CUDA Pathtracer

GPU Raytracer from scratch in C++/CUDA

...It demonstrates the power of modern GPU architectures to handle complex lighting calculations, reflections, shadows, and global illumination in real-time. This project is educational and experimental, providing insight into GPU parallelism and real-time rendering techniques. Its clean and modular C++ structure makes it a great reference for students and graphics developers alike.

Downloads: 0 This Week

Last Update: 2025-03-25
15

Apache MXNet (incubating)

A flexible and efficient library for deep learning

Apache MXNet is an open source deep learning framework designed for efficient and flexible research prototyping and production. It contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations. On top of this is a graph optimization layer, overall making MXNet highly efficient yet still portable, lightweight and scalable.

Downloads: 2 This Week

Last Update: 2023-12-13
16

MXNet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning

Apache MXNet is a scalable, efficient open-source deep learning framework—offering a flexible hybrid programming model (symbolic + imperative) and supporting a wide array of languages—designed for training and deploying neural networks across heterogeneous systems. Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic...

Downloads: 0 This Week

Last Update: 2025-08-18
17

DRT

Dataflow Run Time

This software aims to demonstrate that we can easily provide a very small and powerful runtime for running programs that are coded in whatever programming model, but that could be *executed* in a DATAFLOW style. The Dataflow Run Time (DRT) provides the runtime support for that. The first benefit of this software is to allow rapid development of such programs in the context of the TERAFLUX project (http://teraflux.eu). The runtime API has been designed in such a way as to allow for a future...

Downloads: 0 This Week

Last Update: 2021-06-07
18

GHC (Glasgow Haskell Compiler)

Mirror of the Glasgow Haskell Compiler

GHC (Glasgow Haskell Compiler) is the leading open-source compiler and interactive environment for the Haskell programming language, supporting the Haskell 2010 standard plus numerous language extensions. It compiles to native machine code (via LLVM or C), and includes the interactive GHCi REPL. For full information on building GHC, see the GHC Building Guide. Here follows a summary - if you get into trouble, the Building Guide has all the answers. For building library documentation, you'll...

Downloads: 7 This Week

Last Update: 2025-09-04
19

MultiPathNet

A Torch implementation of the object detection network

MultiPathNet is a Torch-7 implementation of the “A MultiPath Network for Object Detection” paper (BMVC 2016), developed by Facebook AI Research. It extends the Fast R-CNN framework by introducing multiple network “paths” to enhance feature extraction and object recognition robustness. The MultiPath architecture incorporates skip connections and multi-scale processing to capture both fine-grained details and high-level context within a single detection pipeline. This results in improved...

Downloads: 0 This Week

Last Update: 4 days ago
20

X10

Performance and Productivity at Scale

X10 is a class-based, strongly-typed, garbage-collected, object-oriented language. To support concurrency and distribution, X10 uses the Asynchronous Partitioned Global Address Space programming model (APGAS). This model introduces two key concepts -- places and asynchronous tasks -- and a few mechanisms for coordination. With these, APGAS can express both regular and irregular parallelism, message-passing-style and active-message-style computations, fork-join and bulk-synchronous...

Downloads: 15 This Week

Last Update: 2019-01-07
21

FastFlow: programming multi-core

Pattern-based multi/many-core parallel programming framework

FastFlow is a C/C++ programming framework supporting the development of pattern-based parallel programs on multi/many-core, GPU, and distributed platforms. The FastFlow run-time is built upon non-blocking threads and lock-free queues. Thanks to its very efficient CAS-free communication/synchronization support (e.g. core-to-core latency of a few clock cycles), FastFlow effectively supports the exploitation of fine-grained parallelism, e.g. parallel codes managing very high frequency streams on commodity multi-core. ...

4 Reviews

Downloads: 0 This Week

Last Update: 2019-01-15
22

SPar: Stream Parallelism in Multi-Cores

An Embedded C++ Domain-Specific Language

SPar is an internal C++ Domain-Specific Language (DSL) suitable for modeling and implementing classical stream parallel patterns. The DSL uses standard C++ attributes to introduce annotations tagging the notable components of stream parallel applications: stream sources and stream processing stages. The latest version can be downloaded from SVN using the following command: svn checkout svn://svn.code.sf.net/p/spar-dsl-compiler/svn/ spar

Downloads: 0 This Week

Last Update: 2018-04-01
23

NOMAD: blackbox optimization software

NOMAD is a C++ code that implements the MADS algorithm (Mesh Adaptive Direct Search) for difficult blackbox optimization problems. Such problems occur when the functions to optimize are costly computer simulations with no derivatives.

Downloads: 6 This Week

Last Update: 2018-06-26
24

Extended Memory Semantics (EMS)

Persistent shared object memory and parallelism for Node.js and Python

EMS makes possible persistent shared memory parallelism between Node.js, Python, and C/C++. Extended Memory Semantics (EMS) unifies synchronization and storage primitives to address several challenges of parallel programming. A modern multi-core server has 16-32 cores and nearly 1TB of memory, equivalent to an entire rack of systems from a few years ago. As a consequence, jobs formerly requiring a Map-Reduce cluster can now be performed entirely in shared memory on a single server without using distributed programming.

Downloads: 0 This Week

Last Update: 2023-10-24
25

LightPCC

Parallel pairwise correlation computation on Intel Xeon Phi clusters

The first parallel and distributed library for pairwise correlation/dependence computation on Intel Xeon Phi clusters. This library is written as C++ template classes and achieves high speed by exploiting SIMD-instruction-level and thread-level parallelism within each Xeon Phi as well as accelerator-level parallelism among multiple Xeon Phis. To facilitate balanced workload distribution, we have proposed, for the first time, a general framework for symmetric all-pairs computation that builds provable bijective functions between job identifier and coordinate space.

Downloads: 0 This Week

Last Update: 2017-04-05
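
To illustrate the idea of a bijection between job identifiers and coordinates in a symmetric all-pairs computation, here is a minimal sketch in Python. It is not LightPCC's actual mapping or API: it uses the generic triangular-number ordering, and the names pair_to_job, job_to_pair, and job_range are hypothetical. Each pair (i, j) with i <= j gets exactly one job ID, so splitting the ID range into contiguous blocks gives every worker nearly the same number of pairs.

```python
from math import isqrt


def pair_to_job(i, j):
    """Map a coordinate (i, j) with 0 <= i <= j to a unique job ID.

    Pairs are ordered by j first, then by i (triangular-number ordering).
    """
    return j * (j + 1) // 2 + i


def job_to_pair(k):
    """Inverse of pair_to_job: recover (i, j) from a job ID k >= 0.

    Integer square root keeps the result exact for large k.
    """
    j = (isqrt(8 * k + 1) - 1) // 2
    i = k - j * (j + 1) // 2
    return i, j


def job_range(worker, num_workers, n):
    """Contiguous block of job IDs assigned to one worker.

    n items produce n * (n + 1) // 2 pairs (diagonal included);
    block sizes differ by at most one job.
    """
    total = n * (n + 1) // 2
    base, extra = divmod(total, num_workers)
    start = worker * base + min(worker, extra)
    return range(start, start + base + (1 if worker < extra else 0))
```

A worker would iterate over job_range(worker, num_workers, n) and call job_to_pair on each ID to find which pair of items to correlate, without any shared work queue.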