cuda gpu free download

Showing 192 open source projects for "cuda gpu"

View related business solutions

Auth0 B2B Essentials: SSO, MFA, and RBAC Built In
Unlimited organizations, 3 enterprise SSO connections, role-based access control, and pro MFA included. Dev and prod tenants out of the box.

Auth0's B2B Essentials plan gives you everything you need to ship secure multi-tenant apps. Unlimited orgs, enterprise SSO, RBAC, audit log streaming, and higher auth and API limits included. Add on M2M tokens, enterprise MFA, or additional SSO connections as you scale.

Sign Up Free
Gemini 3 and 200+ AI Models on One Platform
Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build generative AI apps with Vertex AI. Switch between models without switching platforms.

Start Free
1

GPU Puzzles

Solve puzzles. Learn CUDA

GPU Puzzles is an educational project designed to teach GPU programming concepts through interactive coding exercises and puzzles. Instead of presenting traditional lecture-style explanations, the project immerses learners directly in hands-on programming tasks that demonstrate how GPU computation works. The exercises are implemented using Python with the Numba CUDA interface, which allows Python code to compile into GPU kernels that run on CUDA-enabled hardware. ...

Downloads: 1 This Week

Last Update: 4 days ago
See Project
2

CV-CUDA

CV-CUDA™ is an open-source, GPU accelerated library

CV-CUDA is an open-source project that enables building efficient cloud-scale Artificial Intelligence (AI) imaging and computer vision (CV) applications. It uses graphics processing unit (GPU) acceleration to help developers build highly efficient pre- and post-processing pipelines. CV-CUDA originated as a collaborative effort between NVIDIA and ByteDance.

Downloads: 44 This Week

Last Update: 2025-11-15
See Project
3

CUDA.jl

CUDA programming in Julia

High-performance GPU programming in a high-level language. JuliaGPU is a GitHub organization created to unify the many packages for programming GPUs in Julia. With its high-level syntax and flexible compiler, Julia is well-positioned to productively program hardware accelerators like GPUs without sacrificing performance. The latest development version of CUDA.jl requires Julia 1.8 or higher. If you are using an older version of Julia, you need to use a previous version of CUDA.jl. This will...

Downloads: 3 This Week

Last Update: 1 day ago
See Project
4

CUDA Agent

Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

CUDA Agent is a research-driven agentic reinforcement learning system designed to automatically generate and optimize high-performance CUDA kernels for GPU workloads. The project addresses the long-standing challenge that efficient CUDA programming typically requires deep hardware expertise by training an autonomous coding agent capable of iterative improvement through execution feedback.

Downloads: 0 This Week

Last Update: 2026-03-03
See Project
Powerful App Monitoring Without Surprise Bills
AppSignal starts at $23/month with all features included. No overages, no hidden fees. 30-day free trial.

Tired of monitoring tools that punish you for scaling? AppSignal offers transparent, predictable pricing with every feature unlocked on every plan. Track errors, monitor performance, detect anomalies, and manage logs across Ruby, Python, Node.js, and more. Trusted by developers since 2012 with free dev-to-dev support. No credit card required to start your 30-day trial.

Try AppSignal Free
5

how-to-optim-algorithm-in-cuda

How to optimize some algorithm in cuda

...These examples show how different optimization techniques influence performance on modern GPU hardware and allow readers to experiment with real implementations. The repository also contains extensive learning notes that summarize CUDA programming concepts, GPU architecture details, and performance engineering strategies.

Downloads: 0 This Week

Last Update: 4 days ago
See Project
6

CuPy

A NumPy-compatible array library accelerated by CUDA

CuPy is an open source implementation of NumPy-compatible multi-dimensional array accelerated with NVIDIA CUDA. It consists of cupy.ndarray, a core multi-dimensional array class and many functions on it. CuPy offers GPU accelerated computing with Python, using CUDA-related libraries to fully utilize the GPU architecture. According to benchmarks, it can even speed up some operations by more than 100X. CuPy is highly compatible with NumPy, serving as a drop-in replacement in most cases. ...

Downloads: 36 This Week

Last Update: 2026-02-20
See Project
7

NVIDIA GPU Operator

NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes

...However, configuring and managing nodes with these hardware resources requires the configuration of multiple software components such as drivers, container runtimes or other libraries which are difficult and prone to errors. The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPU. These components include the NVIDIA drivers (to enable CUDA), Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labeling, DCGM-based monitoring, and others.

Downloads: 0 This Week

Last Update: 2025-12-03
See Project
8

CUDA API Wrappers

Thin, unified, C++-flavored wrappers for the CUDA APIs

CUDA API Wrappers is a C++ library providing high-level, modern wrappers for NVIDIA’s CUDA runtime and driver APIs, enhancing usability and efficiency. It is intended for those who would otherwise use these APIs directly, to make working with them more intuitive and consistent, making use of modern C++ language capabilities, programming idioms, and best practices. In a nutshell - making CUDA API work more fun.

Downloads: 0 This Week

Last Update: 2026-02-09
See Project
9

XMRig

RandomX, KawPow, CryptoNight, AstroBWT and GhostRider unified miner

High performance, open-source, cross-platform RandomX, KawPow, CryptoNight, and AstroBWT CPU/GPU miner, RandomX benchmark, and stratum proxy. XMRig is a high-performance, open-source, cross-platform RandomX, KawPow, CryptoNight, and AstroBWT unified CPU/GPU miner and RandomX benchmark. Official binaries are available for Windows, Linux, macOS, and FreeBSD. The preferred way to configure the miner is the JSON config file as it is more flexible and human-friendly. The command-line interface...

1 Review

Downloads: 75 This Week

Last Update: 2025-12-23
See Project
MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
10

ImplicitGlobalGrid.jl

Distributed parallelization of stencil-based GPU and CPU applications

...Samuel Omlin) with Stanford University (Dr. Ludovic Räss) and the Swiss Geocomputing Centre (Prof. Yuri Podladchikov). It renders the distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid almost trivial and enables close to ideal weak scaling of real-world applications on thousands of GPUs [1, 2, 3]. ImplicitGlobalGrid relies on the Julia MPI wrapper (MPI.jl) to perform halo updates close to hardware limit and leverages CUDA-aware or ROCm-aware MPI for GPU-applications. ...

Downloads: 0 This Week

Last Update: 2026-01-08
See Project
11

Tiny CUDA Neural Networks

Lightning fast C++/CUDA neural network framework

This is a small, self-contained framework for training and querying neural networks. Most notably, it contains a lightning-fast "fully fused" multi-layer perceptron (technical paper), a versatile multiresolution hash encoding (technical paper), as well as support for various other input encodings, losses, and optimizers. We provide a sample application where an image function (x,y) -> (R,G,B) is learned. The fully fused MLP component of this framework requires a very large amount of shared...

Downloads: 0 This Week

Last Update: 2025-07-08
See Project
12

Shumai

Fast Differentiable Tensor Library in JavaScript & TypeScript with Bun

...It can automatically leverage GPU acceleration on Linux (via CUDA) and CPU computation on macOS.

Downloads: 3 This Week

Last Update: 1 day ago
See Project
13

CUDA Containers for Edge AI & Robotics

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

CUDA Containers for Edge AI & Robotics is an open-source project that provides a modular container build system designed for running machine learning and AI workloads on NVIDIA Jetson devices. The repository contains container configurations that package the latest AI frameworks and dependencies optimized for Jetson hardware. These containers simplify the deployment of complex machine learning environments by bundling libraries such as CUDA, TensorRT, and deep learning frameworks into reproducible container images. ...

Downloads: 1 This Week

Last Update: 3 days ago
See Project
14

KeyKiller-Cuda

Solving the Satoshi Puzzle

KeyKiller is a GPU-accelerated version of the KeyKiller project, designed to achieve extreme performance in solving Satoshi Nakamoto's puzzles using modern NVIDIA GPUs. KeyKiller CUDA pushes the limits of cryptographic key search performance by leveraging CUDA, thread-beam parallelism, and batch EC operations. The command-line version is open-source and free to use.

Downloads: 15 This Week

Last Update: 2025-12-13
See Project
15

Taichi

Productive, portable, and performant GPU programming in Python

Taichi is an open-source, embedded DSL within Python designed for high-performance numerical and physical simulations. It uses JIT compilation (via LLVM and its runtime TiRT) to offload compute-heavy code to CPUs, GPUs, mobile devices, and embedded systems. With built-in support for sparse data structures (SNode), automatic differentiation, AOT deployment, and compatibility with CUDA, Vulkan, Metal, and OpenGL ES, it empowers disciplines like simulation, graphics, AI, and robotics

Downloads: 2 This Week

Last Update: 2025-07-30
See Project
16

Model Zoo

Please do not feed the models

...Each model is organized into its own project folder with pinned package versions, ensuring reproducibility and stability. The examples serve both as educational tools for learning Flux and as practical starting points for building new models. GPU acceleration is supported for most models through CUDA integration, enabling efficient training on compatible hardware. With community contributions encouraged, the Model Zoo acts as a hub for sharing and exploring diverse machine learning applications in Julia.

Downloads: 3 This Week

Last Update: 1 hour ago
See Project
17

cuDF

GPU DataFrame Library

...The RAPIDS suite of open-source software libraries aims to enable the execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization but exposing that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

Downloads: 0 This Week

Last Update: 2026-02-05
See Project
18

rembg

Rembg is a tool to remove images background

Rembg is a powerful tool that utilizes AI (specifically U^2-Net) to automatically remove backgrounds from images, offering a streamlined command-line interface and Docker support. It's ideal for batch processing and integrates smoothly into workflows

Downloads: 28 This Week

Last Update: 2026-01-03
See Project
19

cuML

RAPIDS Machine Learning Library

cuML is a suite of libraries that implement machine learning algorithms and mathematical primitives functions that share compatible APIs with other RAPIDS projects. cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML's Python API matches the API from scikit-learn. For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents. For details on performance, see the cuML Benchmarks Notebook.

Downloads: 4 This Week

Last Update: 2026-02-05
See Project
20

Halide

A language for fast, portable data-parallel computation

...It was designed to make writing high-performance image and array processing code much easier on modern machines. It works on all major operating systems and with several CPU architectures (X86, ARM, MIPS, Hexagon, PowerPC) and GPU Compute APIs (CUDA, OpenCL, OpenGL, among others). It isn't a standalone programming language however; rather it is embedded in C++ which means that you write C++ code, building an in-memory representation of a Halide pipeline using Halide's C++ API. This representation can then be compiled to an object file, or a JIT-compile and run in the same process. ...

Downloads: 3 This Week

Last Update: 2025-09-17
See Project
21

Nvitop

An interactive NVIDIA-GPU process viewer and beyond

nvitop is an interactive NVIDIA device and process monitoring tool. It has a colorful and informative interface that continuously updates the status of the devices and processes. As a resource monitor, it includes many features and options, such as tree-view, environment variable viewing, process filtering, process metrics monitoring, etc. Beyond that, the package also ships a CUDA device selection tool nvisel for deep learning researchers. It also provides handy APIs that allow developers...

Downloads: 0 This Week

Last Update: 2026-01-27
See Project
22

OpenCV

Open Source Computer Vision Library

OpenCV (Open Source Computer Vision Library) is a comprehensive open-source library for computer vision, machine learning, and image processing. It enables developers to build real-time vision applications ranging from facial recognition to object tracking. OpenCV supports a wide range of programming languages including C++, Python, and Java, and is optimized for both CPU and GPU operations.

Downloads: 9 This Week

Last Update: 2025-12-31
See Project
23

emgucv

Cross platform .Net wrapper to the OpenCV image processing library

Emgu CV is a cross platform .Net wrapper to the OpenCV image processing library. Allowing OpenCV functions to be called from .NET compatible languages. The wrapper can be compiled by Visual Studio and Unity, it can run on Windows, Linux, Mac OS, iOS and Android.

Downloads: 10 This Week

Last Update: 2025-10-09
See Project
24

NNlib.jl

Neural Network primitives with multiple backends

This package provides a library of functions useful for neural networks, such as softmax, sigmoid, batched multiplication, convolutions and pooling. Many of these are used by Flux.jl, which loads this package, but they may be used independently.

Downloads: 4 This Week

Last Update: 2026-01-13
See Project
25

ParallelStencil.jl

Package for writing high-level code for parallel stencil computations

ParallelStencil empowers domain scientists to write architecture-agnostic high-level code for parallel high-performance stencil computations on GPUs and CPUs. Performance similar to CUDA C / HIP can be achieved, which is typically a large improvement over the performance reached when using only CUDA.jl or AMDGPU.jl GPU Array programming. For example, a 2-D shallow ice solver presented at JuliaCon 2020 [1] achieved a nearly 20 times better performance than a corresponding GPU Array programming implementation; in absolute terms, it reached 70% of the theoretical upper performance bound of the used Nvidia P100 GPU, as defined by the effective throughput metric, T_eff. ...

Downloads: 0 This Week

Last Update: 2026-03-02
See Project