27 projects for "cuda gpu" with 2 filters applied:

  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 1
    GPU Puzzles

    GPU Puzzles

    Solve puzzles. Learn CUDA

    GPU Puzzles is an educational project designed to teach GPU programming concepts through interactive coding exercises and puzzles. Instead of presenting traditional lecture-style explanations, the project immerses learners directly in hands-on programming tasks that demonstrate how GPU computation works. The exercises are implemented using Python with the Numba CUDA interface, which allows Python code to compile into GPU kernels that run on CUDA-enabled hardware. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Numba CUDA Target

    Numba CUDA Target

    The CUDA target for Numba

    Numba CUDA Target is NVIDIA’s maintained CUDA backend for the Numba JIT compiler, enabling developers to write GPU-accelerated code directly in Python. It allows users to define CUDA kernels using Python syntax, which are then compiled into efficient GPU code at runtime using LLVM-based toolchains. This approach significantly lowers the barrier to entry for GPU programming by eliminating the need to write CUDA C++ while still delivering high performance. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    CUDA Agent

    CUDA Agent

    Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

    CUDA Agent is a research-driven agentic reinforcement learning system designed to automatically generate and optimize high-performance CUDA kernels for GPU workloads. The project addresses the long-standing challenge that efficient CUDA programming typically requires deep hardware expertise by training an autonomous coding agent capable of iterative improvement through execution feedback.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    how-to-optim-algorithm-in-cuda

    how-to-optim-algorithm-in-cuda

    How to optimize some algorithm in cuda

    ...These examples show how different optimization techniques influence performance on modern GPU hardware and allow readers to experiment with real implementations. The repository also contains extensive learning notes that summarize CUDA programming concepts, GPU architecture details, and performance engineering strategies.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Fully Managed MySQL, PostgreSQL, and SQL Server Icon
    Fully Managed MySQL, PostgreSQL, and SQL Server

    Automatic backups, patching, replication, and failover. Focus on your app, not your database.

    Cloud SQL handles your database ops end to end, so you can focus on your app.
    Try Free
  • 5
    Numbast

    Numbast

    Build an automated pipeline that converts CUDA APIs into Numba

    Numbast is an automated toolchain that bridges CUDA C++ and Python by generating Numba-compatible bindings directly from CUDA header files. Its primary goal is to eliminate the manual effort required to expose CUDA libraries to Python, enabling developers to use GPU-accelerated functionality in Python environments more easily. The system parses CUDA C++ declarations and converts them into Python bindings that can be used within Numba, allowing seamless integration with Python-based GPU workflows. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    CUDA Containers for Edge AI & Robotics

    CUDA Containers for Edge AI & Robotics

    Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

    CUDA Containers for Edge AI & Robotics is an open-source project that provides a modular container build system designed for running machine learning and AI workloads on NVIDIA Jetson devices. The repository contains container configurations that package the latest AI frameworks and dependencies optimized for Jetson hardware. These containers simplify the deployment of complex machine learning environments by bundling libraries such as CUDA, TensorRT, and deep learning frameworks into reproducible container images. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Model Zoo

    Model Zoo

    Please do not feed the models

    ...Each model is organized into its own project folder with pinned package versions, ensuring reproducibility and stability. The examples serve both as educational tools for learning Flux and as practical starting points for building new models. GPU acceleration is supported for most models through CUDA integration, enabling efficient training on compatible hardware. With community contributions encouraged, the Model Zoo acts as a hub for sharing and exploring diverse machine learning applications in Julia.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Koila

    Koila

    Prevent PyTorch's `CUDA error: out of memory` in just 1 line of code

    ...It is particularly useful in environments where GPU resources are limited or where models frequently encounter CUDA memory errors.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    VibeTensor

    VibeTensor

    Our first fully AI generated deep learning system

    VibeTensor is a groundbreaking open-source research system software stack for deep learning that was uniquely generated almost entirely by AI coding agents under guided human supervision, demonstrating a new frontier in AI-assisted software engineering. It implements a PyTorch-style eager tensor library with a modern C++20 core that supports both CPU and CUDA backends, giving it the ability to manage tensors, automatic differentiation (autograd), and complex computation flows similar to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 10
    SoniTranslate

    SoniTranslate

    Synchronized Translation for Videos

    SoniTranslate is a video translation and dubbing system that produces synchronized target-language audio tracks for existing video content. It provides a web UI built with Gradio, allowing users to upload a video, choose source and target languages, and then run a pipeline that handles transcription, translation and re-synthesis of speech. Under the hood, it uses advanced speech and diarization models to separate speakers, align audio with timecodes and respect subtitle timing, which lets...
    Downloads: 34 This Week
    Last Update:
    See Project
  • 11
    Jupyter Docker Stacks

    Jupyter Docker Stacks

    Ready-to-run Docker images containing Jupyter applications

    Jupyter Docker Stacks provides a curated set of ready-to-run Docker container images that bundle Jupyter applications with popular data science and computing tools, enabling users to quickly start working in a reproducible environment. These stacks support a range of use cases, from lightweight base notebook images to full featured environments that include scientific computing libraries, machine learning tools, and IDE-like notebook interfaces, all within Docker containers that run...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 12
    Instant Neural Graphics Primitives

    Instant Neural Graphics Primitives

    Instant neural graphics primitives: lightning fast NeRF and more

    Instant Neural Graphics Primitives, is an open-source research project developed by NVIDIA that enables extremely fast training and rendering of neural graphics representations. The system implements several neural graphics primitives including neural radiance fields, signed distance functions, neural images, and neural volumes. These representations are trained using a compact neural network combined with a multiresolution hash encoding that dramatically accelerates both training and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Multimodal

    Multimodal

    TorchMultimodal is a PyTorch library

    This project, also known as TorchMultimodal, is a PyTorch library for building, training, and experimenting with multimodal, multi-task models at scale. The library provides modular building blocks such as encoders, fusion modules, loss functions, and transformations that support combining modalities (vision, text, audio, etc.) in unified architectures. It includes a collection of ready model classes—like ALBEF, CLIP, BLIP-2, COCA, FLAVA, MDETR, and Omnivore—that serve as reference...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Punica

    Punica

    Serving multiple LoRA finetuned LLM as one

    ...The system includes specialized CUDA kernels that enable batched GPU operations across different LoRA models simultaneously. This design allows a single GPU cluster to host many task-specific models while maintaining high throughput and minimal latency. The architecture also includes scheduling mechanisms that coordinate requests from multiple tenants and distribute workloads efficiently across available resources.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15

    CIRT

    CIRT - CUDA Interactive Ray Tracer

    CIRT is an implementation of PRTP (Programmable Ray Tracing Pipeline). Mainly it is to be used as a ray-tracing equivalent of OpenGL. It allows the user to implement various ray-tracing related algorithms.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    QtAV

    QtAV

    A multimedia framework based on Qt and FFmpeg

    QtAV is a cross-platform and high performance multimedia playback framework based on Qt and FFmpeg. Features: timeline preview, gpu decoding etc
    Leader badge
    Downloads: 41 This Week
    Last Update:
    See Project
  • 17
    pipeless

    pipeless

    A computer vision framework to create and deploy apps in minutes

    Pipeless is an open-source computer vision framework to create and deploy applications without the complexity of building and maintaining multimedia pipelines. It ships everything you need to create and deploy efficient computer vision applications that work in real-time in just minutes. Pipeless is inspired by modern serverless technologies. It provides the development experience of serverless frameworks applied to computer vision. You provide some functions that are executed for new...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Detectron

    Detectron

    FAIR's research platform for object detection research

    Detectron is an object detection and instance segmentation research framework that popularized many modern detection models in a single, reproducible codebase. Built on Caffe2 with custom CUDA/C++ operators, it provided reference implementations for models like Faster R-CNN, Mask R-CNN, RetinaNet, and Feature Pyramid Networks. The framework emphasized a clean configuration system, strong baselines, and a “model zoo” so researchers could compare results under consistent settings. It includes training and evaluation pipelines that handle multi-GPU setups, standard datasets, and common augmentations, which helped standardize experimental practice in detection research. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    SIG Rust

    SIG Rust

    Rust language bindings for TensorFlow

    SIG Rust provides idiomatic Rust bindings for TensorFlow, making it possible for developers to work with TensorFlow functionality from within the Rust programming language. Rather than replacing TensorFlow itself, it acts as an integration layer that connects Rust applications to the TensorFlow C API. The repository is designed for developers who want Rust’s performance, safety, and systems programming strengths while still accessing TensorFlow’s machine learning capabilities. It includes...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    FasterTransformer

    FasterTransformer

    Transformer related optimization, including BERT, GPT

    FasterTransformer is a high-performance inference library designed to accelerate transformer-based models such as BERT, GPT, and T5 on NVIDIA GPUs. It provides optimized implementations of transformer encoder and decoder layers using CUDA, cuBLAS, and custom kernels to maximize throughput and minimize latency. The library supports multiple deep learning frameworks, including TensorFlow, PyTorch, and Triton, allowing developers to integrate it into existing pipelines without major changes. It...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Flux3D.jl

    Flux3D.jl

    3D computer vision library in Julia

    ...This package utilizes Flux.jl and Zygote.jl as its building blocks for training 3D vision models and for supporting differentiation. This package also have support of CUDA GPU acceleration with CUDA.jl.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22

    cphcttoolbox

    Cph CT Toolbox is a selection of Computed Tomography tools

    Copenhagen Computed Tomography Toolbox is a collection of applications and libraries for flexible and efficient CT reconstruction. The toolbox apps generally take a set of projections (X-ray intensity measurements) and filter and back project them in order to recreate the image or volume that the projections represent. The project includes both mostly informative CPU implementations and highly efficient GPU implementations. Regular releases are hosted at the Python Package Index.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23

    Accelerated Feature Extraction Tool

    A fast GPU accelerated feature extraction software for speech analysis

    A fast feature extraction software tool for speech analysis and processing. It incorporates standard MFCC, PLP, and TRAPS features. The tool is a specially designed to process very large audio data sets. It uses GPU acceleration if compatible GPU available (CUDA as weel as OpenCL, NVIDIA, AMD, and Intel GPUs are supported). CPU SSE intrinsic instruction set is used in cases where no compatible GPU present. The output files are stored in HTK format. The software is developed at Department of Cybernetics at University of West Bohemia in Pilsen.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24

    FreDec

    Parallelized FREquency DEComposer algorithm

    ...After selection of the initial frequency candidates, the algorithm passes through all their possible combinations and estimates their multi-frequency statistical significance. In the end, it prints out the set of largest frequency tuples that were still found significant. The GPU computing is implemented through CUDA and brings a significant performance increase. It is still possible to run FreDec solely on CPU, if no suitable GPU device is available in the system. See the details of the underlying theory in Baluev 2013, MNRAS, V. 436, P. 807 The description of the algorithm itself can be found in arXiv:1309.0100. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    game of life CUDA
    simple game of life use gpu cuda
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB