Showing 92 open source projects for "gpu faster"

View related business solutions
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 1
    Alacritty

    Alacritty

    A cross-platform, GPU-accelerated terminal emulator

    ...With such a strong focus on simplicity and performance, Alacritty’s included features are very carefully considered, ensuring that it remains blazingly fast. It’s got a GPU for rendering that makes a whole lot of optimizations possible. In various benchmarked terminals, Alacritty has shown to be either faster, or way faster than others. Alacritty requires no additional setup, but still allows configuration of many aspects of the terminal. It supports Windows, macOS, Linux and BSD.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 2
    FlashAttention

    FlashAttention

    Fast and memory-efficient exact attention

    FlashAttention is a high-performance deep learning optimization library that reimplements the attention mechanism used in transformer models to be significantly faster and more memory-efficient than standard implementations. It achieves this by using IO-aware algorithms that minimize memory reads and writes, reducing the quadratic memory overhead typically associated with attention operations. The project provides implementations of FlashAttention, FlashAttention-2, and newer iterations optimized for modern GPU architectures such as NVIDIA Hopper and AMD accelerators. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 3
    DeepSeed

    DeepSeed

    Deep learning optimization library making distributed training easy

    ...Sparse attention of DeepSpeed powers an order-of-magnitude longer input sequence and obtains up to 6x faster execution comparing with dense transformers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Xenia Canary

    Xenia Canary

    Xbox 360 Emulator Research Project

    Xenia Canary is an experimental fork of the Xenia Xbox 360 emulator that moves faster than the mainline project to trial bleeding-edge improvements. It focuses on game compatibility and performance by iterating quickly on GPU and CPU emulation paths, shader translation, and timing correctness. Canary builds are where risky optimizations, new backends, and rewrites land first so they can be tested by a wider community before stabilizing.
    Downloads: 134 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    LuxTTS

    LuxTTS

    A high-quality rapid TTS voice cloning model

    ...Its design emphasizes efficiency and practicality, fitting within modest GPU memory footprints.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 6
    stt

    stt

    Voice Recognition to Text Tool

    ...It supports GPU acceleration if available, enabling faster processing on compatible hardware but still offers reliable performance on CPUs alone.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    HunyuanVideo

    HunyuanVideo

    HunyuanVideo: A Systematic Framework For Large Video Generation Model

    HunyuanVideo is a cutting-edge framework designed for large-scale video generation, leveraging advanced AI techniques to synthesize videos from various inputs. It is implemented in PyTorch, providing pre-trained model weights and inference code for efficient deployment. The framework aims to push the boundaries of video generation quality, incorporating multiple innovative approaches to improve the realism and coherence of the generated content. Release of FP8 model weights to reduce GPU...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    TensorRT Node for ComfyUI

    TensorRT Node for ComfyUI

    Enables the best performance on NVIDIA RTX Graphics Cards

    ...This is particularly attractive for power users who run many generations or who host ComfyUI on dedicated hardware and want to squeeze out every bit of GPU performance. In short, it’s about taking ComfyUI from “it runs” to “it runs fast” on NVIDIA GPUs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Scalene

    Scalene

    High-performance CPU, GPU, and memory profiler for Python

    Scalene is a high-performance CPU, GPU and memory profiler for Python that does a number of things that other Python profilers do not and cannot do. It runs orders of magnitude faster than other profilers while delivering far more detailed information. Once Scalene has profiled your program, it will launch a web browser with an interactive user interface (all processing is done locally).
    Downloads: 3 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 10
    cuML

    cuML

    RAPIDS Machine Learning Library

    cuML is a suite of libraries that implement machine learning algorithms and mathematical primitives functions that share compatible APIs with other RAPIDS projects. cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML's Python API matches the API from scikit-learn. For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents. For details on performance, see the cuML Benchmarks Notebook.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Video-subtitle-extractor

    Video-subtitle-extractor

    A GUI tool for extracting hard-coded subtitle (hardsub) from videos

    ...Use local OCR recognition, no need to set up and call any API, and do not need to access online OCR services such as Baidu and Ali to complete text recognition locally. Support GPU acceleration, after GPU acceleration, you can get higher accuracy and faster extraction speed. (CLI version) No need for users to manually set the subtitle area, the project automatically detects the subtitle area through the text detection model. Filter the text in the non-subtitle area and remove the watermark (station logo) text.
    Downloads: 54 This Week
    Last Update:
    See Project
  • 12
    LLaMA-Factory

    LLaMA-Factory

    Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

    LLaMA-Factory is a fine-tuning and training framework for Meta's LLaMA language models. It enables researchers and developers to train and customize LLaMA models efficiently using advanced optimization techniques.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 13
    pomegranate

    pomegranate

    Fast, flexible and easy to use probabilistic modelling in Python

    pomegranate is a library for probabilistic modeling defined by its modular implementation and treatment of all models as the probability distributions they are. The modular implementation allows one to easily drop normal distributions into a mixture model to create a Gaussian mixture model just as easily as dropping a gamma and a Poisson distribution into a mixture model to create a heterogeneous mixture. But that's not all! Because each model is treated as a probability distribution,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    DeepEP

    DeepEP

    DeepEP: an efficient expert-parallel communication library

    DeepEP is a communication library designed specifically to support Mixture-of-Experts (MoE) and expert parallelism (EP) deployments. Its core role is to implement high-throughput, low-latency all-to-all GPU communication kernels, which handle the dispatching of tokens to different experts (or shards) and then combining expert outputs back into the main data flow. Because MoE architectures require routing inputs to different experts, communication overhead can become a bottleneck — DeepEP...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    CTranslate2

    CTranslate2

    Fast inference engine for Transformer models

    ...The project implements a custom runtime that applies many performance optimization techniques such as weights quantization, layers fusion, batch reordering, etc., to accelerate and reduce the memory usage of Transformer models on CPU and GPU. The execution is significantly faster and requires less resources than general-purpose deep learning frameworks on supported models and tasks thanks to many advanced optimizations: layer fusion, padding removal, batch reordering, in-place operations, caching mechanism, etc. The model serialization and computation support weights with reduced precision: 16-bit floating points (FP16), 16-bit integers (INT16), and 8-bit integers (INT8). ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 16
    WhisperLive

    WhisperLive

    A nearly-live implementation of OpenAI's Whisper

    WhisperLive is a “nearly live” implementation of OpenAI’s Whisper model focused on real-time transcription. It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 17
    Depth Pro

    Depth Pro

    Sharp Monocular Metric Depth in Less Than a Second

    Depth Pro is a foundation model for zero-shot metric monocular depth estimation, producing sharp, high-frequency depth maps with absolute scale from a single image. Unlike many prior approaches, it does not require camera intrinsics or extra metadata, yet still outputs metric depth suitable for downstream 3D tasks. Apple highlights both accuracy and speed: the model can synthesize a ~2.25-megapixel depth map in around 0.3 seconds on a standard GPU, enabling near real-time applications. The...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 18
    Meridian

    Meridian

    Meridian is an MMM framework

    Meridian is a comprehensive, open source marketing mix modeling (MMM) framework developed by Google to help advertisers analyze and optimize the impact of their marketing investments. Built on Bayesian causal inference principles, Meridian enables organizations to evaluate how different marketing channels influence key performance indicators (KPIs) such as revenue or conversions while accounting for external factors like seasonality or economic trends. The framework provides a robust...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    CodeGeeX2

    CodeGeeX2

    CodeGeeX2: A More Powerful Multilingual Code Generation Model

    ...The model excels at code generation, translation, summarization, debugging, and comment generation, and it supports over 100 programming languages. With improved inference efficiency, quantization options, and multi-query/flash attention, CodeGeeX2 achieves faster generation speeds and lightweight deployment, requiring as little as 6GB GPU memory at INT4 precision. Its backend powers the CodeGeeX IDE plugins for VS Code, JetBrains, and other editors, offering developers interactive AI assistance with features like infilling and cross-file completion.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 20
    XFrames

    XFrames

    GPU-accelerated GUI development for Node.js and the browser

    xframes is a high-performance library that empowers developers to build native desktop applications using familiar web technologies, specifically Node.js and React, without the overhead of the DOM. xframes serves as a streamlined alternative to Electron, designed for developers looking to maximize performance and efficiency.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Insanely Fast Whisper

    Insanely Fast Whisper

    An opinionated CLI to transcribe Audio files w/ Whisper on-device

    ...It supports multiple Whisper model variants, including distilled versions for faster inference with minimal accuracy loss.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    VCClient

    VCClient

    Software that uses AI to perform real-time voice conversion

    ...It provides both a graphical user interface and API access, making it suitable for casual users as well as developers who want to integrate voice transformation into their own applications. The project also supports GPU acceleration, enabling faster inference and smoother real-time performance on compatible hardware. Additionally, it includes tools for training and managing voice models, giving users the ability to create personalized voice profiles.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 23
    OpenFold

    OpenFold

    Trainable, memory-efficient, and GPU-friendly PyTorch reproduction

    OpenFold carefully reproduces (almost) all of the features of the original open source inference code (v2.0.1). The sole exception is model ensembling, which fared poorly in DeepMind's own ablation testing and is being phased out in future DeepMind experiments. It is omitted here for the sake of reducing clutter. In cases where the Nature paper differs from the source, we always defer to the latter. OpenFold is trainable in full precision, half precision, or bfloat16 with or without...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Colossal-AI

    Colossal-AI

    Making large AI models cheaper, faster and more accessible

    The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing. Together with better performance come larger model sizes. This imposes challenges to the memory wall of the current accelerator hardware such as GPU. It is never ideal to train large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine. There is an urgent demand to train models in a distributed environment....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    X-AnyLabeling

    X-AnyLabeling

    Effortless data labeling with AI support from Segment Anything

    X-AnyLabeling is an open-source data annotation platform designed to streamline the process of labeling datasets for computer vision and multimodal AI applications. The software integrates an AI-powered labeling engine that allows users to generate annotations automatically with the assistance of modern vision models such as Segment Anything and various object detection frameworks. It supports labeling tasks across images and videos and enables developers to prepare training datasets for...
    Downloads: 16 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • Next
MongoDB Logo MongoDB