TurboTransformers

TurboTransformers is a high-performance inference framework optimized for running Transformer models efficiently on CPUs and GPUs. It improves latency and throughput for NLP applications.

Features

Optimized for low-latency Transformer model inference
Supports both CPU and GPU acceleration
Works with popular Transformer models like BERT and GPT
Implements kernel fusion for performance optimization
Compatible with PyTorch and TensorFlow
Provides quantization support for lower memory usage
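
As a rough illustration of why quantization lowers memory usage, the sketch below stores 32-bit float weights as 8-bit integers plus one shared scale factor. It is a generic symmetric int8 scheme written in plain Python, not TurboTransformers' own implementation, which this page does not describe. The helper names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
from array import array


def quantize_int8(values):
    """Symmetric per-tensor quantization: floats -> int8 values plus one scale.

    Generic illustration only; not the TurboTransformers API.
    """
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    q = array("b", (max(-127, min(127, round(v / scale))) for v in values))
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values from int8 values and the scale."""
    return [x * scale for x in q]


# Four example weights stored as 32-bit floats (4 bytes each).
weights = array("f", [0.5, -1.0, 0.25, 0.0])
q_weights, scale = quantize_int8(weights)
restored = dequantize_int8(q_weights, scale)

# Each quantized weight takes 1 byte instead of 4.
bytes_before = weights.itemsize * len(weights)
bytes_after = q_weights.itemsize * len(q_weights)
```

The trade-off is a small rounding error per weight, bounded by half the scale, in exchange for roughly a fourfold reduction in weight storage.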

License

BSD License

Additional Project Details

Operating Systems

Linux, Windows

Programming Language

C++

Related Categories

C++ Natural Language Processing (NLP) Tool, C++ LLM Inference Tool

Registered

2025-01-23

Fast and user-friendly runtime for transformer inference
