The Triton Inference Server provides an optimized cloud and edge inferencing solution
Official inference library for Mistral models
Replace OpenAI GPT with another LLM in your app
Large Language Model Text Generation Inference
Serve machine learning models within a Docker container
Library for serving Transformers models on Amazon SageMaker
A real-time inference engine for temporal logical specifications
Port of OpenAI's Whisper model in C/C++
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
Run Local LLMs on Any Device. Open-source and available for commercial use
Agentic, Reasoning, and Coding (ARC) foundation models
Port of Facebook's LLaMA model in C/C++
RGBD video generation model conditioned on camera input
Wan2.2: Open and Advanced Large-Scale Video Generative Model
ONNX Runtime: cross-platform, high-performance ML inferencing and training accelerator
C++ library for high-performance inference on NVIDIA GPUs
A high-throughput and memory-efficient inference and serving engine for LLMs
High-performance neural network inference framework for mobile
Qwen3 is the large language model series developed by the Qwen team
Optimizing inference proxy for LLMs
Everything you need to build state-of-the-art foundation models
User-friendly AI Interface
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Ready-to-use OCR with 80+ supported languages
Self-hosted, community-driven, local OpenAI-compatible API
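
Several entries above (LocalAI, vLLM, Open WebUI) revolve around OpenAI-compatible serving: the server exposes the standard /v1/chat/completions endpoint, so existing OpenAI client code only needs its base URL changed. A minimal sketch of such a request, assuming a server on localhost:8080 and a placeholder model name (both are assumptions, not taken from the list):

    import requests

    # Minimal chat-completion request against a local OpenAI-compatible server.
    # Port 8080 and the model name "local-model" are assumptions; substitute
    # whatever your server (e.g. LocalAI or vLLM) actually serves.
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "local-model",
            "messages": [{"role": "user", "content": "Say hello."}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])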