The Triton Inference Server provides an optimized cloud and edge inferencing solution
Replace OpenAI GPT with another LLM in your app
Official inference library for Mistral models
Large Language Model Text Generation Inference
Serve machine learning models within a Docker container
Library for serving Transformers models on Amazon SageMaker
A real-time inference engine for temporal logic specifications
Port of OpenAI's Whisper model in C/C++
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
Run Local LLMs on Any Device. Open-source and available for commercial use
Agentic, Reasoning, and Coding (ARC) foundation models
A high-throughput and memory-efficient inference and serving engine for LLMs (see the serving sketch below)
Everything you need to build state-of-the-art foundation models
Port of Facebook's LLaMA model in C/C++ (see the bindings sketch below)
Wan2.2: Open and Advanced Large-Scale Video Generative Model
RGBD video generation model conditioned on camera input
C++ library for high-performance inference on NVIDIA GPUs
Qwen3 is the large language model series developed by the Qwen team
waifu2x converter, ncnn version; runs fast on GPUs with Vulkan
High-performance neural network inference framework for mobile
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Ready-to-use OCR with 80+ supported languages (see the OCR sketch below)
ONNX Runtime: cross-platform, high-performance ML inferencing (see the ONNX Runtime sketch below)
User-friendly AI Interface
Self-hosted, community-driven, local OpenAI-compatible API (see the client sketch below)
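
The high-throughput serving engine entry matches vLLM's tagline. Assuming vLLM is the project meant, here is a minimal offline-generation sketch; the model name is a placeholder, and any causal LM vLLM supports would work the same way:

```python
# Minimal vLLM offline-generation sketch; the model name is a
# placeholder assumption, not a recommendation.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

# generate() batches prompts and returns one RequestOutput per prompt
for output in llm.generate(["What is paged attention?"], params):
    print(output.outputs[0].text)
```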
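For the C/C++ LLaMA port (llama.cpp), a sketch using the community llama-cpp-python bindings rather than the C API itself; the GGUF model path is a placeholder assumption:

```python
# Sketch via the community llama-cpp-python bindings (a separate
# project from llama.cpp); the model path is an assumption.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf", n_ctx=2048)
result = llm("Q: Name the planets in the solar system. A:",
             max_tokens=64, stop=["Q:"])
print(result["choices"][0]["text"])
```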
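The 80+-language OCR entry matches EasyOCR's tagline. Assuming so, a short OCR sketch; the image path is a placeholder, and the first run downloads the detection and recognition models for the listed languages:

```python
import easyocr

reader = easyocr.Reader(["en"])  # language codes to load
# readtext() returns (bounding box, text, confidence) triples
for bbox, text, confidence in reader.readtext("receipt.png"):
    print(f"{confidence:.2f}  {text}")
```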
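An ONNX Runtime sketch for the cross-platform inferencing entry; "model.onnx" and the input shape are placeholder assumptions, so the input name is read from the session rather than hard-coded:

```python
import numpy as np
import onnxruntime as ort

# Load a model and run one inference on the CPU provider
sess = ort.InferenceSession("model.onnx",
                            providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # example shape
outputs = sess.run(None, {input_name: x})
print(outputs[0].shape)
```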
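For the self-hosted OpenAI-compatible API entry, any OpenAI client pointed at the local endpoint should work. The base URL, port, and model name below are assumptions; check the server's own configuration for the actual values it exposes:

```python
from openai import OpenAI

# Base URL and model name are assumptions; a local OpenAI-compatible
# server typically ignores the API key.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="gpt-4",  # the server maps this to a local model
    messages=[{"role": "user", "content": "Hello from a local API!"}],
)
print(resp.choices[0].message.content)
```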