Speaker diarization pipeline fully in PyTorch, no ONNX required
Latent diffusion model for high-quality text-to-image generation
Efficient text-to-image model with enhanced quality and typography
Advanced MMDiT text-to-image model for high-quality visual generation
Latent text-to-image model for high-quality inpainting from prompts
Stable Diffusion v1.4 generates photorealistic images from text prompts
Text-to-image diffusion model for high-quality image generation
Advanced base model for high-quality text-to-image generation
Generates high-quality short videos from a single still image input
Code generation model trained on 80+ languages with FIM support
Flexible text-to-text transformer model for multilingual NLP tasks
T5-Small: Lightweight text-to-text transformer for NLP tasks
Transformer model for detecting tables in document images
RoBERTa model for English sentiment analysis on Twitter data
Metric monocular depth estimation (vision model)
Vision Transformer model fine-tuned for facial age classification
Transformer model for image classification with patch-based input
Base Vision Transformer pretrained on ImageNet-21k at 224x224
Lightweight ViT-based model for accurate image matting tasks
Detects speech activity in audio using the pyannote.audio 2.1 pipeline
Waifu Diffusion creates anime-style images from text prompts
Portuguese ASR model fine-tuned from XLSR-53 for 16 kHz audio input
Russian ASR model fine-tuned on Common Voice and CSS10 datasets
Speaker embedding model for voice verification and identification
High-accuracy multilingual speech recognition and translation model