Multimodal Transformer for document image and layout understanding
CTC-based forced aligner for audio-text in 158 languages
Lightweight MobileNetV3 image classifier trained on ImageNet-1k
Efficient cross-encoder for MS MARCO passage re-ranking tasks
High-performance multilingual embedding model for 94 languages
ViT-based model for detecting NSFW images with high accuracy
AI art model fine-tuned on Midjourney-style prompts for unique visuals
Compact GPT-style language model for open text generation and research
Lightweight sentence embedding model for semantic search
Lightweight multilingual model for sentence similarity tasks
Multilingual sentence embeddings for search and similarity tasks
Small, high-performing language model for QA, chat, and code tasks
R1 1776 is an uncensored reasoning-focused LLM fine-tuned by Perplexity
Lightweight ResNet-18 model trained on ImageNet with A1 recipe
Zero-shot image-text classification with ViT-B/32 encoder
Robust BERT-based model for English with improved MLM training
Large MLM-based English model optimized from BERT architecture
SDXL-Turbo is a real-time text-to-image model for high-quality output
Speaker segmentation model for voice activity and overlap detection
Speaker segmentation model for 10s audio chunks with powerset labels
SigLIP: Zero-shot image-text model with shape-optimized ViT
Multilingual vision-language model for image-text understanding
Speaker diarization pipeline fully in PyTorch, no ONNX required
Latent diffusion model for high-quality text-to-image generation
Efficient text-to-image model with enhanced quality and typography