Build Vision Agents quickly with any model or video provider
ExDARK dataset is the largest collection of low-light images
Phi-3.5 for Mac: Locally-run Vision and Language Models
Low-Rank and Sparse Tools for Background Modeling and Subtraction
Moonshot's most powerful AI model
Open Source Differentiable Computer Vision Library
"Big Model" trains a visual multimodal VLM with 26M parameters
Multilingual Document Layout Parsing in a Single Vision-Language Model
Automatically find issues in image datasets
Low-latency AI inference engine optimized for mobile devices
A Pragmatic VLA Foundation Model
Vision AI browser agent for automation, testing, and extraction
Turn WiFi signals into real-time human sensing and spatial awareness
Deep learning library
Optimism is Ethereum, scaled
Cosmos-RL is a flexible and scalable Reinforcement Learning framework
Capable of understanding text, audio, vision, video
[NeurIPS2025 Spotlight] Quantized Attention
CoreNet: A library for training deep neural networks
High-performance Inference and Deployment Toolkit for LLMs and VLMs
A blazing fast AI Gateway with integrated guardrails
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Official inference repo for FLUX.2 models
Mobile manipulation research tools for roboticists
Easy-to-use Speech Toolkit including Self-Supervised Learning model