Visual intelligence for your home.
Benchmark LLMs by fighting in Street Fighter 3
VideoRAG: Chat with Your Videos
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Generate short videos with one click using an AI LLM
Generate blog articles from video or audio
Text- and image-to-video generation: CogVideoX (2024) and CogVideo
Moonshot's most powerful AI model
All-in-one WebUI for AI generative image and video creation
Search all of YouTube from the command line
Capable of understanding text, audio, vision, video
The media player for language learning, with dual subtitles
GPT4V-level open-source multi-modal model based on Llama3-8B
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Lightweight Python library for adding real-time multi-object tracking
Workflow and speech recognition app
From nobody to large language model (LLM) hero
Code and models for ICML 2024 paper, NExT-GPT
Secure open source cloud runtime for AI apps & AI agents
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Data Infrastructure providing an approach to multimodal AI workloads
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Build multimodal language agents for fast prototype and production
Adversarial Robustness Toolbox (ART) - Python Library for ML security
Official repo for Sa2VA: Marrying SAM2 with LLaVA