Qwen2.5-VL
Qwen2.5-VL is the latest vision-language model from the Qwen series, representing a significant advancement over its predecessor, Qwen2-VL. This model excels in visual understanding, capable of recognizing a wide array of objects, including text, charts, icons, graphics, and layouts within images. It functions as a visual agent, capable of reasoning and dynamically directing tools, enabling applications such as computer and phone usage. Qwen2.5-VL can comprehend videos exceeding one hour in length and can pinpoint relevant segments within them. Additionally, it accurately localizes objects in images by generating bounding boxes or points and provides stable JSON outputs for coordinates and attributes. The model also supports structured outputs for data like scanned invoices, forms, and tables, benefiting sectors such as finance and commerce. Available in base and instruct versions across 3B, 7B, and 72B sizes, Qwen2.5-VL is accessible through platforms like Hugging Face and ModelScope.
Learn more
Qwen2
Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud.
Qwen2 is a series of large language models developed by the Qwen team at Alibaba Cloud. It includes both base language models and instruction-tuned models, ranging from 0.5 billion to 72 billion parameters, and features both dense models and a Mixture-of-Experts model. The Qwen2 series is designed to surpass most previous open-weight models, including its predecessor Qwen1.5, and to compete with proprietary models across a broad spectrum of benchmarks in language understanding, generation, multilingual capabilities, coding, mathematics, and reasoning.
Learn more
Qwen
Qwen is a powerful, free AI assistant built on the advanced Qwen model series, designed to help anyone with creativity, research, problem-solving, and everyday tasks. While Qwen Chat is the main interface for most users, Qwen itself powers a broad range of intelligent capabilities including image generation, deep research, website creation, advanced reasoning, and context-aware search. Its multimodal intelligence enables Qwen to understand and process text, images, audio, and video simultaneously for richer insights. Qwen is available on web, desktop, and mobile, ensuring seamless access across all devices. For developers, the Qwen API provides OpenAI-compatible endpoints, making integration simple and allowing Qwen’s intelligence to power apps, services, and automation. Whether you're chatting through Qwen Chat or building with the Qwen API, Qwen delivers fast, flexible, and highly capable AI support.
Learn more
Qwen3.5
Qwen3.5 is a next-generation open-weight multimodal large language model designed to power native vision-language agents. The flagship release, Qwen3.5-397B-A17B, combines a hybrid linear attention architecture with sparse mixture-of-experts, activating only 17 billion parameters per forward pass out of 397 billion total to maximize efficiency. It delivers strong benchmark performance across reasoning, coding, multilingual understanding, visual reasoning, and agent-based tasks. The model expands language support from 119 to 201 languages and dialects while introducing a 1M-token context window in its hosted version, Qwen3.5-Plus. Built for multimodal tasks, it processes text, images, and video with advanced spatial reasoning and tool integration. Qwen3.5 also incorporates scalable reinforcement learning environments to improve general agent capabilities. Designed for developers and enterprises, it enables efficient, tool-augmented, multimodal AI workflows.
Learn more