ERNIE-4.5-VL-28B-A3B-PT is a multimodal Mixture-of-Experts (MoE) model from Baidu, designed for vision-language reasoning and generation. With 28 billion total parameters, of which 3 billion are activated per token, it supports high-quality image-text interaction for tasks such as visual question answering, image description, and multimodal chain-of-thought reasoning. The model uses a heterogeneous MoE architecture with modality-isolated routing and token-balanced training to optimize cross-modal representation.

Post-training enhancements include Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), Unified Preference Optimization (UPO), and Reinforcement Learning with Verifiable Rewards (RLVR). Built on PaddlePaddle and compatible with the Transformers library, the model supports both thinking and non-thinking inference modes, handles long contexts of up to 131,072 tokens, and is designed to scale across a range of hardware.
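As an illustration of a visual Q&A request, a chat-style payload for this kind of model might be structured as below. Note that the field names (`enable_thinking`, the image/text content schema, and the `build_messages` helper) are hypothetical sketches based on common Transformers chat conventions, not this model's confirmed API.

```python
# Sketch of a multimodal chat payload for visual Q&A.
# NOTE: "enable_thinking" and the content schema are illustrative
# assumptions, not the model's confirmed API.

def build_messages(image_url: str, question: str, thinking: bool = True):
    """Pair an image with a text question in a chat-style message list."""
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }
    ]
    # Hypothetical option toggling thinking vs. non-thinking inference mode.
    options = {"enable_thinking": thinking}
    return messages, options

messages, options = build_messages(
    "https://example.com/cat.jpg", "What animal is this?"
)
```

The same payload would be passed to the model's chat template or a serving endpoint; flipping `thinking=False` selects the non-thinking mode.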
Features
- 28B total parameters with 3B activated per token
- Text and vision modality support with MoE routing
- Enables multimodal reasoning and chain-of-thought
- Includes RLVR, SFT, DPO, and UPO for alignment
- Transformers-compatible for easy deployment
- PaddlePaddle backend for high performance
- Supports thinking and non-thinking inference modes
- Long context handling up to 131,072 tokens
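The sparse-activation figures above mean only a small fraction of the weights participate in any given forward step; a quick back-of-envelope check, using only the parameter counts from this card:

```python
# Back-of-envelope: fraction of parameters active per token in the MoE.
# The 28B total / 3B activated figures come from the model card above.

TOTAL_PARAMS = 28e9
ACTIVE_PARAMS = 3e9

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"{active_fraction:.1%} of parameters active per token")  # roughly 10.7%
```

This ratio is why the model's per-token compute cost is closer to that of a ~3B dense model, even though the full checkpoint holds 28B parameters.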