ERNIE-4.5-VL-28B-A3B-Paddle is a multimodal Mixture-of-Experts (MoE) chat model for complex image-text tasks, with 28 billion total parameters and 3 billion activated per token. Built on PaddlePaddle, it targets visual question answering, image description, and multimodal reasoning, and its heterogeneous MoE architecture supports both thinking and non-thinking inference modes. Post-training combines supervised fine-tuning (SFT), DPO, UPO, and Reinforcement Learning with Verifiable Rewards (RLVR) to improve alignment and performance. The model supports long contexts of up to 131,072 tokens and can be deployed with FastDeploy or the Hugging Face Transformers library, making it well suited to developers who need high-performance, scalable multimodal capabilities in chat or image-based reasoning systems.
Features
- 28B parameter multimodal MoE with 3B active per token
- Handles image-text chat, reasoning, and description tasks
- Supports thinking and non-thinking inference modes
- Uses RLVR, SFT, DPO, and UPO for robust posttraining
- PaddlePaddle-based for optimized performance and deployment
- FastDeploy-ready with GPU-efficient quantization support
- Long context support up to 131,072 tokens
- Transformers-compatible with Python inference examples
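As a concrete illustration of multimodal chat input, the sketch below builds a request payload mixing an image and a text question. This is a minimal, hypothetical example assuming an OpenAI-style messages format (the shape commonly used by OpenAI-compatible serving endpoints such as FastDeploy's); the helper name `build_messages` and the exact payload fields are assumptions, not part of the official API, so consult the model's deployment documentation for the authoritative schema.

```python
def build_messages(image_url: str, question: str) -> list[dict]:
    """Build a single-turn multimodal chat message list.

    Hypothetical sketch: pairs an image reference with a text prompt
    in the OpenAI-style "content parts" format often accepted by
    OpenAI-compatible multimodal serving endpoints.
    """
    return [
        {
            "role": "user",
            "content": [
                # Image part: referenced by URL (a local path or base64
                # data URI may also be accepted, depending on the server).
                {"type": "image_url", "image_url": {"url": image_url}},
                # Text part: the question about the image.
                {"type": "text", "text": question},
            ],
        }
    ]


messages = build_messages(
    "https://example.com/cat.png",
    "What is in this image?",
)
```

The resulting `messages` list would then be passed to the serving endpoint's chat-completions call; whether the model runs in thinking or non-thinking mode is typically controlled by a separate request or template option, whose name depends on the deployment stack.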