ERNIE-4.5-21B-A3B-Paddle is a post-trained Mixture-of-Experts (MoE) large language model from Baidu, designed for high-performance text generation and understanding. It has 21 billion total parameters, of which 3 billion are activated per token, and is optimized for large-scale inference with the PaddlePaddle framework. The architecture supports efficient training and inference through advanced expert-routing strategies, FP8 mixed-precision training, expert parallelism, and quantization. While this release is text-only, the underlying architecture also includes vision experts for multimodal use. ERNIE-4.5 is post-trained with Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Unified Preference Optimization (UPO) for quality and alignment with user preferences. It supports context windows of up to 131,072 tokens and integrates with ERNIEKit for streamlined fine-tuning. Deployment is supported via FastDeploy, with adaptations for vLLM and Hugging Face Transformers in progress.
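For deployment with FastDeploy, a minimal offline-generation sketch is shown below. It assumes FastDeploy's vLLM-style Python API (`fastdeploy.LLM` and `SamplingParams`); the `max_model_len` value and the prompt are illustrative, and the exact layout of the returned output object may differ across FastDeploy versions.

```python
# Minimal sketch: offline generation with FastDeploy (assumed vLLM-style API).
from fastdeploy import LLM, SamplingParams

# Sampling configuration for generation.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Load the model; max_model_len caps the allocated context window
# (the architecture supports up to 131,072 tokens).
llm = LLM(model="baidu/ERNIE-4.5-21B-A3B-Paddle", max_model_len=32768)

# Generate a completion for a single prompt and inspect the result;
# the output object's structure may vary by FastDeploy version.
outputs = llm.generate("Explain expert routing in MoE models.", sampling_params)
print(outputs)
```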
Features
- 21B total parameters with 3B activated per token
- Text modality with extended context (131,072 tokens)
- PaddlePaddle-optimized for efficient deployment
- Supports SFT, DPO, and UPO fine-tuning via ERNIEKit
- Deployment via FastDeploy, with vLLM and Hugging Face Transformers support in progress (see the client sketch after this list)
- Expert routing and quantization for performance
- Heterogeneous architecture includes vision experts, inactive in this text-only release
- Designed for scalable inference across GPU setups
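Because FastDeploy can expose an OpenAI-compatible endpoint when serving the model, a running instance can be queried with the standard `openai` Python client. The sketch below assumes a server is already listening locally; the base URL, port, placeholder API key, and served model name are illustrative assumptions.

```python
# Hedged sketch: querying a locally served ERNIE-4.5 instance through an
# OpenAI-compatible endpoint. Host, port, and api_key are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8180/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="baidu/ERNIE-4.5-21B-A3B-Paddle",
    messages=[{"role": "user", "content": "Summarize what an MoE model is."}],
    temperature=0.8,
)
print(response.choices[0].message.content)
```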