ERNIE-4.5-300B-A47B-W4A8C8-TP4-Paddle is a quantized build of Baidu's 300B-parameter Mixture-of-Experts (MoE) language model. It uses 4-bit weights and 8-bit activations for efficient inference. Quantization substantially reduces memory requirements compared with the full-precision model while preserving output quality, so the model can be deployed on fewer or smaller GPUs. Of its 300 billion total parameters, 47 billion are activated per token. The model is trained for high-performance text generation in both Chinese and English. It runs on PaddlePaddle in a TP4 configuration (tensor parallelism across 4 GPUs) and relies on expert parallelism and fine-grained scheduling to scale. It supports a context length of up to 131,072 tokens and integrates with FastDeploy for real-time applications. Like other ERNIE 4.5 variants, it was post-trained with supervised fine-tuning (SFT), Direct Preference Optimization (DPO), and UPO to improve performance on complex reasoning and generation tasks.
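To make the memory savings concrete, here is a back-of-the-envelope estimate of weight storage only. It assumes every one of the 300B parameters is stored at exactly 4 bits and that TP4 splits the weights evenly across the four GPUs. Both are simplifying assumptions: real deployments also hold quantization scales, any layers kept at higher precision, the KV cache, and activations, so actual usage is higher. Note that all 300B parameters must stay resident even though only 47B are active per token, because the MoE router can select different experts for each token.

```python
# Back-of-the-envelope weight-memory estimate (weights only).
# Assumptions: all parameters stored at the given bit width, weights split
# evenly across tensor-parallel GPUs; ignores scales, KV cache, activations.
TOTAL_PARAMS = 300e9   # total parameters (from the model description)
ACTIVE_PARAMS = 47e9   # parameters activated per token
TP_DEGREE = 4          # tensor-parallel GPUs (TP4)


def weight_gb(num_params: float, bits_per_param: int) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


bf16_total = weight_gb(TOTAL_PARAMS, 16)  # 600.0 GB at 16 bits per weight
int4_total = weight_gb(TOTAL_PARAMS, 4)   # 150.0 GB at 4 bits per weight
int4_per_gpu = int4_total / TP_DEGREE     # 37.5 GB per GPU under TP4
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # share of weights used per token

print(f"16-bit weights total: {bf16_total:.1f} GB")
print(f"4-bit weights total:  {int4_total:.1f} GB")
print(f"4-bit weights per GPU (TP{TP_DEGREE}): {int4_per_gpu:.1f} GB")
print(f"Active per token: {active_fraction:.1%} of parameters")
```

The same function can be reused to compare other bit widths or tensor-parallel degrees by changing the constants.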
Features
- 4-bit weights and 8-bit activations for efficient inference
- 300B total parameters, 47B activated per token
- Tensor parallelism across 4 GPUs (TP4 configuration)
- Built on PaddlePaddle with FastDeploy support
- Context window up to 131,072 tokens
- Bilingual support (Chinese and English)
- Pretrained and post-trained for advanced language generation
- Open source under the Apache 2.0 license
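To illustrate what "4-bit weights and 8-bit activations" means mechanically, here is a minimal pure-Python sketch of a generic symmetric W4A8 matrix-vector product. The per-output-channel int4 weight scales, the per-tensor int8 activation scale, and the helper names `quantize_symmetric` and `w4a8_matvec` are hypothetical choices for illustration; this is not the quantization scheme or kernel used by PaddlePaddle or FastDeploy for this model.

```python
# Generic symmetric W4A8 matrix-vector product (illustrative sketch only).
# Assumptions: per-output-channel symmetric int4 weights, per-tensor symmetric
# int8 activations. Not the actual PaddlePaddle/FastDeploy implementation.


def quantize_symmetric(values, bits):
    """Map floats to signed ints in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1  # 7 for int4, 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def w4a8_matvec(weight_rows, activations):
    """Approximate y = W @ x using int4 weights, int8 activations, int accumulation."""
    x_q, x_scale = quantize_symmetric(activations, bits=8)
    out = []
    for row in weight_rows:
        w_q, w_scale = quantize_symmetric(row, bits=4)  # one scale per output channel
        acc = sum(wq * xq for wq, xq in zip(w_q, x_q))  # pure integer dot product
        out.append(acc * w_scale * x_scale)             # dequantize once per output
    return out


W = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.75]]
x = [1.0, 0.5, -2.0]
exact = [sum(w * v for w, v in zip(row, x)) for row in W]
approx = w4a8_matvec(W, x)
print("exact: ", exact)
print("approx:", approx)
```

The expensive inner loop runs entirely on small integers, and the two floating-point scales are applied once per output element; the gap between `approx` and `exact` reflects the rounding error of 4-bit weights, which production schemes reduce with finer-grained scales and calibration.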