ERNIE-4.5-300B-A47B-Paddle is a large-scale, text-only Mixture-of-Experts (MoE) model built on Baidu's ERNIE 4.5 architecture. With 300 billion total parameters, of which 47 billion are activated per token, it is designed for complex natural language understanding and generation tasks. The model inherits ERNIE 4.5's multimodal MoE pretraining infrastructure (only the text modality is active in this version), including innovations such as modality-isolated routing, router orthogonal loss, and token-balanced optimization. It supports efficient deployment via PaddlePaddle, with quantization-ready configurations (4-bit, 8-bit, and 2-bit weight-only variants) for high-performance inference on large GPU clusters. Post-training techniques, including Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Unified Preference Optimization (UPO), have been applied to improve alignment and response quality.
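To make the "47B activated per token" figure concrete: in an MoE layer, a router selects only a few experts per token, so most parameters sit idle on any given forward pass. The PyTorch sketch below shows generic top-k routing as an illustration only; it does not reproduce ERNIE 4.5's actual router, and it omits the modality-isolated routing and auxiliary losses mentioned above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Generic top-k MoE layer: each token is processed by only k of
    num_experts feed-forward experts, so most parameters stay inactive
    per token. Illustrative sketch, not ERNIE 4.5's implementation."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router scores every expert per token,
        # but only the top-k experts actually run.
        logits = self.router(x)
        weights, idx = logits.topk(self.k, dim=-1)   # (tokens, k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoELayer(d_model=64, d_ff=256)
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```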
Features
- 300B total parameters with 47B active per token
- Optimized for high-quality text generation and comprehension
- Supports SFT, DPO, and UPO post-training strategies
- Built using PaddlePaddle with FastDeploy support
- Multiple quantization options: weight-only INT4/INT8/INT2 (WINT4, WINT8, WINT2) and FP8
- Compatible with vLLM and Hugging Face Transformers (see the inference sketches after this list)
- Fine-tuning support through ERNIEKit with LoRA and multi-GPU options (a generic LoRA sketch follows this list)
- Handles long-context inputs up to 131,072 tokens
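For the Hugging Face Transformers path, a minimal generation sketch follows. One assumption to flag: this Paddle build targets PaddlePaddle/FastDeploy, so the sketch loads the PyTorch variant of the same checkpoint (assumed repo id `baidu/ERNIE-4.5-300B-A47B-PT`); verify the exact repository name, and expect a multi-GPU setup at this scale.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for the PyTorch build of this checkpoint; check the hub.
model_id = "baidu/ERNIE-4.5-300B-A47B-PT"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native dtype
    device_map="auto",    # shard across available GPUs
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Summarize the ERNIE 4.5 MoE design in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```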
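Similarly, here is a minimal offline-inference sketch using vLLM's Python API, again assuming the PyTorch-format checkpoint id; the tensor-parallel degree is a placeholder to adjust to your cluster.

```python
from vllm import LLM, SamplingParams

# Assumed repo id and parallelism degree; adjust to your hardware.
llm = LLM(
    model="baidu/ERNIE-4.5-300B-A47B-PT",
    tensor_parallel_size=8,
    trust_remote_code=True,
)
params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)

outputs = llm.generate(["Explain mixture-of-experts routing briefly."], params)
for out in outputs:
    print(out.outputs[0].text)
```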
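ERNIEKit is Baidu's training toolkit for ERNIE models, and its configs and CLI are documented in the ERNIEKit repository rather than reproduced here. As a generic illustration of what LoRA fine-tuning involves, the sketch below uses Hugging Face PEFT as a stand-in for ERNIEKit's own workflow; the target module names are assumptions and would need to match the model's actual layer names.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Generic LoRA setup with Hugging Face PEFT, shown as a stand-in for
# ERNIEKit's LoRA workflow. Names below are assumptions.
model = AutoModelForCausalLM.from_pretrained(
    "baidu/ERNIE-4.5-300B-A47B-PT",  # assumed PyTorch-format repo id
    device_map="auto",
    trust_remote_code=True,
)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # hypothetical attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapters train
```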