ERNIE-4.5-21B-A3B-Base-PT is a post-trained, text-only Mixture-of-Experts (MoE) model from Baidu's ERNIE 4.5 series, with 21 billion total parameters and 3 billion activated per token. It targets general-purpose language understanding and generation, refined through post-training techniques such as Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Unified Preference Optimization (UPO). Pretraining is staged: the model first builds strong text capabilities before multimodal components are introduced, and although it originates from a joint multimodal training pipeline, this variant keeps only the text components for focused performance and easier deployment. It works with the Hugging Face Transformers library, supports long-context processing up to 131,072 tokens, and can also be served with PaddlePaddle, FastDeploy, or vLLM for scalable deployment across platforms. A minimal usage sketch follows.
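The sketch below shows basic text generation with Transformers; it assumes the Hugging Face repo id `baidu/ERNIE-4.5-21B-A3B-Base-PT`, that the checkpoint exposes a causal-LM head, and that `trust_remote_code=True` is accepted. Adjust dtype and device placement to your hardware.

```python
# Minimal text-generation sketch with Hugging Face Transformers.
# The repo id and trust_remote_code requirement are assumptions; check the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "baidu/ERNIE-4.5-21B-A3B-Base-PT"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,   # MoE weights are large; bf16 keeps memory manageable
    device_map="auto",            # shard across available GPUs
    trust_remote_code=True,
)

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a short continuation; tune max_new_tokens and temperature as needed.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```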
Features
- 21B parameters with 3B active per token using MoE architecture
- Post-trained for text generation using SFT, DPO, and UPO
- Supports long contexts up to 131,072 tokens
- Fine-tuning ready with ERNIEKit (LoRA, multi-GPU, DPO)
- Compatible with Hugging Face Transformers and vLLM (see the serving sketch after this list)
- High inference efficiency via quantization and load balancing
- Staged pretraining builds text capabilities before multimodal extension, improving training stability
- Deployable with FastDeploy for scalable service integration
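For higher-throughput batch inference, the following is a hedged vLLM sketch. It assumes the same repo id and that your vLLM version can load this architecture (possibly requiring `trust_remote_code=True` or a recent release); the context-length and parallelism settings shown are illustrative, not prescribed values.

```python
# Offline batch inference with vLLM (sketch; repo id and architecture support
# in your installed vLLM version are assumptions, not guarantees).
from vllm import LLM, SamplingParams

llm = LLM(
    model="baidu/ERNIE-4.5-21B-A3B-Base-PT",
    trust_remote_code=True,       # needed if the architecture is not built into vLLM
    tensor_parallel_size=1,       # raise for multi-GPU serving
    max_model_len=32768,          # the model supports up to 131,072 tokens if memory allows
)

sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)
outputs = llm.generate(["Explain Mixture-of-Experts routing in two sentences."], sampling)
for out in outputs:
    print(out.outputs[0].text)
```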