ERNIE-4.5-21B-A3B-PT is Baidu’s post-trained Mixture-of-Experts (MoE) large language model for text understanding and generation. With 21 billion total parameters and only 3 billion activated per token, it keeps inference cost low while retaining strong text performance. Although the ERNIE-4.5 family is pre-trained in a multimodal setup, this variant is text-only and intended for post-training inference. Its MoE architecture routes each token to 6 of 64 text experts and supports context lengths of up to 131,072 tokens. The model can be further adapted with fine-tuning strategies such as Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Unified Preference Optimization (UPO).

ERNIE-4.5-21B-A3B-PT integrates with both Hugging Face Transformers and PaddlePaddle, and is compatible with ERNIEKit for training and FastDeploy for deployment. Support for vLLM is also being added for faster inference.
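The sketch below shows minimal chat-style generation through Hugging Face Transformers. It assumes the checkpoint is published on the Hub as `baidu/ERNIE-4.5-21B-A3B-PT` and that the repository may ship custom model code (hence `trust_remote_code=True`); adjust the model id, dtype, and device settings to your environment.

```python
# Minimal sketch: chat-style generation with Hugging Face Transformers.
# Assumes the checkpoint is hosted as "baidu/ERNIE-4.5-21B-A3B-PT" and may
# require trust_remote_code=True for custom model/tokenizer code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "baidu/ERNIE-4.5-21B-A3B-PT"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,   # load MoE weights in bf16 to reduce memory
    device_map="auto",            # spread layers/experts across available GPUs
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Summarize what a Mixture-of-Experts model is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Strip the prompt tokens and decode only the newly generated text.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```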
Features
- 21B total parameters with 3B activated per token
- Optimized for post-training language tasks
- Long context length up to 131,072 tokens
- 64 text experts with 6 activated per token
- Supports SFT, DPO, UPO fine-tuning methods
- Compatible with PaddlePaddle, Hugging Face Transformers
- Designed for fast inference via FastDeploy and vLLM (see the serving sketch after this list)
- Apache 2.0 license for commercial use
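Because vLLM support is still being adapted, the following is a hedged offline-inference sketch rather than an official recipe: it assumes a vLLM build recent enough to load this MoE architecture and reuses the same assumed Hub id as above.

```python
# Sketch: offline batched inference with vLLM's Python API.
# Assumes a vLLM version that supports the ERNIE-4.5 MoE architecture;
# "baidu/ERNIE-4.5-21B-A3B-PT" is the same assumed Hub id as in the example above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="baidu/ERNIE-4.5-21B-A3B-PT",
    trust_remote_code=True,      # custom model code, if the repo requires it
    tensor_parallel_size=2,      # split the 21B MoE across 2 GPUs (adjust to your setup)
    max_model_len=32768,         # raise toward 131072 if memory allows
)

sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)
outputs = llm.generate(
    ["Explain the difference between total and activated parameters in an MoE model."],
    sampling,
)
for out in outputs:
    print(out.outputs[0].text)
```

Once the architecture is supported, the same checkpoint should also work behind vLLM's OpenAI-compatible server for online serving; parameters such as `tensor_parallel_size` and `max_model_len` above are illustrative and depend on available GPU memory.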