  • 1
    DeepSeek-V3-0324

    Advanced multilingual LLM with enhanced reasoning and code generation

    DeepSeek-V3-0324 is a powerful large language model by DeepSeek AI that significantly enhances performance over its predecessor, especially in reasoning, programming, and Chinese language tasks. It achieves major benchmark improvements, such as +5.3 on MMLU-Pro and +19.8 on AIME, and delivers more executable, aesthetically improved front-end code. Its Chinese writing and search-answering capabilities have also been refined, generating more fluent, contextually aware long-form outputs. Key upgrades include better multi-turn interactions, function calling accuracy, translation quality, and support for structured outputs like JSON. The model is optimized to run at a system temperature of 0.3 for coherent, deterministic responses, even if API users specify higher temperatures. It offers structured prompt templates for tasks involving file input and web search, with advanced citation formatting in both English and Chinese.
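    As a rough orientation, the sketch below shows one way to query the model through an OpenAI-compatible chat endpoint with the recommended 0.3 temperature and a JSON-formatted response. The base URL, API key, and model identifier are placeholders, not values taken from this listing.

    ```python
    # Hedged sketch: querying DeepSeek-V3-0324 through an OpenAI-compatible
    # chat endpoint. The base_url, API key, and model name are placeholders;
    # substitute whatever provider or local server actually hosts the model.
    from openai import OpenAI

    client = OpenAI(base_url="https://example-provider/v1", api_key="YOUR_KEY")

    response = client.chat.completions.create(
        model="deepseek-v3-0324",                  # placeholder model identifier
        temperature=0.3,                           # recommended setting for coherent, deterministic output
        response_format={"type": "json_object"},   # structured JSON output, if the server supports it
        messages=[
            {"role": "system", "content": "Reply with a JSON object."},
            {"role": "user", "content": "List three strengths of this model as JSON."},
        ],
    )
    print(response.choices[0].message.content)
    ```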
  • 2
    Devstral

    Agentic 24B LLM optimized for coding tasks with 128k context support

    Devstral-Small-2505 is a 23.6B parameter language model fine-tuned by Mistral AI and All Hands AI, built specifically for agentic software engineering tasks. Based on Mistral-Small-3.1, it supports a 128k context window and excels in exploring codebases, editing multiple files, and tool usage. The model achieves state-of-the-art open-source performance on SWE-Bench Verified with a score of 46.8%, surpassing much larger models. Devstral is designed for local and production-level deployments, compatible with frameworks like vLLM, Transformers, llama.cpp, and Ollama. It is licensed under Apache 2.0 and is fully open for commercial and non-commercial use. Its Tekken tokenizer allows a 131k vocabulary size for high flexibility in programming languages and natural language inputs. Devstral is the preferred backend for OpenHands, where it acts as the reasoning engine for autonomous code agents.
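    The sketch below illustrates offline generation with vLLM, one of the frameworks named above. The Hugging Face repository name and sampling values are assumptions; substitute the checkpoint you actually pull.

    ```python
    # Hedged sketch: offline generation with vLLM, one of the frameworks the
    # description lists for Devstral. The repo name and sampling values are
    # assumptions; adjust them to the checkpoint and task at hand.
    from vllm import LLM, SamplingParams

    llm = LLM(model="mistralai/Devstral-Small-2505")   # assumed repository name
    params = SamplingParams(temperature=0.2, max_tokens=512)

    prompt = "Write a Python function that lists all TODO comments in a repository."
    outputs = llm.generate([prompt], params)
    print(outputs[0].outputs[0].text)
    ```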
  • 3
    Dia-1.6B

    Dia-1.6B generates lifelike English dialogue and vocal expressions

    Dia-1.6B is a 1.6 billion parameter text-to-speech model by Nari Labs that generates high-fidelity dialogue directly from transcripts. Designed for realistic vocal performance, Dia supports expressive features like emotion, tone control, and non-verbal cues such as laughter, coughing, or sighs. The model accepts speaker conditioning through audio prompts, allowing limited voice cloning and speaker consistency across generations. It is optimized for English and built for real-time performance on enterprise GPUs, though CPU and quantized versions are planned. The format supports [S1]/[S2] tags to differentiate speakers and integrates easily into Python workflows. While not tuned to a specific voice, user-provided audio can guide output style. Licensed under Apache 2.0, Dia is intended for research and educational use, with explicit restrictions on misuse like identity mimicry or deceptive content.
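    A minimal sketch of a two-speaker generation is shown below. The package path, class name, and generate() call are assumptions inferred from the description; consult the Nari Labs repository for the exact API.

    ```python
    # Hedged sketch: two-speaker dialogue synthesis with Dia's Python package.
    # The import path, class name, and generate() signature are assumptions
    # drawn from the description; check the Nari Labs repo for the exact API.
    import soundfile as sf
    from dia.model import Dia  # assumed import path

    model = Dia.from_pretrained("nari-labs/Dia-1.6B")  # assumed checkpoint ID

    # [S1]/[S2] tags mark the two speakers; non-verbal cues go in parentheses.
    script = "[S1] Did the build finally pass? [S2] It did. (laughs) Third time lucky."
    audio = model.generate(script)

    sf.write("dialogue.wav", audio, 44100)  # assumed 44.1 kHz output sample rate
    ```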
  • 4
    ERNIE-4.5-0.3B-Base-PT

    Compact 360M text model with high efficiency and fine-tuning support

    ERNIE-4.5-0.3B-Base-PT is a compact, fully dense transformer model with 360 million parameters, optimized for general-purpose text generation tasks. It belongs to the ERNIE 4.5 series by Baidu and leverages advanced pretraining techniques without relying on a Mixture-of-Experts (MoE) structure. The model features 18 transformer layers, 16 attention heads, and a maximum context length of 131,072 tokens, offering strong language understanding for its size. It can be fine-tuned using ERNIEKit with support for SFT, LoRA, and DPO training methods, making it highly adaptable. Compatible with the Hugging Face Transformers library, the model can be easily used in Python for inference or deployed via FastDeploy. This variant emphasizes portability and accessibility, enabling fast deployment even on less powerful hardware. Ideal for developers seeking a smaller model for prototyping, educational use, or lightweight production tasks.
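    A minimal sketch of plain text completion with the Hugging Face Transformers library, the route mentioned above. The repository ID is an assumption, and trust_remote_code is passed in case the checkpoint ships custom code.

    ```python
    # Hedged sketch: plain text completion with Hugging Face Transformers.
    # The repo ID is an assumption; trust_remote_code is passed because ERNIE
    # checkpoints may include custom modeling code.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "baidu/ERNIE-4.5-0.3B-Base-PT"  # assumed Hugging Face repo name
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    inputs = tokenizer("Large language models are", return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    ```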
  • 5
    ERNIE-4.5-0.3B-Base-Paddle

    Lightweight 361M dense model for text generation and pretraining tasks

    ERNIE-4.5-0.3B-Base-Paddle is Baidu’s compact, dense language model with 361 million parameters, pre-trained for general-purpose text generation. Unlike its MoE counterparts, this base model is fully dense and suitable for resource-constrained environments. It retains the architectural innovations of the ERNIE 4.5 series, including a deep 18-layer transformer with 16 attention heads and a 131,072 token context length. The model is built using PaddlePaddle and supports fine-tuning via ERNIEKit, with configuration files available for SFT, LoRA, and DPO methods. It can be easily deployed through FastDeploy or integrated into Hugging Face Transformers pipelines for practical applications. Despite its smaller size, it benefits from ERNIE’s efficient training infrastructure and token-balancing techniques. This model is ideal for fast experimentation, lightweight deployment, or educational use.
  • 6
    ERNIE-4.5-0.3B-PT

    Compact post-trained LLM for text generation using Transformers

    ERNIE-4.5-0.3B-PT is a 360 million parameter dense language model by Baidu, post-trained to enhance performance on general-purpose natural language tasks. As part of the ERNIE 4.5 series, it emphasizes compactness and accessibility while maintaining strong capabilities for both English and Chinese text generation. The model features 18 transformer layers, 16 attention heads, and a remarkably long context window of 131,072 tokens. Optimized for use with the Hugging Face Transformers library, it supports seamless inference and fine-tuning, including SFT, DPO, and LoRA methods via ERNIEKit. It is fully compatible with PyTorch and includes support for vLLM-based deployment. Though smaller in size, it benefits from ERNIE's large-scale training infrastructure and multimodal innovations. ERNIE-4.5-0.3B-PT is ideal for developers and researchers seeking a lightweight, open-access LLM for dialogue systems and general text generation tasks.
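    Because this is the post-trained, chat-aligned variant, the sketch below applies the tokenizer's chat template before generating. The repository ID is an assumption, and the call only works if the checkpoint actually ships a chat template.

    ```python
    # Hedged sketch: chat-style generation for the post-trained 0.3B variant
    # via the tokenizer's chat template. The repo ID is an assumption.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "baidu/ERNIE-4.5-0.3B-PT"  # assumed Hugging Face repo name
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    messages = [{"role": "user", "content": "Summarize what a Mixture-of-Experts model is."}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=128)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
    ```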
  • 7
    ERNIE-4.5-0.3B-Paddle

    Small post-trained text model with PaddlePaddle optimization

    ERNIE-4.5-0.3B-Paddle is a compact 360 million parameter dense transformer model, post-trained for efficient general-purpose text generation in English and Chinese. Developed by Baidu as part of the ERNIE 4.5 series, it is designed for lightweight applications while maintaining strong language modeling capabilities. The model comprises 18 layers, 16 attention heads, and supports an extended context length of up to 131,072 tokens. It is optimized specifically for PaddlePaddle and integrates seamlessly with the ERNIEKit toolkit for training methods such as SFT, DPO, and LoRA. Inference can be rapidly deployed using FastDeploy or via the Transformers library with remote code trust enabled. Though compact, the model inherits architecture-level optimizations from larger ERNIE models, including efficient memory use and inference strategies. This model is well-suited for users working in the PaddlePaddle ecosystem who require a performant and accessible LLM for scalable tasks.
  • 8
    ERNIE-4.5-21B-A3B-Base-PT

    Text-only ERNIE 4.5 MoE model post-trained for language tasks

    ERNIE-4.5-21B-A3B-Base-PT is a post-trained text-only Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 21 billion total parameters and 3 billion activated per token. It is designed to excel in general-purpose language understanding and generation, refined through post-training techniques like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Unified Preference Optimization (UPO). The model benefits from a staged pretraining process focused on building deep language capabilities before integrating multimodal elements. Although it originates from a joint multimodal training pipeline, this variant isolates only the text components for focused performance and easier deployment. It is compatible with the Transformers library and supports long-context processing up to 131,072 tokens. The model also integrates smoothly with PaddlePaddle, FastDeploy, and vLLM inference, enabling scalable deployment across various platforms.
  • 9
    ERNIE-4.5-21B-A3B-Base-Paddle

    21B-parameter text MoE model for powerful multilingual generation

    ERNIE-4.5-21B-A3B-Base-Paddle is a powerful text-focused Mixture-of-Experts (MoE) model developed by Baidu with 21 billion total parameters and 3 billion activated per token. It is pretrained using a staged approach that first builds strong language understanding and long-text capabilities before integrating broader modality alignment. While vision components were used in joint training, this specific variant extracts only the text-related parameters, making it a lightweight but capable base model for natural language tasks. Its MoE architecture includes 64 text and 64 vision experts, with six of each activated, supported by shared experts for better generalization. Built on PaddlePaddle, the model benefits from memory-efficient pipeline scheduling, FP8 mixed-precision training, and advanced quantization strategies. It supports long contexts of up to 131,072 tokens, making it suitable for tasks requiring deep document understanding.
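    To make the "3 billion activated out of 21 billion" figure concrete, the sketch below shows generic top-k expert routing in plain PyTorch. It is a conceptual illustration only, not ERNIE's implementation; the dimensions and k value are arbitrary.

    ```python
    # Conceptual sketch only, not ERNIE's code: top-k expert routing, where
    # many experts exist but only a few run per token. Sizes are illustrative.
    import torch
    import torch.nn.functional as F

    num_experts, top_k, d_model = 64, 6, 512
    router = torch.nn.Linear(d_model, num_experts)
    experts = torch.nn.ModuleList(torch.nn.Linear(d_model, d_model) for _ in range(num_experts))

    def moe_forward(x):  # x: (tokens, d_model)
        scores = F.softmax(router(x), dim=-1)           # routing probabilities per expert
        weights, idx = scores.topk(top_k, dim=-1)       # keep only the top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(top_k):                       # only the selected experts are evaluated
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * experts[e](x[mask])
        return out

    print(moe_forward(torch.randn(4, d_model)).shape)   # torch.Size([4, 512])
    ```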
  • 10
    ERNIE-4.5-21B-A3B-PT

    21B parameter text-only MoE model by Baidu, fine-tuned for reasoning

    ERNIE-4.5-21B-A3B-PT is Baidu’s post-trained Mixture-of-Experts (MoE) large language model optimized for text understanding and generation. With 21 billion total parameters and 3 billion active per token, it delivers high efficiency in both performance and resource usage. The model was trained using a multimodal pre-training setup, but this version focuses solely on text and is tailored for post-training inference. It supports advanced fine-tuning strategies like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Unified Preference Optimization (UPO). ERNIE-4.5 uses a modular MoE architecture with 64 text experts, of which 6 are activated per token, and offers extremely long context length support (up to 131,072 tokens). It integrates with Hugging Face Transformers and PaddlePaddle, and is compatible with ERNIEKit and FastDeploy for streamlined training and deployment. This model is also being adapted for use with vLLM for faster inference.
  • 11
    ERNIE-4.5-21B-A3B-Paddle

    Baidu’s 21B MoE language model optimized for PaddlePaddle inference

    ERNIE-4.5-21B-A3B-Paddle is a post-trained Mixture-of-Experts (MoE) language model from Baidu, designed for high-performance generation and understanding tasks. With 21 billion total parameters and 3 billion activated per token, it is optimized for large-scale inference using the PaddlePaddle framework. The model architecture supports efficient training and inference through advanced routing strategies, FP8 mixed-precision training, expert parallelism, and quantization. While primarily text-based, the architecture also includes vision experts for broader applicability, though this version focuses on text. ERNIE-4.5 incorporates fine-tuning methods like SFT, DPO, and UPO for performance and alignment with user preferences. It supports long context windows up to 131,072 tokens and integrates with ERNIEKit for streamlined fine-tuning. Deployment is supported via FastDeploy and is being adapted for vLLM and Hugging Face Transformers.
  • 12
    ERNIE-4.5-300B-A47B-2Bits-Paddle

    ERNIE 4.5 MoE model with ultra-efficient 2-bit quantization for inference

    ERNIE-4.5-300B-A47B-2Bits-Paddle is a 2-bit quantized variant of Baidu’s 300B-parameter Mixture-of-Experts (MoE) language model, designed for ultra-low-resource inference. Despite the extreme compression, the model retains 47 billion active parameters per token and supports high-quality language generation across English and Chinese. Built with PaddlePaddle and optimized for deployment on a single 141GB GPU, it uses sophisticated quantization (WINT2) and expert-parallel collaboration to achieve lossless performance. The model supports a context length of up to 131,072 tokens and integrates with FastDeploy for fast service setup. Like other ERNIE 4.5 models, it benefits from pretraining and modality-specific post-training via SFT, DPO, and UPO methods. It is especially suited for applications requiring high throughput and minimal latency with limited hardware. Users are advised to use temperature 0.8 and top-p 0.8 for optimal sampling.
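    The recommended sampling settings can be passed like any other OpenAI-style parameters once the model is served, for example behind a FastDeploy or vLLM endpoint. The URL, port, and model name below are placeholders.

    ```python
    # Hedged sketch: sending the recommended sampling settings (temperature 0.8,
    # top_p 0.8) to an OpenAI-compatible endpoint. The URL, port, and model
    # name are placeholders for whatever server actually hosts the model.
    import requests

    payload = {
        "model": "ERNIE-4.5-300B-A47B-2Bits-Paddle",  # placeholder identifier
        "messages": [{"role": "user", "content": "Explain 2-bit weight quantization briefly."}],
        "temperature": 0.8,
        "top_p": 0.8,
    }
    resp = requests.post("http://localhost:8180/v1/chat/completions", json=payload, timeout=120)
    print(resp.json()["choices"][0]["message"]["content"])
    ```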
  • 13
    ERNIE-4.5-300B-A47B-Base-PT

    Post-trained ERNIE 4.5 model for efficient, high-quality text tasks

    ERNIE-4.5-300B-A47B-Base-PT is a post-trained variant of Baidu’s large-scale text-only MoE model, featuring 300 billion total parameters with 47 billion active per token. It builds upon the pretrained ERNIE 4.5 foundation and is optimized for natural language understanding and generation. The model supports advanced fine-tuning via SFT, LoRA, and DPO through the ERNIEKit training toolkit. It is compatible with PaddlePaddle and Transformers, making deployment and customization highly flexible. The architecture maintains scalability and efficiency using heterogeneous expert routing, FP8 precision, and quantized inference up to 2-bit. With a context length of 131,072 tokens, it’s designed for long-form generation and reasoning tasks. This post-trained version is ideal for developers seeking reliable LLM performance with high adaptability to real-world workloads.
  • 14
    ERNIE-4.5-300B-A47B-Base-Paddle

    Large-scale MoE text model optimized for reasoning and generation

    ERNIE-4.5-300B-A47B-Base-Paddle is a powerful large language model by Baidu, based on a 300B parameter Mixture-of-Experts (MoE) architecture. It activates 47B parameters per token and is optimized for high-quality text generation and reasoning. This model is part of the ERNIE 4.5 series and leverages a heterogeneous MoE structure to balance performance and efficiency. It was trained in stages, starting with language understanding before expanding to include vision capabilities—though this variant focuses solely on text. Built using PaddlePaddle, it supports advanced infrastructure features like FP8 mixed-precision training, hybrid parallelism, and 4-bit/2-bit quantization for scalable deployment. The model supports long-context tasks with a maximum sequence length of 131,072 tokens. ERNIEKit enables easy fine-tuning using LoRA, SFT, or DPO, while FastDeploy and Transformers provide flexible deployment options across environments.
  • 15
    ERNIE-4.5-300B-A47B-FP8-Paddle

    ERNIE 4.5 MoE model in FP8 for efficient high-performance inference

    ERNIE-4.5-300B-A47B-FP8-Paddle is a quantized version of Baidu’s MoE large language model, post-trained for text generation tasks and optimized for FP8 precision. This variant retains the original’s 300 billion total parameters with 47 billion active per token, enabling powerful language understanding while dramatically improving inference efficiency. Built using PaddlePaddle, it supports multi-GPU distributed deployment and leverages advanced routing strategies and expert parallelism. It is especially well-suited for production environments requiring high throughput and lower memory use, while maintaining high reasoning and generation quality. The model can be used with FastDeploy and integrates cleanly with Python APIs for prompt-based generation workflows. It supports long context lengths (up to 131,072 tokens) and includes both Chinese and English prompt templates for web search applications.
  • 16
    ERNIE-4.5-300B-A47B-PT

    Post-trained ERNIE 4.5 MoE text model with 300B parameters

    ERNIE-4.5-300B-A47B-PT is a post-trained, text-only Mixture-of-Experts (MoE) model with 300 billion total parameters and 47 billion active per token. Built on Baidu's ERNIE 4.5 architecture, it benefits from advanced innovations in pretraining and routing, including modality-isolated routing and token-balanced loss—even though this variant focuses purely on text. Designed for general-purpose natural language understanding and generation, it is fine-tuned using Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Unified Preference Optimization (UPO). Developers can deploy and fine-tune it using ERNIEKit or integrate it via Hugging Face Transformers with full support for custom prompts and chat templates. It supports highly efficient inference via FastDeploy, with multiple quantized variants (WINT4, WINT8, WINT2, FP8) for a range of hardware setups.
  • 17
    ERNIE-4.5-300B-A47B-Paddle

    Powerful text-only ERNIE 4.5 MoE model with 300B parameters

    ERNIE-4.5-300B-A47B-Paddle is a large-scale text-only Mixture-of-Experts (MoE) model built on Baidu’s ERNIE 4.5 architecture. With 300 billion total parameters and 47 billion activated per token, it is designed to handle complex natural language understanding and generation tasks. The model incorporates multimodal MoE pretraining infrastructure—although only the text modality is active in this version—leveraging innovations like modality-isolated routing, router orthogonal loss, and token-balanced optimization. It supports highly efficient deployment via PaddlePaddle, with quantization-ready configurations including 4-bit, 8-bit, and 2-bit variants for high-performance inference on large GPU clusters. Post-training techniques such as Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Unified Preference Optimization (UPO) have been applied for better alignment and response quality.
  • 18
    ERNIE-4.5-300B-A47B-W4A8C8-TP4-Paddle

    ERNIE 4.5 MoE model with 4/8-bit quantization for fast, efficient inference

    ERNIE-4.5-300B-A47B-W4A8C8-TP4-Paddle is a 300B-parameter Mixture-of-Experts (MoE) language model by Baidu, optimized with 4-bit weights and 8-bit activations for highly efficient inference. This quantized variant significantly reduces memory requirements while preserving output quality, enabling deployment on systems with limited GPU capacity. The model activates 47 billion parameters per token and is trained for high-performance text generation, supporting both Chinese and English. It leverages PaddlePaddle with TP4 (tensor parallelism across 4 GPUs), fine-grained scheduling, and expert parallelism for scalable, modular performance. The model includes long context support up to 131,072 tokens and integrates easily with FastDeploy for real-time applications. Like other ERNIE 4.5 variants, it was trained using supervised fine-tuning (SFT), DPO, and UPO to align with complex reasoning and generative tasks.
  • 19
    ERNIE-4.5-VL-28B-A3B-Base-PT

    Pretrained multimodal MoE model for complex text and vision tasks

    ERNIE-4.5-VL-28B-A3B-Base-PT is a large-scale multimodal Mixture-of-Experts (MoE) model developed by Baidu, featuring 28 billion total parameters and 3 billion activated per token. It is pretrained to handle both text and image inputs, enabling it to excel in image-to-text and conversational AI tasks. The model uses a staged training strategy—starting with text-only training and then integrating vision components using ViT, adapters, and visual experts for robust cross-modal understanding. A heterogeneous MoE design, combined with advanced routing techniques and token-balancing strategies, ensures high efficiency and minimal interference between modalities. It is built on PaddlePaddle and includes innovations like intra-node parallelism, FP8 mixed precision, and 2/4-bit quantization for efficient inference. This PT (pretrained) version is suited for further fine-tuning on downstream multimodal tasks. The model supports English and Chinese and is released under the Apache 2.0 license.
  • 20
    ERNIE-4.5-VL-28B-A3B-Base-Paddle

    Multimodal model with 28B parameters for text and vision tasks

    ERNIE-4.5-VL-28B-A3B-Base-Paddle is a multimodal Mixture-of-Experts (MoE) model designed to understand and generate content from both text and images. With 28 billion total parameters and 3 billion activated per token, it strikes a balance between performance and efficiency. It leverages a heterogeneous MoE architecture with modality-isolated routing and token-balanced losses to avoid cross-modality interference. The model undergoes staged pretraining: first focusing on textual understanding, then incorporating visual capabilities using Vision Transformers, adapters, and dedicated visual experts. It supports context lengths up to 131,072 tokens, making it suitable for long-form reasoning and image-text interactions. Built on PaddlePaddle and pretrained on trillions of tokens, it is optimized for conversational, generative, and reasoning tasks. The model supports English and Chinese and is released under the Apache 2.0 license.
  • 21
    ERNIE-4.5-VL-28B-A3B-PT

    Multimodal ERNIE 4.5 MoE model for advanced vision-language tasks

    ERNIE-4.5-VL-28B-A3B-PT is a multimodal Mixture-of-Experts (MoE) model from Baidu, designed for sophisticated vision-language reasoning and generation. With 28 billion parameters (3 billion activated per token), it enables high-quality image-text interactions, supporting tasks like visual Q&A, description, and multimodal chain-of-thought. The model uses a heterogeneous MoE architecture with isolated routing and token-balanced training for optimized cross-modal representation. It features post-training enhancements through Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Unified Preference Optimization (UPO), along with Reinforcement Learning with Verifiable Rewards (RLVR). Built on PaddlePaddle and compatible with the Transformers library, it supports both thinking and non-thinking inference modes. It handles long contexts (up to 131,072 tokens) and is designed to scale across various hardware.
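    The sketch below follows the generic image-plus-text pattern used with Transformers processors. The repository ID, processor behaviour, and prompt format are assumptions; ERNIE-VL ships custom code, so the model card is authoritative for the exact classes.

    ```python
    # Hedged sketch of the generic image+text pattern with Transformers. The
    # repo ID, processor behaviour, and prompt format are assumptions; consult
    # the model card for the classes this checkpoint actually uses.
    from transformers import AutoModelForCausalLM, AutoProcessor
    from PIL import Image

    model_id = "baidu/ERNIE-4.5-VL-28B-A3B-PT"  # assumed Hugging Face repo name
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open("chart.png")  # any local image
    inputs = processor(text="Describe this chart.", images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=128)
    print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
    ```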
  • 22
    ERNIE-4.5-VL-28B-A3B-Paddle

    Multimodal ERNIE 4.5 MoE model for image-text reasoning and chat

    ERNIE-4.5-VL-28B-A3B-Paddle is a multimodal MoE chat model designed for complex image-text tasks, featuring 28 billion total parameters with 3 billion activated per token. Built on PaddlePaddle, it excels in tasks like visual question answering, description generation, and multimodal reasoning. It employs a heterogeneous Mixture-of-Experts architecture that supports both thinking and non-thinking inference modes. The model benefits from advanced pretraining and posttraining strategies, including Reinforcement Learning with Verifiable Rewards (RLVR), to enhance alignment and performance. Fine-tuned for real-world applications, it integrates language and vision through supervised learning, DPO, and UPO techniques. It supports long contexts up to 131,072 tokens and can be deployed using FastDeploy or the Hugging Face Transformers library. This version is ideal for developers needing high-performance, scalable multimodal capabilities in chat or image-based reasoning systems.
  • 23
    ERNIE-4.5-VL-424B-A47B-Base-PT

    Multimodal MoE model fine-tuned for text and visual comprehension

    ERNIE-4.5-VL-424B-A47B-Base-PT is a powerful multimodal Mixture-of-Experts (MoE) model developed by Baidu and fine-tuned for enhanced performance across both text and visual tasks. It builds upon the pretraining of ERNIE 4.5, using modality-specific post-training techniques to optimize for general-purpose natural language processing and visual-language reasoning. The model employs a heterogeneous MoE architecture with modality-isolated routing and loss-balancing mechanisms to ensure efficient and specialized expert activation. With a total of 424 billion parameters—47 billion of which are active per token—it supports large context windows and deep cross-modal understanding. Key training strategies include FP8 mixed precision, fine-grained recomputation, and advanced quantization methods for efficient inference. It supports both “thinking” and “non-thinking” visual modes, allowing it to handle a range of tasks from pure text generation to image-aware reasoning.
  • 24
    ERNIE-4.5-VL-424B-A47B-Base-Paddle

    Large multimodal MoE base model for text and vision understanding

    ERNIE-4.5-VL-424B-A47B-Base-Paddle is a multimodal Mixture-of-Experts (MoE) model developed by Baidu, designed to understand and generate both text and image-based information. It utilizes a heterogeneous MoE architecture with modality-isolated routing and specialized loss functions to ensure effective learning across both modalities. Pretrained with trillions of tokens, the model activates 47B parameters per token out of a total of 424B, optimizing for scalability and precision. Its training incorporates a staged approach, first focusing on language, then extending to vision with additional modules like ViT and visual experts. The model supports extremely long contexts (up to 131,072 tokens), enabling complex reasoning and narrative generation. Built on the PaddlePaddle framework, it leverages FP8 mixed precision, hybrid parallelism, and quantization techniques for efficient performance.
  • 25
    ERNIE-4.5-VL-424B-A47B-PT

    Advanced multimodal ERNIE model for vision-language reasoning

    ERNIE-4.5-VL-424B-A47B-PT is a large-scale multimodal MoE model developed by Baidu, integrating advanced capabilities in both language and vision. With 424 billion total parameters and 47 billion activated per token, it builds on ERNIE 4.5’s MoE foundation and introduces strong image-text interaction for complex reasoning and generation tasks. The model benefits from a structured post-training process including Supervised Fine-tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR), enhancing its alignment and performance across diverse use cases. Designed to support both thinking and non-thinking inference modes, it enables flexible and interpretable outputs in real-world applications. Its heterogeneous MoE structure includes modality-isolated routing and token-balanced loss to ensure efficient joint training of text and visual components.