  • 1
    FLUX.1-schnell

    12B-parameter image generator using fast rectified flow transformers

    FLUX.1-schnell is a 12 billion parameter text-to-image model developed by Black Forest Labs, designed for high-quality image generation using rectified flow transformers. It produces competitive visual results with strong prompt adherence, rivaling closed-source models in just 1 to 4 inference steps. Trained using latent adversarial diffusion distillation, the model is optimized for both quality and speed. It is released under the Apache 2.0 license, allowing commercial, scientific, and personal use. The model can be accessed via the FluxPipeline in Hugging Face’s diffusers library and is compatible with local workflows like ComfyUI. Available through several inference providers including Replicate and fal.ai, FLUX.1-schnell is well-documented and supported for developers. While capable, the model is subject to limitations such as occasional prompt misalignment and potential amplification of societal biases, and it may not be used in harmful, exploitative, or deceptive applications.
    Downloads: 0 This Week
    Last Update:
    See Project
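    As a minimal sketch of the diffusers access path the card mentions, the FluxPipeline call below follows the public model card; the checkpoint id, step count, and disabled guidance are assumptions to verify against it.

```python
import torch
from diffusers import FluxPipeline

# Sketch: text-to-image with FLUX.1-schnell via diffusers.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trade speed for lower VRAM use

image = pipe(
    "a watercolor fox reading a newspaper in a snowy forest",
    guidance_scale=0.0,         # the distilled schnell variant runs without guidance
    num_inference_steps=4,      # the model targets 1-4 steps
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux_schnell.png")
```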
  • 2
    GPT-2

    GPT-2 is a 124M parameter English language model for text generation

    GPT-2 is a pretrained transformer-based language model developed by OpenAI for generating natural language text. Trained on 40GB of internet data from outbound Reddit links (excluding Wikipedia), it uses causal language modeling to predict the next token in a sequence. The model was trained without human labels and learns representations of English that support text generation, feature extraction, and fine-tuning. GPT-2 uses a byte-level BPE tokenizer with a vocabulary of 50,257 and handles sequences up to 1024 tokens. It’s the smallest of the GPT-2 family with 124 million parameters and can be used with Hugging Face's Transformers in PyTorch, TensorFlow, and JAX. Though widely used, it reflects biases from its training data and is not suitable for factual tasks or sensitive deployments without further scrutiny. Despite limitations, GPT-2 remains a foundational model for generative NLP tasks and research.
    Downloads: 0 This Week
    Last Update:
    See Project
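    To illustrate the Transformers usage mentioned above, a minimal text-generation sketch with the 124M checkpoint (Hub id gpt2) could look like this:

```python
from transformers import pipeline, set_seed

# Sketch: sampling continuations from GPT-2 with the text-generation pipeline.
set_seed(42)
generator = pipeline("text-generation", model="gpt2")
outputs = generator(
    "Open-source language models are",
    max_new_tokens=40,
    num_return_sequences=2,
    do_sample=True,
)
for out in outputs:
    print(out["generated_text"])
```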
  • 3
    Hunyuan-A13B-Instruct

    Efficient 13B MoE language model with long context and reasoning modes

    Hunyuan-A13B-Instruct is a powerful instruction-tuned large language model developed by Tencent using a fine-grained Mixture-of-Experts (MoE) architecture. While the full model has 80 billion parameters, only 13 billion are active per forward pass, making it highly efficient while maintaining strong performance across benchmarks. It supports up to 256K context tokens, chain-of-thought (CoT) reasoning, and agent-based workflows with tool parsing. The model offers both fast and slow thinking modes, letting users trade off speed for deeper reasoning. It excels in mathematics, science, coding, and multi-turn conversation tasks, rivaling or outperforming larger models in several areas. Deployment is supported via TensorRT-LLM, vLLM, and SGLang, with Docker images and integration guides provided. Open-source under a custom license, it's ideal for researchers and developers seeking scalable, high-context AI capabilities with optimized inference.
    Downloads: 0 This Week
    Last Update:
    See Project
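    A hedged loading sketch using the generic Transformers chat-template workflow is shown below; the repo id, the need for trust_remote_code, and how the fast/slow thinking modes are toggled are assumptions to confirm against the model card and integration guides.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: chat-style generation with Hunyuan-A13B-Instruct (repo id assumed).
model_id = "tencent/Hunyuan-A13B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Prove that the sum of two even numbers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```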
  • 4
    Janus-Pro-7B

    Unified multimodal model for both vision and text generation tasks

    Janus-Pro-7B is a 7-billion-parameter autoregressive model developed by DeepSeek AI that unifies multimodal understanding and generation within a single transformer architecture. It introduces a decoupled visual encoding approach, separating the vision input pathways for understanding and generation, which improves model flexibility and avoids performance conflicts. For understanding tasks, it leverages the SigLIP-L vision encoder with 384x384 image resolution, while for generation, it uses a specialized image tokenizer with a downsampling rate of 16. Built on the DeepSeek-LLM 7B base, Janus-Pro achieves performance on par with or better than task-specific models across a wide range of vision-language tasks. This design enables seamless any-to-any functionality—such as text-to-image, image captioning, and visual question answering—under a unified framework. Janus-Pro is released under the MIT license and supports PyTorch-based multimodal applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 5
    Kimi K2

    1T MoE model with 32B active params, optimized for agentic reasoning

    Kimi K2 Instruct is a high-performance Mixture-of-Experts (MoE) language model developed by Moonshot AI, activating 32 billion parameters per forward pass out of 1 trillion in total. Designed for agentic reasoning, tool use, and advanced coding tasks, it achieves SOTA-level performance on multiple benchmarks such as SWE-Bench, AIME, and MMLU. Trained on 15.5T tokens using the Muon optimizer, it incorporates novel techniques for scaling stability. Kimi K2 supports a 128K context window, enabling detailed multi-turn conversations and long input handling. It includes native support for tool-calling, making it suitable for autonomous agents and real-world task execution. The Instruct variant is fine-tuned for chat-style interaction and general-purpose deployment, while the Base variant targets research and customization. Kimi K2 is released under a modified MIT license and deployable through engines like vLLM, SGLang, KTransformers, and TensorRT-LLM.
    Downloads: 0 This Week
    Last Update:
    See Project
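    Because the listed engines (vLLM, SGLang, TensorRT-LLM) expose OpenAI-compatible endpoints, a hedged client-side sketch looks like the following; the local URL and the moonshotai/Kimi-K2-Instruct model name are assumptions about how you serve it.

```python
from openai import OpenAI

# Sketch: querying a locally served Kimi K2 through an OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Instruct",
    messages=[{"role": "user", "content": "Outline a plan to stabilize a flaky test suite."}],
    temperature=0.6,
)
print(resp.choices[0].message.content)
```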
  • 6
    Kokoro-82M

    Lightweight, fast, and high-quality open TTS model with 82M params

    Kokoro-82M is an open-weight, lightweight text-to-speech (TTS) model featuring 82 million parameters, developed to deliver high-quality voice synthesis with exceptional efficiency. Despite its compact size, Kokoro rivals the output quality of much larger models while remaining significantly faster and cheaper to run. Built on StyleTTS2 and ISTFTNet architectures, it uses a decoder-only setup without diffusion, enabling rapid audio generation with low computational overhead. Kokoro supports multiple voices and languages and is compatible with environments like Google Colab or production APIs. It was trained on a few hundred hours of permissively licensed and synthetic audio paired with IPA phoneme labels, ensuring broad legal usability. Licensed under Apache 2.0, Kokoro is deployable in commercial, research, and personal projects, including those with monetized outputs under $1M in annual revenue.
    Downloads: 0 This Week
    Last Update:
    See Project
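    A hedged synthesis sketch using the community kokoro package is below; the KPipeline API, the lang_code convention, and the voice id are assumptions taken from the model card rather than guarantees.

```python
import soundfile as sf
from kokoro import KPipeline  # community package referenced by the model card

# Sketch: synthesize one sentence and write 24 kHz WAV chunks to disk.
pipeline = KPipeline(lang_code="a")  # "a" = American English in the package's convention
generator = pipeline(
    "Kokoro is a lightweight open text-to-speech model.",
    voice="af_heart",  # example voice id
)
for i, (graphemes, phonemes, audio) in enumerate(generator):
    sf.write(f"kokoro_{i}.wav", audio, 24000)
```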
  • 7
    Llama-2-70b-chat-hf

    Llama-2-70B-Chat is Meta’s largest fine-tuned open-source chat LLM

    Llama-2-70B-Chat is Meta’s largest fine-tuned large language model, optimized for dialogue and aligned using supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF). It features 70 billion parameters and uses a transformer architecture with grouped-query attention (GQA) to improve inference scalability. Trained on 2 trillion tokens from publicly available sources and over a million human-annotated examples, the model outperforms most open-source chat models and rivals closed-source systems like ChatGPT in benchmarks. It supports English-only input/output and is designed primarily for assistant-style conversations. The model shows strong results on reasoning, safety, and truthfulness evaluations, scoring 64.14 on TruthfulQA and near-zero toxicity. While it achieves top performance on many benchmarks, Meta emphasizes the need for responsible use and scenario-specific safety evaluation.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Llama-2-7b

    7B-parameter foundational LLM by Meta for text generation tasks

    Llama-2-7B is a foundational large language model developed by Meta as part of the Llama 2 family, designed for general-purpose text generation in English. It has 7 billion parameters and uses an optimized transformer-based, autoregressive architecture. Trained on 2 trillion tokens of publicly available data, it serves as the base for fine-tuned models like Llama-2-Chat. The model is pretrained only, meaning it is not optimized for dialogue but can be adapted for various natural language processing tasks, such as summarization, completion, and code generation. It demonstrates solid performance on academic benchmarks like MMLU and TruthfulQA, improving on its predecessor, Llama 1. The Llama-2-7B model is released under Meta’s Llama 2 Community License, which allows for commercial use under certain conditions, especially for companies with fewer than 700 million MAUs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Llama-2-7b-chat-hf

    Dialogue-optimized 7B language model for safe and helpful chatting

    Llama-2-7b-chat-hf is a fine-tuned large language model developed by Meta, designed specifically for dialogue use cases. With 7 billion parameters and built on an optimized transformer architecture, it uses supervised fine-tuning and reinforcement learning with human feedback (RLHF) to enhance helpfulness, coherence, and safety. It outperforms most open-source chat models and rivals proprietary systems like ChatGPT in human evaluations. Trained on 2 trillion tokens of public text and over 1 million human-annotated examples, Llama-2-7b-chat demonstrates strong performance in commonsense reasoning, reading comprehension, and safety benchmarks. It accepts text-only inputs and generates coherent text completions or conversations in English. The model is accessible in Hugging Face Transformers format under the Llama 2 Community License, with specific formatting required for optimal performance in chat applications.
    Downloads: 0 This Week
    Last Update:
    See Project
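    The chat-specific formatting the card refers to is the [INST] / <<SYS>> wrapping shown in this minimal sketch (the Hub repo is gated behind acceptance of Meta's license):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: Llama 2 chat prompt format with a system block and a single user turn.
model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

prompt = (
    "<s>[INST] <<SYS>>\nYou are a concise, helpful assistant.\n<</SYS>>\n\n"
    "Explain RLHF in two sentences. [/INST]"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```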
  • 10
    Llama-2-7b-hf

    Llama-2-7B is a 7B-parameter transformer model for text generation

    Llama-2-7B is a foundational large language model developed by Meta as part of the Llama 2 family, designed for general-purpose text generation tasks. It is a 7 billion parameter auto-regressive transformer trained on 2 trillion tokens from publicly available sources, using an optimized architecture that, unlike the 70B variant, does not use Grouped-Query Attention (GQA). This model is the pretrained version, intended for research and commercial use in English, and can be adapted for downstream applications such as summarization, question answering, and code generation. Llama-2-7B outputs text only, with input also restricted to text. Fine-tuned variants of this model, such as Llama-2-Chat, demonstrate competitive performance in dialogue tasks and safety benchmarks, rivaling proprietary models. The model was trained using Meta’s infrastructure, including 184,320 GPU hours and emissions fully offset by Meta’s sustainability program.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Llama-3.1-8B-Instruct

    Multilingual 8B-parameter chat-optimized LLM fine-tuned by Meta

    Llama-3.1-8B-Instruct is a multilingual, instruction-tuned language model developed by Meta, designed for high-quality dialogue generation across eight languages, including English, Spanish, French, German, Italian, Portuguese, Hindi, and Thai. It uses a transformer-based, autoregressive architecture with Grouped-Query Attention and supports a 128k token context window. The model was fine-tuned using a combination of supervised fine-tuning (SFT), reinforcement learning with human feedback (RLHF), and high-quality human and synthetic safety data. It excels at conversational AI, tool use, coding, and multilingual reasoning, achieving strong performance across a wide range of academic and applied benchmarks. The model is released under the Llama 3.1 Community License, which permits commercial use for organizations with fewer than 700 million monthly active users, provided they comply with Meta’s Acceptable Use Policy.
    Downloads: 0 This Week
    Last Update:
    See Project
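    For dialogue use, the tokenizer's built-in chat template produces the expected prompt format, as in this minimal sketch (the repo is gated behind Meta's license acceptance):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: one chat turn with Llama-3.1-8B-Instruct via the chat template.
model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": "You answer in one short paragraph."},
    {"role": "user", "content": "What does Grouped-Query Attention change at inference time?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```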
  • 12
    Llama-3.2-1B

    Llama 3.2-1B: Multilingual, instruction-tuned model for mobile AI

    meta-llama/Llama-3.2-1B is a lightweight, instruction-tuned generative language model developed by Meta, optimized for multilingual dialogue, summarization, and retrieval tasks. With 1.23 billion parameters, it offers strong performance in constrained environments like mobile devices, without sacrificing versatility or multilingual support. It is part of the Llama 3.2 family, trained on up to 9 trillion tokens and aligned using supervised fine-tuning, preference optimization, and safety tuning. The model supports eight officially listed languages (including Spanish, German, Hindi, and Thai) but can be adapted to more. Llama 3.2-1B outperforms other open models in several benchmarks relative to its size and offers quantized versions for efficiency. It uses a refined transformer architecture with Grouped-Query Attention (GQA) and supports long context windows of up to 128k tokens.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Llama-3.2-1B-Instruct

    Instruction-tuned 1.2B LLM for multilingual text generation by Meta

    Llama-3.2-1B-Instruct is Meta’s multilingual, instruction-tuned large language model with 1.24 billion parameters, optimized for dialogue, summarization, and retrieval tasks. It builds upon the Llama 3.1 architecture and incorporates fine-tuning techniques like SFT, DPO, and quantization-aware training for improved alignment, efficiency, and safety. The model supports eight primary languages (including English, Spanish, Hindi, and Thai) and was trained on a curated mix of publicly available online data, with a December 2023 knowledge cutoff. Llama-3.2-1B is lightweight enough for deployment on constrained devices like smartphones, using formats like SpinQuant and QLoRA to reduce model size and latency. Despite its small size, it performs competitively across benchmarks such as MMLU, ARC, and TLDR summarization. The model is distributed under the Llama 3.2 Community License, requiring attribution and adherence to Meta’s Acceptable Use Policy.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Llama-3.3-70B-Instruct

    Llama-3.3-70B-Instruct is a multilingual AI optimized for helpful chat

    Llama-3.3-70B-Instruct is Meta's large, instruction-tuned language model designed for safe, multilingual, assistant-style conversations and text generation. With 70 billion parameters, it supports English, Spanish, French, German, Italian, Portuguese, Hindi, and Thai, offering state-of-the-art performance across a wide range of benchmarks including MMLU, HumanEval, and GPQA. The model is built on a transformer architecture with grouped-query attention, trained on over 15 trillion tokens and refined using both supervised fine-tuning and reinforcement learning with human feedback. It supports long context windows up to 128k tokens and enables advanced tool use for function calling and integration. Llama-3.3 is distributed under the Llama Community License, allowing commercial use within specific limits, and requires proper attribution and adherence to Meta's Acceptable Use Policy.
    Downloads: 0 This Week
    Last Update:
    See Project
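    The tool-use support mentioned above can be exercised through the tokenizer's chat template, which renders a tool schema into the prompt; the get_weather function here is a made-up illustration, and generating and parsing the resulting tool call is omitted.

```python
from transformers import AutoTokenizer

# Sketch: rendering a function-calling prompt for Llama-3.3-70B-Instruct.
def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city.
    """
    ...

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")
messages = [{"role": "user", "content": "What's the weather in Lisbon right now?"}]
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, tokenize=False
)
print(prompt)  # shows how the tool definition is injected ahead of generation
```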
  • 15
    Meta-Llama-3-8B

    Versatile 8B language model optimized for helpful, safe dialogue

    Meta-Llama-3-8B is a powerful, open-weight large language model developed by Meta as part of the Llama 3 family, optimized for natural language understanding and generation in English. With 8 billion parameters, it uses an enhanced transformer architecture featuring Grouped-Query Attention (GQA) and supports an 8k context length. Trained on over 15 trillion tokens from publicly available sources and fine-tuned with more than 10 million human annotations, it excels in tasks requiring instruction-following, reasoning, and code generation. It significantly outperforms previous Llama 2 models and many open-source alternatives on benchmarks like MMLU, GSM8K, and HumanEval. Designed with safety in mind, Llama 3 includes alignment via supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF), and is backed by tools like Llama Guard 2 and Code Shield. The model is intended for commercial and research use, especially for building assistant-like chatbots.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Meta-Llama-3-8B-Instruct

    Instruction-tuned 8B LLM by Meta for helpful, safe English dialogue

    Meta-Llama-3-8B-Instruct is an instruction-tuned large language model from Meta’s Llama 3 family, optimized for safe and helpful English dialogue. It uses an autoregressive transformer architecture with Grouped-Query Attention (GQA) and supports an 8k token context length. Fine-tuned using supervised learning and reinforcement learning with human feedback (RLHF), the model achieves strong results on benchmarks like MMLU, GSM8K, and HumanEval. Trained on over 15 trillion tokens of publicly available data and more than 10 million human-annotated examples, it excludes any Meta user data. The model is released under the Meta Llama 3 Community License, which allows commercial use for organizations with fewer than 700 million MAUs, and imposes clear use, attribution, and redistribution rules. Meta provides safety tools like Llama Guard 2 and Code Shield to help developers implement system-level safety in applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    MiniMax-M1

    Open-weight, large-scale hybrid-attention reasoning model

    MiniMax-M1 is the world’s first open-weight, large-scale hybrid-attention reasoning model designed for long-context and complex reasoning tasks. Powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism, it efficiently supports context lengths up to 1 million tokens, eight times longer than many contemporary models. MiniMax-M1 significantly reduces computational overhead at generation time, consuming only about 25% of the FLOPs of comparable models on very long sequences. Trained using large-scale reinforcement learning on diverse tasks, it excels in mathematics, software engineering, agentic tool use, and long-context understanding benchmarks. It outperforms other open-weight models like DeepSeek R1 and Qwen3-235B on complex reasoning and coding challenges. MiniMax-M1 is available in two versions with 40K and 80K token thinking budgets, offering scalable performance based on your application needs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Mistral-7B-Instruct-v0.2

    Instruction-tuned 7B model for chat and task-oriented text generation

    Mistral-7B-Instruct-v0.2 is a fine-tuned version of the Mistral-7B-v0.2 language model, designed specifically for following instructions in a conversational format. It supports a 32k token context window, enabling more detailed and longer interactions compared to its predecessor. The model is trained to respond to user prompts formatted with [INST] and [/INST] tags, and it performs well in instruction-following tasks like Q&A, summarization, and explanations. It can be used via the official mistral_common tokenizer or Hugging Face’s transformers library, and supports generation on GPUs with BF16 precision. Built on a transformer architecture without sliding-window attention, the model is optimized for fast inference and chat integration. Though it lacks moderation mechanisms, it showcases the capability of Mistral-7B as a base for further fine-tuning and safety tooling. Mistral-7B-Instruct-v0.2 is licensed under Apache 2.0 and widely used across open-source projects.
    Downloads: 0 This Week
    Last Update:
    See Project
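    In practice the [INST] formatting described above is handled by the tokenizer's chat template, as in this minimal sketch:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: instruction-following with Mistral-7B-Instruct-v0.2; the chat template
# wraps the user turn in [INST] ... [/INST] automatically.
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Explain the difference between a context window and a prompt."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=200, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```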
  • 19
    Mistral-7B-v0.1

    Efficient 7B parameter LLM outperforming Llama 2 13B in benchmarks

    Mistral-7B-v0.1 is a pretrained 7-billion parameter transformer language model developed by Mistral AI, designed to deliver high performance with optimized compute efficiency. It outperforms Llama 2 13B on all evaluated benchmarks despite its smaller size. The architecture integrates Grouped-Query Attention (GQA) and Sliding-Window Attention, enabling efficient inference and improved long-context performance. Mistral-7B uses a byte-fallback BPE tokenizer for better multilingual and code handling. Released under the Apache 2.0 license, it is openly available for research and commercial use. As a base model, it does not include alignment, safety, or moderation mechanisms, making it suitable for developers building customized applications. It is widely adopted in the open-source community, serving as a strong foundation for instruction-tuned and specialized fine-tuned models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Mixtral-8x7B-Instruct-v0.1

    Sparse Mixture of Experts chat model with strong multitask performance

    Mixtral-8x7B-Instruct-v0.1 is an instruction-tuned large language model developed by Mistral AI, based on a Sparse Mixture of Experts (MoE) architecture where only 2 of 8 expert models are active per forward pass. With a total of 46.7 billion parameters, it delivers the capabilities of a much larger model while remaining compute-efficient. Fine-tuned for multi-turn conversations, it follows a strict instruction formatting pattern using [INST] and [/INST] tags, and demonstrates superior performance over Llama 2 70B on several benchmarks. The model is accessible via Hugging Face Transformers and supports inference with tools like Flash Attention 2 and bitsandbytes for low-precision runs. It outputs coherent, contextually appropriate responses in five languages (English, French, Italian, German, and Spanish) and is suitable for chat-based tasks in both research and production environments. However, it lacks built-in moderation or alignment safeguards, requiring external guardrails for safe deployment.
    Downloads: 0 This Week
    Last Update:
    See Project
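    The low-precision path mentioned above (bitsandbytes) can be sketched as 4-bit loading so the 46.7B-parameter MoE fits in far less memory; exact VRAM requirements depend on your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Sketch: 4-bit quantized loading of Mixtral-8x7B-Instruct with bitsandbytes.
model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb, device_map="auto")

messages = [{"role": "user", "content": "Give three tips for writing readable Python."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```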
  • 21
    Nanonets-OCR-s

    State-of-the-art image-to-markdown OCR model

    Nanonets-OCR-s is an advanced image-to-markdown OCR model that transforms documents into structured and semantically rich markdown. It goes beyond basic text extraction by intelligently recognizing content types and applying meaningful tags, making the output ideal for Large Language Models (LLMs) and automated workflows. The model expertly converts mathematical equations into LaTeX syntax, distinguishing between inline and display modes for accuracy. It also generates descriptive <img> tags for images like logos, charts, and graphs, enabling better interpretation by downstream systems. Signatures and watermarks are detected and isolated within dedicated tags to maintain document integrity, which is vital for legal and business uses. Form elements like checkboxes and radio buttons are converted into standardized Unicode symbols for consistent handling. Additionally, complex tables are extracted and formatted in both markdown and HTML to support versatile document processing.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    OmniGen2

    Multimodal generation AI model for image and text generation

    OmniGen2 is a powerful, efficient open-source multimodal generation model designed for diverse AI tasks involving both images and text. It improves on its predecessor by introducing separate decoding pathways for text and image, along with unshared parameters and a decoupled image tokenizer, enhancing flexibility and performance. Built on a strong Qwen-VL-2.5 foundation, OmniGen2 excels in visual understanding, high-quality text-to-image generation, and instruction-guided image editing. It also supports in-context generation, enabling the combination of multiple inputs like humans, objects, and scenes to produce novel, coherent visuals. The project offers ready-to-use models, extensive demos via Gradio, and supports resource-efficient features like CPU offloading to accommodate limited VRAM devices. Users can fine-tune generation results with hyperparameters like text and image guidance scales, maximum image resolution, and negative prompts.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    OrangeMixs

    Diffusion model extension enabling image generation with fine control

    OrangeMixs is a collection of popular merged Stable Diffusion models widely used in the Japanese AI art community, curated and maintained by WarriorMama777. The repository provides various high-quality anime-style and photorealistic merge models, designed to work seamlessly with the Stable Diffusion WebUI (AUTOMATIC1111) and similar tools. OrangeMixs models are known for blending anime aesthetics with improved anatomical accuracy, vivid colors, and diverse artistic styles, including flat anime shading and oil painting textures. The project regularly updates models, offering detailed merge recipes and instructions using tools like the SuperMerger extension. It includes variants suitable for safe-for-work (SFW), soft NSFW, and hardcore NSFW content, giving users control over output style and content. The models are open access under the CreativeML OpenRAIL-M license, allowing commercial use with clear usage guidelines.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    QwQ-32B

    QwQ-32B is a reasoning-focused language model for complex tasks

    QwQ-32B is a 32.8 billion parameter reasoning-optimized language model developed by Qwen as part of the Qwen2.5 family, designed to outperform conventional instruction-tuned models on complex tasks. Built with RoPE positional encoding, SwiGLU activations, RMSNorm, and Attention QKV bias, it excels in multi-turn conversation and long-form reasoning. It supports an extended context length of up to 131,072 tokens and incorporates supervised fine-tuning and reinforcement learning for enhanced instruction-following capabilities. The model is capable of structured thinking and delivers competitive performance against top models like DeepSeek-R1 and o1-mini. Recommended usage involves ensuring generations begin with <think>\n, using non-greedy sampling, and requesting standardized output formats for math and multiple-choice tasks. For long inputs, it supports YaRN (Yet another RoPE extensioN) context scaling.
    Downloads: 0 This Week
    Last Update:
    See Project
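    Following the prompting advice above, a hedged sketch lets the chat template drive the format and uses sampling rather than greedy decoding; the temperature and top_p values here are illustrative, not prescriptive.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: reasoning-style generation with QwQ-32B; the model opens with a <think> block.
model_id = "Qwen/QwQ-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many positive divisors does 360 have?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```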
  • 25
    Qwen2-7B-Instruct

    Instruction-tuned 7B language model for chat and complex tasks

    Qwen2-7B-Instruct is a 7.62-billion-parameter instruction-tuned language model from the Qwen2 series developed by Alibaba's Qwen team. Built on a transformer architecture with SwiGLU activation and group query attention, it is optimized for chat, reasoning, coding, multilingual tasks, and extended context understanding up to 131,072 tokens. The model was pretrained on a large-scale dataset and aligned via supervised fine-tuning and direct preference optimization. It shows strong performance across benchmarks such as MMLU, MT-Bench, GSM8K, and HumanEval, often surpassing similarly sized open-source models. Designed for conversational use, it integrates with Hugging Face Transformers and supports long-context applications via YaRN and vLLM for efficient deployment.
    Downloads: 0 This Week
    Last Update:
    See Project
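    A minimal chat sketch with the standard Transformers workflow is shown below; long-context (YaRN) and vLLM serving are separate configuration steps not covered here.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: one chat turn with Qwen2-7B-Instruct.
model_id = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a one-line docstring for a function that merges two sorted lists."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```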