  • 1
    Qwen2.5-1.5B-Instruct

    Instruction-tuned 1.5B Qwen2.5 model for chat and RL fine-tuning

    Qwen2.5-1.5B-Instruct is an instruction-tuned variant of the Qwen2.5 language model with 1.54 billion parameters, designed for text generation and conversational tasks. It was developed for use within the Gensyn RL Swarm system, which enables decentralized reinforcement learning fine-tuning over peer-to-peer networks. The model architecture includes rotary positional embeddings (RoPE), SwiGLU activation, RMSNorm, attention QKV bias, and tied word embeddings. It features 28 layers, a GQA attention mechanism with 12 query heads and 2 key-value heads, and a context window of up to 32,768 tokens for input and 8,192 tokens for output. While optimized for RL Swarm use, it can be integrated into standard workflows for inference and chat once fine-tuned. It supports BF16 tensors and is distributed as a Safetensors model. The base model is Qwen2.5-1.5B, with this version enhanced for instruction following and dialogue.
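
    As a rough sketch of the standard Transformers inference workflow mentioned above (the prompt is illustrative, and the upstream Qwen/Qwen2.5-1.5B-Instruct checkpoint is used as a stand-in for the RL Swarm variant):

        # Minimal chat-style generation sketch with Hugging Face Transformers.
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_id = "Qwen/Qwen2.5-1.5B-Instruct"
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

        messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain rotary positional embeddings in one sentence."},
        ]
        # Apply the model's chat template, then generate a reply.
        inputs = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
        output = model.generate(inputs, max_new_tokens=128)
        print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
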
  • 2
    Qwen2.5-14B-Instruct

    Powerful 14B LLM with strong instruction and long-text handling

    Qwen2.5-14B-Instruct is a powerful instruction-tuned language model developed by the Qwen team, based on the Qwen2.5 architecture. It features 14.7 billion parameters and is optimized for tasks like dialogue, long-form generation, and structured output. The model supports context lengths up to 128K tokens and can generate up to 8K tokens, making it suitable for long-context applications. It demonstrates improved performance in coding, mathematics, and multilingual understanding across over 29 languages. Qwen2.5-14B-Instruct is built on a transformer backbone with RoPE, SwiGLU, RMSNorm, and attention QKV bias. It’s resilient to varied prompt styles and is especially effective for JSON and tabular data generation. The model is instruction-tuned and supports chat templating, making it ideal for chatbot and assistant use cases.
  • 3
    Qwen2.5-VL-3B-Instruct

    Multimodal model for chat, vision & video

    Qwen2.5-VL-3B-Instruct is a 3.75 billion parameter multimodal model by Qwen, designed to handle complex vision-language tasks in both image and video formats. As part of the Qwen2.5 series, it supports image-text-to-text generation with capabilities like chart reading, object localization, and structured data extraction. The model can serve as an intelligent visual agent capable of interacting with digital interfaces and understanding long-form videos by dynamically sampling resolution and frame rate. It uses a SwiGLU and RMSNorm-enhanced ViT architecture and introduces mRoPE updates for robust temporal and spatial understanding. The model supports flexible image input (file path, URL, base64) and outputs structured responses like bounding boxes or JSON, making it highly versatile in commercial and research settings. It excels in a wide range of benchmarks such as DocVQA, InfoVQA, and AndroidWorld control tasks.
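
    A hedged sketch of image-plus-text chat through Transformers, assuming a recent release that ships the Qwen2.5-VL classes; the image URL and question are placeholders:

        # Sketch: vision-language generation with Qwen2.5-VL via Transformers.
        import requests
        from PIL import Image
        from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

        model_id = "Qwen/Qwen2.5-VL-3B-Instruct"
        processor = AutoProcessor.from_pretrained(model_id)
        model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto")

        # Example image URL is illustrative; local file paths or base64 data work as well.
        image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)
        messages = [{"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this chart and extract its key figures."},
        ]}]
        prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
        inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=256)
        # Decode only the newly generated tokens.
        print(processor.batch_decode(
            output[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
        )[0])
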
  • 4
    Qwen2.5-VL-7B-Instruct

    Multimodal 7B model for image, video, and text understanding tasks

    Qwen2.5-VL-7B-Instruct is a multimodal vision-language model developed by the Qwen team, designed to handle text, images, and long videos with high precision. Fine-tuned from Qwen2.5-VL, this 7-billion-parameter model can interpret visual content such as charts, documents, and user interfaces, as well as recognize common objects. It supports complex tasks like visual question answering, localization with bounding boxes, and structured output generation from documents. The model is also capable of video understanding with dynamic frame sampling and temporal reasoning, enabling it to analyze and respond to long-form videos. Built with an enhanced ViT architecture using window attention, SwiGLU, and RMSNorm, it aligns closely with Qwen2.5 LLM standards. The model demonstrates high performance across benchmarks like DocVQA, ChartQA, and MMStar, and even functions as a tool-using visual agent.
  • 5
    SmolLM3

    Multilingual 3B LLM optimized for reasoning, math, and long contexts

    SmolLM3 is a 3.08B parameter decoder-only language model designed by Hugging Face to deliver high performance in reasoning, math, and multilingual understanding. Trained on 11.2T tokens with a curriculum of web, code, and mathematical data, it uses advanced features like GQA and NoPE. The model supports extended context lengths up to 128k tokens via YaRN extrapolation, making it highly suitable for long-context applications. It outperforms or rivals larger models like Qwen3 and Llama 3 on several reasoning, commonsense, and multilingual benchmarks. SmolLM3 natively supports six languages (English, French, Spanish, German, Italian, and Portuguese) while also having exposure to Arabic, Chinese, and Russian. It is open-source under Apache 2.0, with transparent training data, configs, and available quantized versions. The model is optimized through Anchored Preference Optimization (APO), achieving strong alignment and instruction-following behavior across a broad range of tasks.
  • 6
    adetailer

    YOLOv8/YOLOv9-based detector for face, hand, person, and fashion data

    adetailer is a collection of YOLOv8 and YOLOv9 object detection models optimized for detecting detailed features such as faces, hands, clothing, and full-body silhouettes. Developed by Bingsu using the Ultralytics YOLO framework, the models are trained on a variety of datasets including WIDER Face, DeepFashion2, and anime segmentation sets. It supports a wide range of detection targets from realistic human faces and hands to anime characters and fashion garments. The models come in various sizes (nano, small, medium) offering different trade-offs between speed and accuracy, with the best face model (face_yolov9c.pt) achieving mAP50 of 0.748. Pretrained weights are available and can be loaded directly using the Hugging Face Hub and Ultralytics’ YOLO() interface. Despite security warnings related to pickle safety (due to getattr usage), the models are safe to use if sourced from trusted repositories.
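
    A minimal sketch of the Hub-plus-YOLO() loading path described above; the chosen weight file and input image are illustrative:

        # Download an adetailer detector from the Hugging Face Hub and run it
        # through Ultralytics' YOLO() interface.
        from huggingface_hub import hf_hub_download
        from ultralytics import YOLO

        weights = hf_hub_download("Bingsu/adetailer", "face_yolov8n.pt")  # any published weight file
        model = YOLO(weights)
        results = model("portrait.jpg", conf=0.3)  # confidence threshold is a tunable assumption
        for box in results[0].boxes:
            print(box.xyxy, float(box.conf))  # detected face bounding boxes and scores
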
  • 7
    all-MiniLM-L12-v2

    Fast, lightweight model for sentence embeddings and similarity tasks

    all-MiniLM-L12-v2 is a sentence-transformer model that maps English sentences or paragraphs to 384-dimensional dense vectors for tasks such as semantic search, clustering, and similarity comparison. Built on Microsoft’s MiniLM-L12-H384-uncased, it was fine-tuned using over 1 billion sentence pairs through a contrastive learning objective. The model is lightweight (33.4M parameters), making it ideal for production use cases requiring speed and low latency. It supports usage through both the sentence-transformers library and Hugging Face Transformers, with pooling and normalization steps handled manually in the latter. Input sequences are truncated at 256 word pieces by default. It excels in tasks like sentence embedding generation, FAQ matching, document clustering, and duplicate question detection. Though trained in English, it generalizes well across many short-text NLP tasks.
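
    A small sentence-transformers sketch of embedding and similarity scoring with this model; the example sentences are made up:

        # Encode sentences into 384-dim vectors and score cosine similarity.
        from sentence_transformers import SentenceTransformer, util

        model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")
        sentences = [
            "How do I reset my password?",
            "Steps to recover a forgotten password",
            "Best pizza in town",
        ]
        embeddings = model.encode(sentences, normalize_embeddings=True)
        # Similarity of the first sentence against the other two.
        print(util.cos_sim(embeddings[0], embeddings[1:]))
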
  • 8
    all-MiniLM-L6-v2

    Compact, efficient model for sentence embeddings and semantic search

    all-MiniLM-L6-v2 is a lightweight sentence-transformers model that maps sentences and short paragraphs into 384-dimensional dense vectors optimized for semantic tasks. Designed for high-speed, low-resource environments, it supports clustering, semantic search, and similarity comparison. Based on the MiniLM architecture, the model uses a contrastive learning objective trained on over 1 billion sentence pairs from diverse datasets like Reddit, WikiAnswers, and Stack Exchange. It outperforms many larger models in embedding quality relative to size and is available in PyTorch, TensorFlow, ONNX, and other formats. all-MiniLM-L6-v2 can be used with the sentence-transformers library or directly via Hugging Face Transformers with custom pooling. Text longer than 256 tokens is truncated, making it ideal for short-form text processing. Released under the Apache 2.0 license, the model is widely adopted across academic and commercial applications for its balance of performance and efficiency.
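
    For the plain-Transformers route with custom pooling mentioned above, a sketch along these lines is typical (mean pooling over the attention mask, then L2 normalization); the sentences are illustrative:

        # Manual mean pooling with Hugging Face Transformers.
        import torch
        import torch.nn.functional as F
        from transformers import AutoTokenizer, AutoModel

        model_id = "sentence-transformers/all-MiniLM-L6-v2"
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModel.from_pretrained(model_id)

        sentences = ["Compact embeddings for semantic search", "Small model, fast inference"]
        encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            token_embeddings = model(**encoded).last_hidden_state  # (batch, seq, 384)

        # Mean-pool only over real tokens, using the attention mask.
        mask = encoded["attention_mask"].unsqueeze(-1).float()
        sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
        print(sentence_embeddings.shape)  # torch.Size([2, 384])
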
  • 9
    all-mpnet-base-v2

    Semantic sentence embeddings for clustering and search tasks

    all-mpnet-base-v2 is a sentence embedding model from the Sentence-Transformers library that maps English sentences and paragraphs into dense 768-dimensional vector representations. Based on the microsoft/mpnet-base transformer, the model is fine-tuned using over 1.17 billion sentence pairs via contrastive learning to perform tasks such as semantic search, information retrieval, clustering, and similarity detection. It supports both PyTorch and ONNX, and can be used via SentenceTransformers or Hugging Face Transformers with custom pooling. This model truncates input longer than 384 tokens and achieves strong results across a variety of datasets, including Reddit, WikiAnswers, StackExchange, MS MARCO, and more. Originally trained during Hugging Face’s Community Week using JAX/Flax and TPUs, it delivers high-quality semantic embeddings suitable for production-scale NLP applications.
  • 10
    answerai-colbert-small-v1

    Compact multi-vector retriever with state-of-the-art ranking accuracy

    answerai-colbert-small-v1 is a 33M parameter multi-vector retrieval model developed by Answer.AI, using the JaColBERTv2.5 training recipe. Despite its small size (MiniLM scale), it surpasses many larger models—including e5-large-v2 and bge-base-en-v1.5—on standard information retrieval benchmarks. It is optimized for retrieval-augmented generation (RAG), reranking, and vector search, compatible with ColBERT, RAGatouille, and Rerankers libraries. The model achieves top performance in tasks like HotpotQA, TRECCOVID, and NQ, demonstrating strong zero-shot generalization. It is especially suited for efficient retrieval in low-resource environments or latency-sensitive applications. Pretrained and open-sourced under the Apache 2.0 license, the model supports ONNX and Safetensors for deployment flexibility. Its performance positions it as a practical solution for high-accuracy RAG pipelines without the compute overhead of large language models.
  • 11
    bart-large-cnn

    Summarization model fine-tuned on CNN/DailyMail articles

    facebook/bart-large-cnn is a large-scale sequence-to-sequence transformer model developed by Meta AI and fine-tuned specifically for abstractive text summarization. It uses the BART architecture, which combines a bidirectional encoder (like BERT) with an autoregressive decoder (like GPT). Pre-trained on corrupted text reconstruction, the model was further trained on the CNN/DailyMail dataset—a collection of news articles paired with human-written summaries. It performs particularly well in generating concise, coherent, and human-readable summaries from longer texts. Its architecture allows it to model both language understanding and generation tasks effectively. The model supports usage in PyTorch, TensorFlow, and JAX, and is integrated with the Hugging Face pipeline API for simple deployment. Due to its size and performance, it's widely used in real-world summarization applications such as news aggregation, legal document condensing, and content creation.
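
    A short sketch of the pipeline usage referenced above; the sample article text is invented for illustration:

        # Abstractive summarization via the Hugging Face pipeline API.
        from transformers import pipeline

        summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
        article = (
            "The new transit line opened on Monday after five years of construction. "
            "Officials said ridership exceeded projections during the first week, "
            "and additional trains will be added to the schedule next month."
        )
        summary = summarizer(article, max_length=60, min_length=15, do_sample=False)
        print(summary[0]["summary_text"])
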
  • 12
    bart-large-mnli

    Zero-shot classification with BART fine-tuned on MultiNLI data

    BART-Large-MNLI is a fine-tuned version of Facebook's BART-Large model, trained on the Multi-Genre Natural Language Inference (MultiNLI) dataset for natural language understanding tasks. Leveraging a textual entailment formulation, it enables powerful zero-shot classification by comparing a given input (premise) to multiple candidate labels phrased as hypotheses. The model determines how likely the premise entails each hypothesis, effectively ranking or scoring labels based on semantic similarity. This method allows users to classify any sequence into user-defined categories without task-specific fine-tuning. The model supports both single-label and multi-label classification using Hugging Face’s pipeline("zero-shot-classification"). It is implemented in PyTorch, with additional support for JAX and Rust, and is available under the MIT license. With 407 million parameters, it offers strong performance across a range of general-purpose text classification tasks.
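
    A minimal sketch of the zero-shot pipeline described above, showing both single-label and multi-label scoring; the input text and candidate labels are illustrative:

        # Zero-shot classification via the entailment formulation.
        from transformers import pipeline

        classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
        text = "The new GPU drivers cut rendering time in half."
        labels = ["technology", "politics", "sports"]
        print(classifier(text, candidate_labels=labels))                     # scores sum to 1
        print(classifier(text, candidate_labels=labels, multi_label=True))   # independent per-label scores
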
  • 13
    bert-base-cased

    English BERT model using cased text for sentence-level tasks

    bert-base-cased is a foundational transformer model pretrained on English using masked language modeling (MLM) and next sentence prediction (NSP). It is case-sensitive, treating "English" and "english" as distinct, making it suitable for tasks where casing matters. The model uses a bidirectional attention mechanism to deeply understand sentence structure, trained on BookCorpus and English Wikipedia. With 109M parameters and WordPiece tokenization (30K vocab size), it captures rich contextual embeddings. It is mostly intended for fine-tuning on downstream NLP tasks such as classification, token labeling, or question answering. The model can also be used out-of-the-box for masked token prediction using Hugging Face’s fill-mask pipeline. Though trained on neutral data, it still inherits and reflects societal biases present in the corpus.
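
    A quick sketch of the fill-mask usage mentioned above; the masked sentence is an arbitrary example:

        # Masked token prediction with the fill-mask pipeline.
        from transformers import pipeline

        unmasker = pipeline("fill-mask", model="bert-base-cased")
        for prediction in unmasker("The capital of France is [MASK]."):
            print(prediction["token_str"], round(prediction["score"], 3))
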
  • 14
    bert-base-chinese

    BERT-based Chinese language model for fill-mask and NLP tasks

    bert-base-chinese is a pre-trained language model developed by Google and hosted by Hugging Face, based on the original BERT architecture but tailored for Chinese. It supports fill-mask tasks and is pretrained on Chinese text using word piece tokenization and random masking strategies, following the standard BERT training procedure. With 12 hidden layers and a vocabulary size of 21,128 tokens, it has approximately 103 million parameters. The model is effective for a range of downstream NLP applications, including text classification, named entity recognition, and sentiment analysis in Chinese. It uses the same structure as the BERT base uncased English model, but it is trained entirely on Chinese data. While robust, like other large language models, it may reflect or amplify existing biases present in its training data. Due to limited transparency around the dataset and evaluation metrics, users should test it thoroughly before deployment in sensitive contexts.
  • 15
    bert-base-multilingual-cased

    Multilingual BERT model trained on 104 Wikipedia languages

    bert-base-multilingual-cased is a multilingual version of BERT pre-trained on Wikipedia articles from the top 104 languages using masked language modeling (MLM) and next sentence prediction (NSP) objectives. Unlike uncased models, it preserves case distinctions (e.g., "english" ≠ "English"). Trained in a self-supervised fashion, this model captures deep bidirectional language representations, enabling it to be fine-tuned for a wide range of natural language understanding tasks across multiple languages. It supports sequence classification, token classification, question answering, and more. Built with a shared vocabulary of 110,000 tokens, it is compatible with both PyTorch and TensorFlow.
  • 16
    bert-base-portuguese-cased

    BERTimbau: BERT model pretrained for Brazilian Portuguese NLP

    bert-base-portuguese-cased, also known as BERTimbau Base, is a BERT-based language model pretrained specifically for Brazilian Portuguese. Developed by NeuralMind, it has 12 layers and 110 million parameters. The model was trained using the brWaC corpus and achieves state-of-the-art performance in downstream tasks such as Named Entity Recognition, Sentence Textual Similarity, and Recognizing Textual Entailment. It supports case sensitivity (e.g., "Brasil" ≠ "brasil") and can be used for masked language modeling, sentence embeddings, or fine-tuning on a variety of Portuguese-language NLP tasks. It is compatible with both PyTorch and TensorFlow and available under the MIT license.
  • 17
    bert-base-uncased

    BERT-base-uncased is a foundational English model for NLP tasks

    BERT-base-uncased is a 110-million-parameter English language model developed by Google, pretrained using masked language modeling and next sentence prediction on BookCorpus and English Wikipedia. It is case-insensitive and tokenizes text using WordPiece, enabling it to learn contextual relationships between words in a sentence bidirectionally. The model excels at feature extraction for downstream NLP tasks like sentence classification, named entity recognition, and question answering when fine-tuned appropriately. Its pretraining involved randomly masking 15% of tokens and predicting them based on surrounding context, allowing it to learn deep semantic and syntactic patterns. It has been widely used as a baseline and component in various fine-tuned models, achieving strong results on benchmarks like GLUE. Despite its success, BERT-base-uncased can exhibit social biases learned from its training data and is not designed for factual generation or open-ended text production.
  • 18
    bge-base-en-v1.5

    Efficient English embedding model for semantic search and retrieval

    bge-base-en-v1.5 is an English sentence embedding model from BAAI optimized for dense retrieval tasks, part of the BGE (BAAI General Embedding) family. It is a fine-tuned BERT-based model designed to produce high-quality, semantically meaningful embeddings for tasks like semantic similarity, information retrieval, classification, and clustering. This version (v1.5) improves retrieval performance and stabilizes similarity score distribution without requiring instruction-based prompts. With 768 embedding dimensions and a maximum sequence length of 512 tokens, it achieves strong performance across multiple MTEB benchmarks, nearly matching larger models while maintaining efficiency. It supports use via SentenceTransformers, Hugging Face Transformers, FlagEmbedding, and ONNX for various deployment scenarios. Typical usage includes normalizing output embeddings and calculating cosine similarity via dot product for ranking.
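
    A small sketch of the normalize-then-dot-product ranking pattern described above, via sentence-transformers; the query and passages are invented:

        # Rank passages against a query using normalized embeddings.
        from sentence_transformers import SentenceTransformer

        model = SentenceTransformer("BAAI/bge-base-en-v1.5")
        query = "How do dense retrievers score passages?"
        passages = [
            "Dense retrieval encodes queries and documents into vectors and ranks by similarity.",
            "The recipe calls for two cups of flour and a pinch of salt.",
        ]
        q_emb = model.encode(query, normalize_embeddings=True)
        p_embs = model.encode(passages, normalize_embeddings=True)
        scores = q_emb @ p_embs.T  # with normalized vectors, dot product equals cosine similarity
        print(scores)
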
  • 19
    bge-large-en-v1.5

    BGE-Large v1.5: High-accuracy English embedding model for retrieval

    BAAI/bge-large-en-v1.5 is a powerful English sentence embedding model designed by the Beijing Academy of Artificial Intelligence to enhance retrieval-augmented language model systems. It uses a BERT-based architecture fine-tuned to produce high-quality dense vector representations optimized for sentence similarity, search, and retrieval. This model is part of the BGE (BAAI General Embedding) family and delivers improved similarity distribution and state-of-the-art results on the MTEB benchmark. It is recommended for use in document retrieval tasks, semantic search, and passage reranking, particularly when paired with a reranker like BGE-Reranker. The model supports inference through multiple frameworks, including FlagEmbedding, Sentence-Transformers, LangChain, and Hugging Face Transformers. It accepts English text as input and returns normalized 1024-dimensional embeddings suitable for cosine similarity comparisons.
  • 20
    bge-m3

    Multilingual embedding model for dense, sparse, and multi-vector retrieval

    BGE-M3 is an advanced text embedding model developed by BAAI that excels in multi-functionality, multi-linguality, and multi-granularity. It supports dense retrieval, sparse retrieval (lexical weighting), and multi-vector retrieval (ColBERT-style), making it ideal for hybrid systems in retrieval-augmented generation (RAG). The model handles over 100 languages and supports long-text inputs up to 8192 tokens, offering flexibility across short queries and full documents. BGE-M3 was trained using self-knowledge distillation and unified fine-tuning strategies to align its performance across all modes. It achieves state-of-the-art results in several multilingual and long-document retrieval benchmarks, surpassing models from OpenAI in certain tests. Designed to integrate with tools like Milvus and Vespa, BGE-M3 enables efficient hybrid retrieval pipelines and downstream scoring via re-ranking models.
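
    A hedged sketch using the FlagEmbedding package's BGEM3FlagModel interface; the output keys follow its documentation and should be checked against the installed version, and the sentences are illustrative:

        # Hybrid encoding: dense, sparse, and ColBERT-style multi-vector outputs.
        from FlagEmbedding import BGEM3FlagModel

        model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
        sentences = ["What is BGE-M3?", "BGE-M3 supports dense, sparse, and multi-vector retrieval."]
        output = model.encode(
            sentences,
            return_dense=True,        # 1024-dim dense vectors
            return_sparse=True,       # per-token lexical weights for sparse retrieval
            return_colbert_vecs=True, # ColBERT-style multi-vector representations
        )
        print(output["dense_vecs"].shape)
        print(output["lexical_weights"][0])
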
  • 21
    bge-small-en-v1.5

    Compact English sentence embedding model for semantic search tasks

    BAAI/bge-small-en-v1.5 is a lightweight English sentence embedding model developed by the Beijing Academy of Artificial Intelligence (BAAI) as part of the BGE (BAAI General Embedding) series. Designed for dense retrieval, semantic search, and similarity tasks, it produces 384-dimensional embeddings that can be used to compare and rank sentences or passages. This version (v1.5) improves similarity distribution, enhancing performance without the need for special query instructions. The model is optimized for speed and efficiency, making it suitable for resource-constrained environments. It is compatible with popular libraries such as FlagEmbedding, Sentence-Transformers, and Hugging Face Transformers. The model achieves competitive results on the MTEB benchmark, especially in retrieval and classification tasks. With only 33.4M parameters, it provides a strong balance of accuracy and performance for English-only use cases.
  • 22
    blip-image-captioning-base

    Image captioning model trained on COCO using BLIP base architecture

    BLIP-Image-Captioning-Base is a pre-trained vision-language model developed by Salesforce that generates natural language descriptions of images. Built on the BLIP (Bootstrapping Language-Image Pretraining) framework, it uses a ViT-base backbone and is fine-tuned on the COCO dataset. The model supports both conditional and unconditional image captioning, delivering strong performance across multiple benchmarks including CIDEr and image-text retrieval. It introduces a novel strategy to bootstrap web-sourced noisy image-caption data using synthetic caption generation and noise filtering. BLIP's unified architecture is designed for both vision-language understanding and generation, showing strong generalization even in zero-shot settings. The model can be easily deployed using Hugging Face Transformers in PyTorch or TensorFlow, with support for GPU acceleration and half-precision inference.
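
    A short sketch of conditional and unconditional captioning with the Transformers BLIP classes; the image URL and text prefix are placeholders:

        # Image captioning with BLIP via Hugging Face Transformers.
        import requests
        from PIL import Image
        from transformers import BlipProcessor, BlipForConditionalGeneration

        model_id = "Salesforce/blip-image-captioning-base"
        processor = BlipProcessor.from_pretrained(model_id)
        model = BlipForConditionalGeneration.from_pretrained(model_id)

        image = Image.open(requests.get("https://example.com/photo.jpg", stream=True).raw).convert("RGB")

        # Unconditional captioning: describe the image from scratch.
        inputs = processor(image, return_tensors="pt")
        print(processor.decode(model.generate(**inputs)[0], skip_special_tokens=True))

        # Conditional captioning: continue from a text prefix.
        inputs = processor(image, "a photography of", return_tensors="pt")
        print(processor.decode(model.generate(**inputs)[0], skip_special_tokens=True))
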
  • 23
    blip-image-captioning-large

    Large BLIP model for high-quality, flexible image captioning tasks

    blip-image-captioning-large is a vision-language model developed by Salesforce that generates image captions using a large ViT backbone. It is part of the BLIP framework, which unifies vision-language understanding and generation in a single model. The model is trained on the COCO dataset and leverages a bootstrapped captioning strategy using synthetic captions filtered for quality. This approach improves robustness across diverse vision-language tasks, including image captioning, retrieval, and VQA. BLIP-large achieves state-of-the-art performance on benchmarks like CIDEr and VQA accuracy. It supports both conditional and unconditional captioning and generalizes well to new tasks, such as video-language applications in zero-shot settings. With 470 million parameters, it offers a powerful, scalable solution for image-to-text generation.
  • 24
    bloom

    Multilingual 176B language model for text and code generation tasks

    BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) is a 176-billion parameter autoregressive language model developed by the BigScience Workshop. It generates coherent text in 46 natural languages and 13 programming languages, making it one of the most multilingual LLMs publicly available. BLOOM was trained on 366 billion tokens using Megatron-DeepSpeed and large-scale computational resources. It can perform various tasks via prompt-based learning, even without task-specific fine-tuning, by framing them as text generation problems. Released under the BigScience RAIL license, BLOOM promotes responsible AI usage and open-access research. Though capable and flexible, the model has known limitations, including potential biases, hallucinations, and misuse if deployed without safeguards. Its training and evaluation were documented transparently, including metrics, ethical considerations, and its estimated carbon emissions.
  • 25
    chatglm-6b

    Bilingual 6.2B parameter chatbot optimized for Chinese and English

    ChatGLM-6B is a 6.2 billion parameter bilingual language model developed by THUDM, based on the General Language Model (GLM) framework. It is optimized for natural and fluent dialogue in both Chinese and English, supporting applications in conversational AI, question answering, and assistance. Trained on approximately 1 trillion tokens, the model benefits from supervised fine-tuning, feedback self-training, and reinforcement learning with human feedback to align its outputs with human preferences. It is particularly tailored to Chinese dialogue scenarios while maintaining strong English capabilities. The model can be deployed locally on consumer-grade GPUs, requiring as little as 6GB of VRAM using INT4 quantization. ChatGLM-6B is open-source and free for academic and commercial use upon registration. It features easy-to-use APIs, sample chatbot interactions, and is backed by the GLM research family, with further upgrades available in newer versions like ChatGLM2.
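
    A hedged sketch of local chat following the model card's pattern; trust_remote_code is required because the repository ships custom modeling code, a CUDA GPU is assumed, and the prompts are illustrative:

        # Local bilingual chat with ChatGLM-6B using its chat() helper.
        from transformers import AutoModel, AutoTokenizer

        model_id = "THUDM/chatglm-6b"
        tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
        model = AutoModel.from_pretrained(model_id, trust_remote_code=True).half().cuda()
        model = model.eval()

        response, history = model.chat(tokenizer, "你好", history=[])
        print(response)
        response, history = model.chat(tokenizer, "How much VRAM does INT4 quantization need?", history=history)
        print(response)
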