Page 7 | Best Open Source Mac AI Models 2025

AI Models for Mac

1

layoutlm-base-uncased

Multimodal Transformer for document image understanding and layout

layoutlm-base-uncased is a multimodal transformer model developed by Microsoft for document image understanding tasks. It incorporates both text and layout (position) features to process structured documents like forms, invoices, and receipts. This base version has 113 million parameters and is pre-trained on 11 million documents from the IIT-CDIP dataset. Adding layout information improves performance on tasks where the spatial arrangement of text matters. The model uses a standard BERT-like architecture but enriches its input with 2D positional embeddings. At release, it achieved state-of-the-art results on form understanding and information extraction benchmarks. This model is particularly useful for document AI applications like document classification, question answering, and named entity recognition.

Downloads: 0 This Week

Last Update: 2025-07-02
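The 2D positional embeddings mentioned above expect word bounding boxes on a fixed 0 to 1000 grid, not raw pixel coordinates. Hugging Face's LayoutLM documentation normalizes boxes this way. The minimal sketch below shows that scaling. The page size and box values are made-up examples.

```python
def normalize_bbox(box, width, height):
    """Scale a pixel-space (x0, y0, x1, y1) box to LayoutLM's 0-1000 grid."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


# Hypothetical word box on a 612x792 px page (US Letter at 72 dpi).
print(normalize_bbox((61, 79, 183, 99), 612, 792))  # [99, 99, 299, 125]
```

Every subword token of a word receives that word's normalized box.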
2

mms-300m-1130-forced-aligner

CTC-based forced aligner for audio-text in 158 languages

mms-300m-1130-forced-aligner is a multilingual forced alignment model based on Meta’s MMS-300M wav2vec2 checkpoint, adapted for Hugging Face’s Transformers library. It supports forced alignment between audio and the corresponding text across 158 languages, offering broad multilingual coverage. The model produces accurate word- or phoneme-level timestamps from Connectionist Temporal Classification (CTC) emissions. It is also significantly more memory-efficient than the TorchAudio forced alignment API. Users can integrate it through the Python package ctc-forced-aligner, which supports GPU acceleration via PyTorch. The alignment pipeline covers audio processing, emission generation, tokenization, and span detection. This makes it suitable for speech analysis, transcript syncing, and dataset creation. The model is especially useful for researchers and developers working with low-resource languages or building multilingual speech systems.

Downloads: 0 This Week

Last Update: 2025-07-02
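The alignment step described above amounts to a Viterbi search over CTC emissions. That search finds the most likely frame-by-frame path that spells out the known transcript, with optional blanks between tokens. This is a from-scratch sketch of that idea, not the ctc-forced-aligner package's code. The toy emission matrix and token ids are invented.

```python
import math


def ctc_forced_align(log_probs, targets, blank=0):
    """Viterbi CTC alignment: return the token (or blank) emitted at each frame."""
    ext = [blank]  # transcript with blanks interleaved: _ t1 _ t2 ... _
    for tok in targets:
        ext += [tok, blank]
    S, T = len(ext), len(log_probs)
    score = [[-math.inf] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            # Predecessors: stay, advance one state, or skip a blank between distinct tokens
            cands = [(score[t - 1][s], s)]
            if s >= 1:
                cands.append((score[t - 1][s - 1], s - 1))
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((score[t - 1][s - 2], s - 2))
            best, prev = max(cands)
            score[t][s] = best + log_probs[t][ext[s]]
            back[t][s] = prev
    # A valid path ends on the last token or on the trailing blank
    s = S - 1 if S == 1 or score[T - 1][S - 1] >= score[T - 1][S - 2] else S - 2
    path = []
    for t in range(T - 1, -1, -1):
        path.append(ext[s])
        s = back[t][s]
    return path[::-1]


def token_spans(path, blank=0):
    """Collapse a frame path into (token, first_frame, last_frame) spans."""
    spans, prev = [], blank
    for i, tok in enumerate(path):
        if tok != blank and tok != prev:
            spans.append([tok, i, i])
        elif tok != blank:
            spans[-1][2] = i
        prev = tok
    return [tuple(sp) for sp in spans]


# Toy vocabulary: 0 = blank, 1 = "a", 2 = "b"; four frames of probabilities.
probs = [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
path = ctc_forced_align([[math.log(p) for p in row] for row in probs], [1, 2])
print(path, token_spans(path))  # [1, 0, 2, 2] [(1, 0, 0), (2, 2, 3)]
```

To turn frame indices into seconds, multiply by the model's frame stride. The stride is assumed here to be about 20 ms for wav2vec2-style encoders.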
3

mobilenetv3_small_100.lamb_in1k

Lightweight MobileNetV3 image classifier trained on ImageNet-1k

mobilenetv3_small_100.lamb_in1k is a compact image classification model built on the MobileNetV3-Small architecture, trained using the LAMB optimizer on the ImageNet-1k dataset. Developed within the PyTorch Image Models (timm) library, it emphasizes high throughput and low compute overhead, making it ideal for edge devices or real-time inference. The model uses a training recipe inspired by "ResNet Strikes Back" with extended training duration, exponential learning rate decay, EMA weight averaging, and no CutMix. It achieves strong efficiency with just 2.5 million parameters and 0.1 GMACs while maintaining reasonable classification performance on 224x224 images. The model supports typical classification use, feature extraction, and embedding generation, with examples provided for all three. It is licensed under Apache 2.0 and designed to balance minimal compute cost with solid accuracy. Researchers and developers can easily integrate it using the timm library or Hugging Face.

Downloads: 0 This Week

Last Update: 2025-07-01
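The training recipe above uses EMA weight averaging. During training, a shadow copy of the weights tracks an exponential moving average of the live weights, and that smoothed copy is what gets evaluated. The sketch below shows the update rule on a plain dict of scalars. The decay value is an illustrative assumption, not the recipe's setting.

```python
def ema_update(ema, weights, decay=0.9998):
    """In-place EMA step: ema <- decay * ema + (1 - decay) * weights."""
    for name, w in weights.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * w
    return ema


shadow = {"conv.weight": 0.0}
for _ in range(2):
    ema_update(shadow, {"conv.weight": 1.0}, decay=0.5)
print(shadow)  # {'conv.weight': 0.75}
```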
4

ms-marco-MiniLM-L6-v2

Efficient cross-encoder for MS MARCO passage re-ranking tasks

ms-marco-MiniLM-L6-v2 is a lightweight cross-encoder fine-tuned on the MS MARCO Passage Ranking dataset to deliver strong retrieval and reranking performance. It is based on a 6-layer MiniLM model and trained to directly score the relevance between query-passage pairs. The model outputs a single relevance score per pair and is used in re-ranking pipelines after an initial candidate set is retrieved (e.g., with BM25). Despite its compact 22.7M parameter size, it achieves an MRR@10 of 39.01 on MS MARCO Dev and an NDCG@10 of 74.30 on TREC DL 2019. It runs at around 1,800 query-doc pairs per second on a V100 GPU, offering a strong speed-to-accuracy tradeoff. The model is compatible with Hugging Face Transformers and SentenceTransformers, and supports deployment in ONNX and OpenVINO formats. Ideal for reranking in resource-constrained environments, it combines speed and quality for English-language passage ranking tasks.

Downloads: 0 This Week

Last Update: 2025-07-01
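The MRR@10 figure quoted above (39.01) is mean reciprocal rank, scaled by 100. For each query, take 1/rank of the first relevant passage within the top 10 re-ranked results, or 0 if none appears, then average across queries. A minimal sketch with made-up rankings:

```python
def mrr_at_k(ranked_ids_per_query, relevant_ids_per_query, k=10):
    """Mean reciprocal rank of the first relevant hit within the top k."""
    total = 0.0
    for ranked, relevant in zip(ranked_ids_per_query, relevant_ids_per_query):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_ids_per_query)


ranked = [["d3", "d1", "d7"], ["d2", "d5"], ["d9", "d8"]]
relevant = [{"d1"}, {"d2"}, {"d4"}]
print(mrr_at_k(ranked, relevant))  # 0.5
```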
5

multilingual-e5-large

High-performance multilingual embedding model for 94 languages

multilingual-e5-large is a powerful sentence embedding model trained on a diverse set of multilingual datasets and fine-tuned for both symmetric and asymmetric text retrieval tasks. Based on xlm-roberta-large, the model generates 1024-dimensional embeddings across 94 languages, optimized for semantic search, clustering, bitext mining, and cross-lingual retrieval. It uses the "query:" and "passage:" prefix convention for improved performance and supports batch encoding through both Hugging Face Transformers and SentenceTransformers. Pretraining combined over 5 billion weakly supervised pairs from sources like mC4, Wikipedia, Reddit, and NLLB, followed by supervised fine-tuning on datasets like MS MARCO, SQuAD, and MIRACL. It achieves strong benchmark results, ranking at the top of MTEB and Mr. TyDi evaluations for many language-specific tasks. With robust support for sentence-transformers, ONNX, and OpenVINO, it integrates easily into production workflows.

Downloads: 0 This Week

Last Update: 2025-07-01
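The "query:" / "passage:" convention above means the texts are prefixed before encoding. Retrieval then ranks passages by cosine similarity to the query embedding. In the sketch below, the 2-D vectors are placeholders for the model's real 1024-dimensional embeddings.

```python
import math


def add_e5_prefixes(queries, passages):
    """Apply the query/passage prefix convention before encoding."""
    return [f"query: {q}" for q in queries], [f"passage: {p}" for p in passages]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_passages(query_vec, passage_vecs):
    """Return passage indices sorted from most to least similar."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    return sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)


qs, ps = add_e5_prefixes(["how tall is everest"], ["Mount Everest is 8,849 m tall."])
print(qs[0], "|", ps[0])
print(rank_passages([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0], [1.0, 0.1]]))  # [2, 1, 0]
```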
6

nsfw_image_detection

ViT-based model for detecting NSFW images with high accuracy

The nsfw_image_detection model by Falconsai is a fine-tuned Vision Transformer (ViT) designed to classify images as either "normal" or "nsfw" (not safe for work). It is based on the vit-base-patch16-224-in21k architecture. That model was first pre-trained on the ImageNet-21k dataset, then fine-tuned on a curated proprietary dataset of 80,000 diverse images. The model reached an evaluation accuracy of 98%, trained with hyperparameters that include a batch size of 16 and a learning rate of 5e-5. It is intended for content moderation and image safety filtering on digital platforms. The model can be used via the Hugging Face pipeline or loaded directly with PyTorch and Transformers for manual control. An optional YOLOv9-based ONNX runtime script is also provided for inference in deployment scenarios. It is released under the Apache 2.0 license, which allows commercial use, with a strong emphasis on responsible implementation.

Downloads: 0 This Week

Last Update: 2025-07-01
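A two-class classifier like this returns two logits. A moderation system usually turns them into a probability and applies its own threshold rather than taking the argmax. This is a minimal sketch under one labeled assumption: index 1 is "nsfw". Check the model's `config.id2label` before relying on that.

```python
import math


def nsfw_decision(logits, threshold=0.5):
    """Softmax two-class logits and flag the image if P(nsfw) >= threshold.

    Assumes index 0 = "normal" and index 1 = "nsfw" (verify against id2label).
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    p_nsfw = exps[1] / sum(exps)
    return ("nsfw" if p_nsfw >= threshold else "normal"), p_nsfw


print(nsfw_decision([2.0, 0.0]))               # ('normal', 0.119...)
print(nsfw_decision([0.0, 3.0], threshold=0.9))  # ('nsfw', 0.952...)
```

A stricter platform can lower the threshold to catch more borderline images, at the cost of more false positives.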
7

openjourney

AI art model fine-tuned on Midjourney-style prompts for unique visuals

Openjourney is a fine-tuned version of Stable Diffusion v1.5, created by PromptHero, designed to replicate the distinctive visual style of Midjourney. It generates high-quality, imaginative text-to-image outputs when prompted with the phrase “mdjrny-v4 style.” Built on top of Stable Diffusion, it keeps the same architecture and parameters but adds stylistic tuning through curated training on Midjourney-style images. Openjourney works with Hugging Face’s diffusers library. It can be exported to ONNX or Flax/JAX and can also run on Apple’s MPS (Metal Performance Shaders) backend. It supports torch float16 precision for efficient GPU inference, and its results lean toward fantasy, surrealism, and visually complex compositions. The model is available for free use via Hugging Face Spaces, and additional resources such as prompt collections and a LoRA version are also provided. It suits artists, designers, and AI enthusiasts seeking creative image generation with a Midjourney-inspired aesthetic.

Downloads: 0 This Week

Last Update: 2025-06-27
8

opt-125m

Compact GPT-style language model for open text generation and research

opt-125m is the smallest model in Meta AI’s OPT (Open Pre-trained Transformer) family—an open-source suite of decoder-only language models ranging from 125M to 175B parameters. It’s trained using causal language modeling (CLM), following similar architecture and objectives to GPT-3. The model was trained on 180B tokens from a diverse mix of datasets including BookCorpus, Common Crawl, Reddit, and more. OPT models aim to democratize access to large language models for responsible and reproducible research. This version, with 125M parameters, supports lightweight inference and is ideal for educational, prototyping, or constrained compute environments. It can be used for zero-shot tasks such as text generation and prompting, and is also fine-tuneable for downstream NLP applications. As with other large models trained on open internet data, OPT-125M carries biases and may produce toxic or hallucinated content.

Downloads: 0 This Week

Last Update: 2025-07-01
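Causal language modeling, as described above, has two ingredients. First, each position may attend only to itself and earlier positions. Second, the training targets are the input sequence shifted left by one token. The sketch shows both on plain Python lists. The token ids are arbitrary.

```python
def causal_mask(n):
    """mask[i][j] is True where position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def shift_for_clm(token_ids):
    """Inputs and next-token targets for causal LM training."""
    return token_ids[:-1], token_ids[1:]


for row in causal_mask(3):
    print(row)
print(shift_for_clm([5, 6, 7, 8]))  # ([5, 6, 7], [6, 7, 8])
```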
9

paraphrase-MiniLM-L6-v2

Lightweight sentence embedding model for semantic search

paraphrase-MiniLM-L6-v2 is a sentence-transformers model that encodes sentences and paragraphs into 384-dimensional dense vectors. It is specifically optimized for semantic similarity tasks such as paraphrase mining, clustering, and semantic search. The model is built on a lightweight MiniLM architecture, making it both fast and efficient for large-scale inference. It supports integration via both the sentence-transformers and transformers libraries, with built-in pooling strategies like mean pooling. Trained using contrastive learning, it provides a strong balance between speed and accuracy for English texts. This model is suitable for both academic and industrial use cases and has been widely adopted in various NLP applications. Released under the Apache-2.0 license, it is one of the most popular embedding models on Hugging Face.

Downloads: 0 This Week

Last Update: 2025-07-01
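Paraphrase mining, one of the uses listed above, scores every pair of sentence embeddings by cosine similarity and keeps the highest-scoring pairs. The brute-force sketch below uses toy 2-D vectors in place of the model's 384-dimensional embeddings. For large corpora, sentence-transformers provides `util.paraphrase_mining`, which avoids materializing every pair at once.

```python
import math
from itertools import combinations


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def paraphrase_pairs(embeddings, top_k=3):
    """Return (cosine, i, j) for the top_k most similar sentence pairs."""
    unit = [normalize(e) for e in embeddings]
    scored = [
        (sum(a * b for a, b in zip(unit[i], unit[j])), i, j)
        for i, j in combinations(range(len(unit)), 2)
    ]
    scored.sort(reverse=True)
    return scored[:top_k]


print(paraphrase_pairs([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], top_k=1))
```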
10

paraphrase-multilingual-MiniLM-L12-v2

Lightweight multilingual model for sentence similarity tasks

paraphrase-multilingual-MiniLM-L12-v2 is a compact sentence-transformers model that encodes sentences into 384-dimensional embeddings suitable for tasks such as semantic search, clustering, and paraphrase mining. Trained by the Sentence-Transformers team, it supports 50+ languages and builds on a distilled MiniLM architecture to balance speed and accuracy. The model uses mean pooling over token embeddings and is optimized for efficient inference, making it ideal for large-scale multilingual applications. It integrates seamlessly with both the Sentence-Transformers and Hugging Face Transformers libraries, offering a lightweight yet powerful solution for sentence-level understanding across diverse languages.

Downloads: 0 This Week

Last Update: 2025-07-01
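The mean pooling step described above averages the token vectors into one sentence vector. It uses the attention mask so that padding tokens do not dilute the average. A minimal pure-Python sketch, with 2-D toy vectors standing in for the real 384-dimensional token embeddings:

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, m in zip(token_embeddings, attention_mask):
        if m:
            summed = [s + x for s, x in zip(summed, vec)]
            count += 1
    return [s / max(count, 1) for s in summed]


# The third token is padding, so its large values are excluded.
print(mean_pool([[1, 2], [3, 4], [100, 100]], [1, 1, 0]))  # [2.0, 3.0]
```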
11

paraphrase-multilingual-mpnet-base-v2

Multilingual sentence embeddings for search and similarity tasks

paraphrase-multilingual-mpnet-base-v2 is a sentence-transformers model designed to generate dense vector representations of sentences and paragraphs in 50 languages. Developed by the Sentence Transformers team, it is particularly well suited for tasks like semantic search, clustering, and paraphrase detection. The model maps input text to a 768-dimensional vector space, making it easy to compare the semantic meaning of different sentences. It is a multilingual version of paraphrase-mpnet-base-v2. An XLM-RoBERTa student model was trained via knowledge distillation to reproduce the English MPNet teacher's embeddings for translated sentences, which gives strong performance across a wide range of languages. It can be used via the sentence-transformers library for streamlined access, or directly through Hugging Face Transformers with custom pooling operations. The model is compatible with multiple formats, including PyTorch, TensorFlow, ONNX, and OpenVINO. With over 3 million downloads per month, it is widely adopted in both research and production environments.

Downloads: 0 This Week

Last Update: 2025-07-02
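The distillation setup behind the Sentence-Transformers multilingual paraphrase models works as follows. Given an English sentence and its translation, the student is pulled toward the teacher's embedding of the English sentence for both inputs, using mean squared error. The sketch shows that objective for a single pair. The toy vectors are assumptions, and batching and optimization are omitted.

```python
def mse(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def distillation_loss(teacher_src, student_src, student_tgt):
    """Pull the student's source and translation embeddings toward the teacher's source embedding."""
    return mse(student_src, teacher_src) + mse(student_tgt, teacher_src)


print(distillation_loss([1.0, 0.0], [1.0, 0.0], [0.0, 0.0]))  # 0.5
```

Because both languages are trained toward the same target vector, translations end up close together in the shared embedding space.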
12

phi-2

Small, high-performing language model for QA, chat, and code tasks

Phi-2 is a 2.7 billion parameter Transformer model developed by Microsoft, designed for natural language processing and code generation tasks. It was trained for 1.4 trillion tokens on a 250-billion-token dataset that combines filtered, high-quality web content with synthetic NLP texts created by GPT-3.5. Phi-2 performs well on benchmarks for common sense, language understanding, and logical reasoning, outperforming most models under 13B parameters despite not being instruction-tuned or aligned via RLHF. It performs best on QA-style prompts, code generation, and chat dialogues that use structured input formats. The model has a context length of 2048 tokens and was trained over 14 days on 96 A100 GPUs using DeepSpeed and FlashAttention. Though compact, it can be verbose, may reflect societal biases, and may generate inaccurate code without supervision. Phi-2 is released under the MIT license to support open research on safe, controllable language modeling.

Downloads: 0 This Week

Last Update: 2025-06-27
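The "structured input formats" mentioned above are plain-text templates. The Phi-2 model card shows an `Instruct: ... Output:` layout for QA and a speaker-prefixed layout for chat. The helpers below build those strings. The speaker names are only the card's example convention, and the whole prompt plus generated text must still fit in the 2048-token context.

```python
def qa_prompt(instruction):
    """QA-style prompt: the model continues after 'Output:'."""
    return f"Instruct: {instruction}\nOutput:"


def chat_prompt(turns, next_speaker="Bob"):
    """Chat-style prompt from (speaker, text) turns, ending on the speaker to generate."""
    lines = [f"{speaker}: {text}" for speaker, text in turns]
    lines.append(f"{next_speaker}:")
    return "\n".join(lines)


print(qa_prompt("Explain forced alignment in one sentence."))
print(chat_prompt([("Alice", "Any tips for studying?")]))
```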
13

r1-1776

R1 1776 is an uncensored reasoning-focused LLM fine-tuned by Perplexity

R1 1776 is a post-trained version of the DeepSeek-R1 large language model, released by Perplexity AI with a focus on uncensored, factual output and strong reasoning. It was specifically fine-tuned to remove censorship associated with the Chinese Communist Party while preserving strong reasoning abilities. The model was evaluated on a multilingual dataset covering over 1,000 sensitive topics to check that it engages openly with controversial subjects. Human annotators and LLM-based evaluators confirmed that the decensoring process did not compromise performance. R1 1776 maintains parity with the original R1 model on math and reasoning benchmarks. Licensed under MIT, the model is accessible for both research and commercial purposes. It is positioned as a transparent, high-performing alternative for users seeking unrestricted conversational AI.

Downloads: 0 This Week

Last Update: 2025-06-27
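DeepSeek-R1 writes its reasoning inside `<think>...</think>` before the final answer. Assuming R1 1776 keeps that convention from its base model, applications usually separate the two parts before display. This small sketch also handles chat templates that pre-insert the opening tag, in which case only `</think>` appears in the output.

```python
def split_reasoning(text):
    """Split R1-style output into (reasoning, answer); assumes a </think> delimiter."""
    head, sep, tail = text.partition("</think>")
    if not sep:
        return "", text.strip()  # no reasoning block found
    return head.replace("<think>", "", 1).strip(), tail.strip()


print(split_reasoning("<think>2 + 2 = 4</think>\nThe answer is 4."))
```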
14

resnet18.a1_in1k

Lightweight ResNet-18 model trained on ImageNet with A1 recipe

resnet18.a1_in1k is a lightweight convolutional neural network from the timm library. It implements a ResNet-B variant trained on ImageNet-1K with the improved "ResNet Strikes Back" A1 training recipe. It features ReLU activations, a single 7x7 convolution with pooling, and 1x1 convolutional shortcuts for downsampling. With only 11.7 million parameters, it is designed to be efficient while maintaining strong baseline performance for image classification tasks. The model was optimized using the LAMB optimizer, a cosine learning rate schedule with warmup, and binary cross-entropy (BCE) loss. It achieves 73.16% Top-1 and 91.03% Top-5 accuracy at 288×288 image resolution. It is also well suited to feature extraction and embedding generation, since timm allows forward passes without the classifier head and access to intermediate feature maps. This makes it a flexible backbone for real-time computer vision pipelines on resource-constrained devices.

Downloads: 0 This Week

Last Update: 2025-07-02
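The Top-1 and Top-5 figures above count a prediction as correct when the true label is among the k highest-scoring classes. This is a minimal sketch of that metric on invented three-class scores.

```python
def topk_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top
    return 100.0 * hits / len(labels)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]
labels = [2, 1, 2]
print(topk_accuracy(scores, labels, 1), topk_accuracy(scores, labels, 2))  # 33.33... 100.0
```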
15

clip-vit-base-patch32

Zero-shot image-text classification with ViT-B/32 encoder.

clip-vit-base-patch32 is a zero-shot image classification model from OpenAI based on the CLIP (Contrastive Language–Image Pretraining) framework. It uses a Vision Transformer with base size and 32x32 patches (ViT-B/32) as the image encoder and a masked self-attention transformer as the text encoder. These components are jointly trained using contrastive loss to align images and text in a shared embedding space. The model excels in generalizing across tasks without additional fine-tuning by computing similarity between images and natural language prompts. Trained on a large corpus of image-text pairs sourced from the internet, CLIP enables flexible and interpretable image classification. While effective for research and robustness testing, OpenAI advises against commercial or surveillance use without domain-specific evaluations due to fairness, bias, and performance variability across class taxonomies.

Downloads: 0 This Week

Last Update: 2025-07-01
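The zero-shot procedure described above works in three steps. Embed one text prompt per candidate class, compute the cosine similarity between the image embedding and each prompt, then apply a softmax over the scaled similarities. In the sketch below, the 2-D embeddings are placeholders for real encoder outputs. The logit scale of 100 is an assumption about the trained temperature.

```python
import math


def build_prompts(labels, template="a photo of a {}"):
    """Turn class names into natural-language prompts for the text encoder."""
    return [template.format(label) for label in labels]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """CLIP-style zero-shot: softmax over scaled cosine similarities."""
    def unit(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    img = unit(image_emb)
    logits = [logit_scale * sum(a * b for a, b in zip(img, unit(t))) for t in text_embs]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(build_prompts(["cat", "dog"]))
print(zero_shot_probs([1.0, 0.1], [[1.0, 0.0], [0.0, 1.0]]))
```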
16

roberta-base

Robust BERT-based model for English with improved MLM training

roberta-base is a robustly optimized variant of BERT, pretrained on a significantly larger corpus of English text using dynamic masked language modeling. Developed by Facebook AI, RoBERTa improves on BERT by removing the Next Sentence Prediction objective, using longer training, larger batches, and more data, including BookCorpus, English Wikipedia, CC-News, OpenWebText, and Stories. It captures contextual representations of language by masking 15% of input tokens and predicting them. RoBERTa is designed to be fine-tuned for a wide range of NLP tasks such as classification, QA, and sequence labeling, achieving strong performance on the GLUE benchmark and other downstream applications.

Downloads: 0 This Week

Last Update: 2025-07-01
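Dynamic masking means the 15% masking pattern is re-sampled every time a sequence is fed to the model, instead of being fixed once during preprocessing. The sketch follows the BERT-style split that RoBERTa keeps: of the selected tokens, 80% become the mask token, 10% become a random token, and 10% stay unchanged. `mask_id` and `vocab_size` are hypothetical parameters. A real implementation would also skip special tokens.

```python
import random


def dynamic_mask(token_ids, mask_id, vocab_size, rng, mask_prob=0.15):
    """Sample a fresh MLM mask; labels use -100 for positions that are not predicted."""
    inputs, labels = list(token_ids), [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok  # the model must recover the original token here
            r = rng.random()
            if r < 0.8:
                inputs[i] = mask_id
            elif r < 0.9:
                inputs[i] = rng.randrange(vocab_size)
            # otherwise keep the original token in place
    return inputs, labels


rng = random.Random(0)
print(dynamic_mask(list(range(10, 20)), mask_id=0, vocab_size=50, rng=rng))
```

The -100 label follows PyTorch's default `ignore_index` for cross-entropy, so unmasked positions do not contribute to the loss.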
17

roberta-large

Large MLM-based English model optimized from BERT architecture

RoBERTa-large is a robustly optimized transformer model for English, trained by Facebook AI using a masked language modeling (MLM) objective. It was trained on 160GB of text, far more than BERT, drawn from BookCorpus, English Wikipedia, CC-News, OpenWebText, and Stories, with dynamic masking applied during training. It uses a byte-level BPE tokenizer and was trained with a sequence length of 512 and a batch size of 8K on 1024 V100 GPUs. RoBERTa improves performance across multiple NLP tasks by removing BERT’s next-sentence prediction objective and using larger batches and longer training. With 355 million parameters, it learns bidirectional sentence representations and performs strongly on tasks like sequence classification, token classification, and question answering. However, it reflects social biases present in its training data, so caution is advised when deploying it in sensitive contexts.

Downloads: 0 This Week

Last Update: 2025-07-01
18

sdxl-turbo

SDXL-Turbo is a real-time text-to-image model for high-quality output

SDXL-Turbo is a distilled version of SDXL 1.0 developed by Stability AI, optimized for real-time text-to-image generation using the Adversarial Diffusion Distillation (ADD) technique. It can synthesize photorealistic images from text prompts in as little as one step while maintaining high fidelity and prompt adherence. Instead of long multi-step sampling, ADD distills a large pretrained diffusion model into one that produces usable images in one to four steps. SDXL-Turbo supports both text-to-image and image-to-image generation. It does have limitations: output is fixed at 512×512 resolution, it cannot render legible text, and human features are often imperfect. It performs especially well with simple prompts and stylized content. Its use is governed by the SAI NC Community license, so check Stability AI's licensing terms before any commercial use. The model does not use classifier-free guidance, so guidance should be disabled (guidance_scale=0.0) and negative prompts are not used.

Downloads: 0 This Week

Last Update: 2025-06-27
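Disabling guidance has a concrete payoff. With classifier-free guidance, each denoising step runs the network twice, once with the prompt and once unconditionally, and extrapolates between the two outputs. Without guidance, each step is a single conditional pass, which is also why negative prompts play no role. The sketch mirrors the common diffusers convention of applying guidance only when `guidance_scale > 1`. `predict` is a hypothetical stand-in for the denoising network.

```python
def guided_noise(predict, latents, cond, uncond, guidance_scale):
    """Noise prediction for one denoising step, with optional classifier-free guidance."""
    if guidance_scale <= 1.0:
        # Guidance off (e.g. SDXL-Turbo with guidance_scale=0.0): one conditional pass
        return predict(latents, cond)
    eps_u = predict(latents, uncond)
    eps_c = predict(latents, cond)
    return [u + guidance_scale * (c - u) for u, c in zip(eps_u, eps_c)]


calls = []


def toy_predict(latents, c):
    """Hypothetical network: scales latents by the conditioning value."""
    calls.append(c)
    return [v * c for v in latents]


print(guided_noise(toy_predict, [1.0, 2.0], 2.0, 0.0, 0.0), len(calls))  # [2.0, 4.0] 1
```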
19

segmentation

Speaker segmentation model for voice activity and overlap detection

pyannote/segmentation is an advanced audio segmentation model designed for detecting speech activity and overlapping speech, and for refining speaker diarization outputs. Built using pyannote.audio, it enables fine-grained, frame-level speaker segmentation from audio input. The model supports multiple pipelines such as Voice Activity Detection (VAD), Overlapped Speech Detection (OSD), and Resegmentation. It outputs either labeled time segments or raw probability scores indicating speech presence. Based on work presented at Interspeech 2021, the model has been optimized for real-world datasets like AMI, DIHARD3, and VoxConverse. It suits researchers and engineers developing speaker-aware audio processing systems. It runs on PyTorch through pyannote.audio, and downloading it requires a Hugging Face access token.

Downloads: 0 This Week

Last Update: 2025-07-01
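Turning the raw frame-level probabilities described above into labeled time segments typically uses hysteresis thresholding. A segment opens when the probability rises above an onset threshold and closes only when it drops below a lower offset threshold, which avoids flickering. The model card's VAD pipeline exposes similar onset/offset hyperparameters. The function below is an independent sketch, and the frame step and thresholds are illustrative.

```python
def binarize(probs, frame_step, onset=0.5, offset=0.5):
    """Convert per-frame speech probabilities into (start, end) segments in seconds."""
    segments, start, active = [], 0.0, False
    for i, p in enumerate(probs):
        t = i * frame_step
        if not active and p > onset:
            start, active = t, True
        elif active and p < offset:
            segments.append((start, t))
            active = False
    if active:  # close a segment still open at the end of the signal
        segments.append((start, len(probs) * frame_step))
    return segments


print(binarize([0.1, 0.8, 0.6, 0.4, 0.2, 0.9], frame_step=0.5, onset=0.7, offset=0.3))
# [(0.5, 2.0), (2.5, 3.0)]
```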
20

segmentation-3.0

Speaker segmentation model for 10s audio chunks with powerset labels

segmentation-3.0 is a voice activity and speaker segmentation model from the pyannote.audio framework, designed to analyze 10-second mono audio sampled at 16kHz. It outputs a (num_frames, num_classes) matrix using a powerset encoding over seven classes: non-speech, each of up to three speakers speaking alone, and each pair of overlapping speakers (at most two speakers at once). Trained with pyannote.audio 3.0.0 on a rich blend of datasets—including AISHELL, DIHARD, VoxConverse, and more—it enables downstream tasks like voice activity detection (VAD), overlapped speech detection, and speaker diarization when combined with additional models. While it does not process full recordings directly, it powers pipelines for detailed segmentation and analysis of speech data. It is released under the MIT license, though users must agree to usage conditions before accessing it. The model offers state-of-the-art segmentation performance and is used in both academic and production-oriented pipelines.

Downloads: 0 This Week

Last Update: 2025-07-01
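The powerset encoding above assigns each class to a set of active speakers. Decoding a frame therefore means taking the argmax class and expanding it into per-speaker 0/1 activity. The sketch enumerates the seven classes in the order the model card lists them and decodes one frame of invented scores. pyannote.audio also ships a `Powerset` utility for this conversion.

```python
from itertools import combinations


def powerset_classes(num_speakers=3, max_overlap=2):
    """(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2) for 3 speakers, max 2 at once."""
    classes = [()]
    for k in range(1, max_overlap + 1):
        classes += list(combinations(range(num_speakers), k))
    return classes


def decode_frame(scores, classes, num_speakers=3):
    """Argmax over powerset classes, expanded to per-speaker activity."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    active = set(classes[best])
    return [int(s in active) for s in range(num_speakers)]


classes = powerset_classes()
print(classes)
print(decode_frame([0.0, 0.1, 0.0, 0.0, 0.2, 0.9, 0.1], classes))  # [1, 0, 1]
```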
21

siglip-so400m-patch14-384

SigLIP: Zero-shot image-text model with shape-optimized ViT

google/siglip-so400m-patch14-384 is a powerful zero-shot image classification model based on the SigLIP framework developed by Google. SigLIP introduces a new sigmoid contrastive loss, improving scalability and performance compared to traditional CLIP models. This specific variant uses a SoViT-400M architecture, a shape-optimized Vision Transformer trained on the large-scale WebLI dataset at 384×384 resolution. Unlike CLIP’s softmax-based loss, SigLIP’s sigmoid loss enables pairwise training without requiring normalization across the batch, improving flexibility for large-scale training. The model can handle tasks such as image-text retrieval and zero-shot classification by evaluating image-text similarity directly. It performs strongly on general visual understanding tasks and is accessible via the Hugging Face Transformers API. With over 878 million parameters, it is designed for high accuracy while remaining efficient in terms of memory and compute during inference.

Downloads: 0 This Week

Last Update: 2025-07-02
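The sigmoid loss described above treats each (image, text) pair in a batch as an independent binary classification. Matching pairs on the diagonal are positives and all others are negatives, so no batch-wide softmax normalization is needed. The sketch uses the SigLIP paper's initial temperature and bias (t = 10, b = -10), assumes L2-normalized toy embeddings, and normalizes by batch size.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def siglip_loss(img_embs, txt_embs, t=10.0, b=-10.0):
    """Pairwise sigmoid loss over all image-text pairs in a batch."""
    n = len(img_embs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            logit = t * sum(a * c for a, c in zip(img_embs[i], txt_embs[j])) + b
            label = 1.0 if i == j else -1.0  # +1 for matching pairs, -1 otherwise
            total -= log_sigmoid(label * logit)
    return total / n


imgs = [[1.0, 0.0], [0.0, 1.0]]
print(siglip_loss(imgs, imgs), siglip_loss(imgs, imgs[::-1]))  # aligned loss is lower
```

At inference, the same per-pair sigmoid yields an independent match probability for each label instead of a distribution over labels.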
22

siglip2-so400m-patch16-naflex

Multilingual vision-language model for image-text understanding

SigLIP 2 So400m Patch16 NAFLEX is a powerful multilingual vision-language encoder developed by Google to improve zero-shot classification, semantic understanding, and dense feature extraction. It builds upon the SigLIP architecture by integrating advanced pretraining objectives such as decoder loss, global-local contrastive learning, masked prediction, and adaptability to different image aspect ratios and resolutions. Trained on the large-scale WebLI dataset, SigLIP 2 enables enhanced alignment between image and text modalities across languages. The model supports tasks such as zero-shot image classification, image-text retrieval, and general visual encoding for downstream applications. With 1.14 billion parameters and training done on 2048 TPU-v5e chips, it delivers strong performance in localization and semantic richness. It is compatible with Hugging Face Transformers and can be used directly via pipeline or with PyTorch-based workflows.

Downloads: 0 This Week

Last Update: 2025-07-02
23

speaker-diarization-3.1

Speaker diarization pipeline fully in PyTorch, no ONNX required

speaker-diarization-3.1 is a state-of-the-art speaker diarization pipeline built with pyannote.audio 3.1, fully implemented in PyTorch for easier deployment and faster inference by removing reliance on ONNX. It processes mono audio sampled at 16kHz (resampling and downmixing handled automatically) and outputs speaker annotations in RTTM format. The pipeline performs speaker segmentation and embedding, allowing for optional specification or estimation of the number of speakers. It supports GPU acceleration and in-memory waveform processing. Designed for fully automatic operation—no need for manual VAD, speaker count, or fine-tuning—the model has been benchmarked across multiple datasets like AMI, DIHARD, and VoxConverse using strict diarization error rate (DER) metrics. It demonstrates robust performance in realistic, overlapping, and noisy audio environments.

Downloads: 0 This Week

Last Update: 2025-07-01
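RTTM, the output format mentioned above, is a space-separated text format. Each speaker turn is one `SPEAKER` line with the file id, channel, start time, duration, and speaker label, plus `<NA>` placeholders. With pyannote.audio, `diarization.write_rttm(file)` produces these lines. The sketch below only shows the layout, together with the diarization error rate (DER) used in the benchmarks. Times and labels are invented.

```python
def to_rttm(file_id, turns):
    """Format (start_sec, end_sec, speaker) turns as RTTM SPEAKER lines."""
    return "\n".join(
        f"SPEAKER {file_id} 1 {start:.3f} {end - start:.3f} <NA> <NA> {speaker} <NA> <NA>"
        for start, end, speaker in turns
    )


def diarization_error_rate(false_alarm, missed, confusion, total_speech):
    """DER = (false alarm + missed speech + speaker confusion) / total reference speech."""
    return (false_alarm + missed + confusion) / total_speech


print(to_rttm("meeting", [(0.5, 2.25, "SPEAKER_00"), (2.25, 4.0, "SPEAKER_01")]))
print(diarization_error_rate(1.0, 2.0, 3.0, 60.0))  # 0.1
```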
24

stable-diffusion-2-1

Latent diffusion model for high-quality text-to-image generation

Stable Diffusion 2.1 is a text-to-image generation model developed by Stability AI, building on the 768-v architecture with additional fine-tuning for improved safety and image quality. It uses a latent diffusion framework that operates in a compressed image space, enabling faster and more efficient image synthesis while preserving detail. The model is conditioned on text prompts via the OpenCLIP-ViT/H encoder and supports generation at resolutions up to 768×768. Released under the OpenRAIL++ license, it permits research and commercial use with specific content restrictions. Stable Diffusion 2.1 is designed for creative tasks such as digital art, design prototyping, and educational tools, but is not suitable for generating factual representations or non-English content. The model was trained on filtered subsets of LAION-5B, with additional steps to reduce NSFW content.

Downloads: 0 This Week

Last Update: 2025-06-27
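The "compressed image space" above refers to the latent space of Stable Diffusion 2.x's autoencoder. That autoencoder downsamples each spatial side by a factor of 8 into 4 latent channels, so a 768×768 RGB image becomes a 4×96×96 latent. The arithmetic below assumes those VAE settings and shows the 48× reduction in values the diffusion model has to process.

```python
def latent_shape(height, width, downsample=8, channels=4):
    """Latent tensor shape for an image under an assumed 8x, 4-channel VAE."""
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of the VAE factor")
    return (channels, height // downsample, width // downsample)


c, h, w = latent_shape(768, 768)
pixels = 3 * 768 * 768
print((c, h, w), pixels / (c * h * w))  # (4, 96, 96) 48.0
```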
25

stable-diffusion-3-medium

Efficient text-to-image model with enhanced quality and typography

Stable Diffusion 3 Medium is a next-generation text-to-image model by Stability AI, designed using a Multimodal Diffusion Transformer (MMDiT) architecture. It offers notable improvements in image quality, prompt comprehension, typography, and computational efficiency over previous versions. The model integrates three fixed, pretrained text encoders—OpenCLIP-ViT/G, CLIP-ViT/L, and T5-XXL—to interpret complex prompts more effectively. Trained on 1 billion synthetic and filtered public images, it was fine-tuned on 30 million high-quality aesthetic images and 3 million preference-labeled samples. SD3 Medium is optimized for both local deployment and cloud API use, with support via ComfyUI, Diffusers, and other tooling. It is distributed under the Stability AI Community License, permitting research and commercial use for organizations under $1M in annual revenue. While equipped with safety mitigations, developers are encouraged to apply additional safeguards.

Downloads: 0 This Week

Last Update: 2025-06-26