  • 1
    chronos-bolt-base

    Fast, accurate zero-shot time series forecasting with T5 encoder

    chronos-bolt-base is a zero-shot time series forecasting model developed by the AutoGluon team, built on the T5-efficient-base architecture with 205 million parameters. It is part of the Chronos-Bolt family, trained on nearly 100 billion time series observations. The model transforms time series data into sequence patches, allowing the encoder to process historical context while the decoder directly generates quantile forecasts for multiple future steps. It significantly improves inference speed and memory efficiency—being up to 600 times faster than Chronos-Large—while outperforming various deep learning and statistical forecasting models in accuracy, even in zero-shot settings. Chronos-Bolt is ideal for scalable forecasting tasks and is compatible with SageMaker and AutoGluon.
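
    The entry above mentions AutoGluon integration; below is a minimal zero-shot forecasting sketch. It assumes the autogluon.timeseries package, its "bolt_base" model-path alias for this checkpoint, and a hypothetical long-format CSV with item_id, timestamp, and target columns, so treat it as an illustration rather than the definitive integration.

        # pip install autogluon.timeseries  (assumed package name)
        import pandas as pd
        from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

        # Hypothetical long-format data: one row per (item_id, timestamp) observation.
        df = pd.read_csv("sales.csv")
        train_data = TimeSeriesDataFrame.from_data_frame(
            df, id_column="item_id", timestamp_column="timestamp"
        )

        # Zero-shot: the pretrained Chronos-Bolt weights are used without training on the target data.
        predictor = TimeSeriesPredictor(prediction_length=48, target="target").fit(
            train_data,
            hyperparameters={"Chronos": {"model_path": "bolt_base"}},  # assumed alias for chronos-bolt-base
        )
        forecasts = predictor.predict(train_data)  # mean and quantile forecasts per item
        print(forecasts.head())
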
  • 2
    chronos-bolt-small

    Fast, efficient T5-based model for zero-shot time series forecasting

    chronos-bolt-small is a compact, 48M-parameter model for zero-shot time series forecasting, developed by AutoGluon and based on the t5-efficient-small architecture. It leverages a patch-based encoder-decoder design to chunk historical data and generate direct multi-step quantile forecasts. Trained on nearly 100 billion time series observations, it delivers high accuracy while being up to 250× faster and 20× more memory-efficient than its predecessor, Chronos. The model excels at both probabilistic and point forecasting across diverse domains without prior exposure to target datasets. Benchmarking shows that even this small variant outperforms traditional statistical methods and many trained deep learning models. It is designed to be easily integrated into workflows using AutoGluon or deployed at scale on SageMaker. Chronos-Bolt is especially suitable for scalable forecasting tasks in production environments where speed, accuracy, and memory efficiency are critical.
  • 3
    chronos-t5-small

    Time series forecasting model using T5 architecture with 46M params

    chronos-t5-small is part of Amazon’s Chronos family of time series forecasting models built on transformer-based language model architectures. It repurposes the T5 encoder-decoder design for time series data by transforming time series into discrete tokens via scaling and quantization. With 46 million parameters and a reduced vocabulary of 4096 tokens, this small variant balances performance with efficiency. Trained on both real-world and synthetic time series datasets, it supports probabilistic forecasting by autoregressively sampling multiple future trajectories. The model is capable of generating full predictive distributions, making it well-suited for uncertainty-aware forecasting. It is compatible with the Chronos Python package and integrates easily into forecasting pipelines using PyTorch. Chronos models are open-source under Apache 2.0 and have been demonstrated to perform competitively in forecasting benchmarks.
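
    A minimal probabilistic-forecasting sketch using the Chronos Python package mentioned above (assumed to be installable as chronos-forecasting); the toy context series is illustrative only.

        # pip install chronos-forecasting  (assumed package name)
        import numpy as np
        import torch
        from chronos import ChronosPipeline

        pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-small", device_map="cpu")

        # Toy univariate context; replace with a real historical series.
        context = torch.tensor([112.0, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118])
        samples = pipeline.predict(context, prediction_length=6, num_samples=50)  # (1, 50, 6) sampled trajectories

        # Summarize the sampled trajectories into a probabilistic forecast.
        low, median, high = np.quantile(samples[0].numpy(), [0.1, 0.5, 0.9], axis=0)
        print(median)
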
  • 4
    clip-vit-base-patch16

    Vision-language model for zero-shot image classification with CLIP

    clip-vit-base-patch16 is a vision-language model by OpenAI designed for zero-shot image classification by aligning images and text in a shared embedding space. It uses a Vision Transformer (ViT-B/16) as the image encoder and a masked Transformer for text, trained with a contrastive loss on large-scale web-sourced (image, caption) pairs. The model can infer relationships between text and images without needing task-specific fine-tuning, enabling broad generalization across domains. It's commonly used in research to explore robustness, generalization, and semantic alignment across modalities. Despite strong benchmark results, CLIP struggles with tasks requiring fine-grained classification, object counting, and fairness across demographic groups. It has known biases influenced by data composition and class design, particularly with respect to race and gender. The model is not intended for deployment without careful in-domain testing and is unsuitable for surveillance or face recognition.
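
    A minimal zero-shot classification sketch with Hugging Face Transformers; the image path and candidate labels are placeholders.

        import torch
        from PIL import Image
        from transformers import CLIPModel, CLIPProcessor

        model_id = "openai/clip-vit-base-patch16"
        model = CLIPModel.from_pretrained(model_id)
        processor = CLIPProcessor.from_pretrained(model_id)

        image = Image.open("photo.jpg")  # placeholder image path
        labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

        inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
        with torch.no_grad():
            probs = model(**inputs).logits_per_image.softmax(dim=-1)  # image-text similarities -> probabilities
        for label, p in zip(labels, probs[0].tolist()):
            print(f"{label}: {p:.3f}")
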
  • 5
    clip-vit-base-patch32

    Zero-shot image-text matching with ViT-B/32 Transformer encoder

    clip-vit-base-patch32 is a variant of OpenAI's CLIP (Contrastive Language–Image Pretraining) model using a Vision Transformer base (ViT-B/32) for image encoding. It matches images and text by learning a shared embedding space and computing cosine similarity between (image, text) pairs. This model enables zero-shot classification by comparing an image to multiple text prompts without requiring task-specific training. Trained on a large, diverse corpus of internet image-text pairs, it supports multiple frameworks including PyTorch, TensorFlow, and JAX. It is primarily intended for research and robustness evaluation in computer vision, not for commercial deployment. Like other CLIP models, it performs well across a wide range of benchmarks but exhibits known limitations in fine-grained classification and demographic bias. Despite strong generalization, OpenAI discourages its use in facial recognition or unconstrained real-world applications without in-domain testing.
  • 6
    clip-vit-large-patch14

    Zero-shot image-text model for classification and similarity tasks

    clip-vit-large-patch14 is a large Vision Transformer (ViT-L/14)–based model developed by OpenAI for zero-shot image classification and image-text similarity. It jointly embeds images and text using separate encoders, trained to align their representations through a contrastive loss. The model excels at generalizing to new visual tasks without fine-tuning by comparing similarity between image and text embeddings. Trained on a vast dataset of image-caption pairs from the internet, CLIP supports inference in PyTorch, TensorFlow, and JAX. Despite its versatility, CLIP is not recommended for real-world deployment without thorough testing due to known performance variability, bias, and fairness issues. It particularly struggles with fine-grained visual classification, object counting, and biased associations in demographic evaluations. Its primary purpose is research in robustness, generalization, and interdisciplinary applications across computer vision and language understanding.
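
    Beyond classification, the shared embedding space can be used for text-to-image retrieval. A minimal sketch with Hugging Face Transformers; the image file names are placeholders.

        import torch
        from PIL import Image
        from transformers import CLIPModel, CLIPProcessor

        model_id = "openai/clip-vit-large-patch14"
        model = CLIPModel.from_pretrained(model_id)
        processor = CLIPProcessor.from_pretrained(model_id)

        images = [Image.open(p) for p in ["a.jpg", "b.jpg", "c.jpg"]]  # placeholder image files
        query = "a red bicycle leaning against a brick wall"

        with torch.no_grad():
            image_emb = model.get_image_features(**processor(images=images, return_tensors="pt"))
            text_emb = model.get_text_features(**processor(text=[query], return_tensors="pt", padding=True))

        # Cosine similarity in the shared embedding space ranks the images against the query.
        image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
        text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
        scores = (text_emb @ image_emb.T).squeeze(0)
        print(scores.argsort(descending=True).tolist())  # indices of best-matching images first
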
  • 7
    clip-vit-large-patch14-336

    CLIP model for zero-shot image-text tasks using 336x336 patches

    clip-vit-large-patch14-336 is a vision-language model developed by OpenAI as part of the CLIP (Contrastive Language–Image Pre-training) family. It uses a Vision Transformer (ViT) backbone with 14×14 patch size and 336×336 image resolution to learn joint representations of images and text. Though detailed training data is undisclosed, the model was trained from scratch and enables powerful zero-shot classification by aligning visual and textual features in the same embedding space. Users can apply this model to perform tasks like zero-shot image recognition, image search with text, or text generation from visual cues—without task-specific training.
  • 8
    clipseg-rd64-refined

    CLIP-based model for text-driven zero/one-shot image segmentation

    CLIPSeg-RD64-Refined is a refined image segmentation model developed by CIDAS, based on the CLIP architecture. It enables zero-shot and one-shot segmentation by combining image and text prompts, allowing users to segment objects described in natural language. This refined version uses a reduced dimensionality of 64 (rd64) and a more complex convolutional refinement architecture to improve segmentation accuracy. The model was introduced in the paper Image Segmentation Using Text and Image Prompts by Lüddecke et al. and is released under the Apache-2.0 license. With a model size of 151 million parameters, it supports efficient deployment and is available in both I64 and F32 tensor types. CLIPSeg-RD64-Refined is designed for use with PyTorch and integrates well into workflows using Hugging Face Transformers. It can be applied across diverse domains such as medical imaging, robotics, and visual search, wherever precise, prompt-based segmentation is needed.
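
    A minimal prompt-based segmentation sketch following the Transformers CLIPSeg interface; the image path and text prompts are placeholders.

        import torch
        from PIL import Image
        from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

        model_id = "CIDAS/clipseg-rd64-refined"
        processor = CLIPSegProcessor.from_pretrained(model_id)
        model = CLIPSegForImageSegmentation.from_pretrained(model_id)

        image = Image.open("scene.jpg")  # placeholder image path
        prompts = ["a dog", "a tree", "the sky"]

        # One (image, prompt) pair per requested mask.
        inputs = processor(text=prompts, images=[image] * len(prompts),
                           padding="max_length", return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits  # one low-resolution mask per prompt
        masks = torch.sigmoid(logits)  # values in [0, 1]; threshold or upsample as needed
        print(masks.shape)
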
  • 9
    colbertv2.0

    Scalable BERT-based retrieval with late interaction for fast search

    colbertv2.0 is a high-speed, high-accuracy retrieval model that enables scalable neural search over large text corpora using BERT-based embeddings. It introduces a “late interaction” mechanism where passages and queries are encoded into matrices of token-level embeddings. These are compared efficiently at search time using MaxSim operations, preserving contextual richness without sacrificing speed. Trained on datasets like MS MARCO, it significantly outperforms single-vector retrieval approaches and supports indexing and querying millions of documents with sub-second latency. ColBERTv2 builds on previous ColBERT versions with improved training, lightweight server options, and support for integration into end-to-end LLM pipelines.
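
    A rough indexing-and-search sketch, assuming the colbert-ai package's Indexer/Searcher interface, a hypothetical passages.tsv collection file, and that the Hugging Face checkpoint id can be passed directly as the checkpoint; adapt to the package's current documentation before relying on it.

        # pip install colbert-ai  (assumed package name)
        from colbert import Indexer, Searcher
        from colbert.infra import Run, RunConfig, ColBERTConfig

        if __name__ == "__main__":
            # passages.tsv is a placeholder collection file: one "pid <tab> passage text" row per line.
            with Run().context(RunConfig(nranks=1, experiment="demo")):
                config = ColBERTConfig(nbits=2)

                indexer = Indexer(checkpoint="colbert-ir/colbertv2.0", config=config)
                indexer.index(name="demo.nbits=2", collection="passages.tsv")

                searcher = Searcher(index="demo.nbits=2", config=config)
                passage_ids, ranks, scores = searcher.search("what is late interaction?", k=3)
                for pid, rank, score in zip(passage_ids, ranks, scores):
                    print(rank, round(score, 2), searcher.collection[pid])
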
  • 10
    csm-1b

    CSM-1B is a speech generation model that creates realistic voice audio

    CSM-1B (Conversational Speech Model) is a text-to-speech model developed by Sesame, designed to generate natural-sounding audio using text and audio prompts. Built on a LLaMA-based architecture and paired with a lightweight Mimi audio decoder, CSM-1B produces RVQ audio codes for realistic voice synthesis. It supports both single-sentence audio generation and full conversational modeling with contextual audio and text input. While not fine-tuned to mimic specific voices, it can create a wide range of synthetic speaker identities. It runs natively on Hugging Face Transformers (v4.52.1+) and supports batched inference, CUDA graph compilation, and fine-tuning with the standard Transformers Trainer. Though optimized for English, it has limited multilingual capabilities due to data overlap. CSM-1B is released under the Apache-2.0 license and includes strict ethical use guidelines prohibiting impersonation, misinformation, and other forms of misuse.
  • 11
    deberta-v3-base

    Improved DeBERTa model with ELECTRA-style pretraining

    DeBERTa-v3-base is an enhanced version of Microsoft’s DeBERTa model, integrating ELECTRA-style pretraining and Gradient-Disentangled Embedding Sharing for improved performance. It builds upon the original DeBERTa's disentangled attention mechanism and enhanced mask decoder, enabling more effective representation learning than BERT or RoBERTa. The base version includes 12 layers, a hidden size of 768, and 86 million backbone parameters, with a 128K-token vocabulary contributing to 98M embedding parameters. DeBERTa-v3-base was trained on 160GB of text data, the same used for DeBERTa-v2, ensuring robust language understanding. It achieves state-of-the-art results on several NLU benchmarks, including SQuAD 2.0 and MNLI, outperforming prior models like RoBERTa-base and ELECTRA-base. The model is compatible with Hugging Face Transformers, PyTorch, TensorFlow, and Rust, and is widely used in text classification and fill-mask tasks.
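
    A minimal sketch that loads the checkpoint with a fresh classification head for fine-tuning; the two-label setup is illustrative.

        from transformers import AutoTokenizer, AutoModelForSequenceClassification

        model_id = "microsoft/deberta-v3-base"
        tokenizer = AutoTokenizer.from_pretrained(model_id)  # the v3 tokenizer needs the sentencepiece package
        model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

        inputs = tokenizer("DeBERTa-v3 replaces masked-language pretraining with replaced-token detection.",
                           return_tensors="pt")
        logits = model(**inputs).logits  # the classification head is freshly initialized: fine-tune before use
        print(logits.shape)  # (1, 2)
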
  • 12
    distilbert-base-uncased

    Distilled version of BERT, optimized for speed and efficiency

    distilbert-base-uncased is a compact, faster alternative to BERT developed through a distillation process. It retains 97% of BERT's language understanding performance while being 40% smaller and 60% faster. Trained on English Wikipedia and BookCorpus, it was distilled using BERT base as the teacher model with three objectives: distillation loss, masked language modeling (MLM), and cosine embedding loss. The model is uncased (treats "english" and "English" as the same) and is suitable for a wide range of downstream NLP tasks like sequence classification, token classification, or question answering. While efficient, it inherits biases present in the original BERT model. DistilBERT is available under the Apache 2.0 license and is compatible with PyTorch, TensorFlow, and JAX.
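
    A minimal fill-mask sketch with the Transformers pipeline API; the example sentence is illustrative.

        from transformers import pipeline

        fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")
        for candidate in fill_mask("Distillation makes language models smaller and [MASK]."):
            print(candidate["token_str"], round(candidate["score"], 3))
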
  • 13
    distilbert-base-uncased-finetuned-sst-2

    Sentiment analysis model fine-tuned on SST-2 with DistilBERT

    distilbert-base-uncased-finetuned-sst-2-english is a lightweight sentiment classification model fine-tuned from DistilBERT on the SST-2 dataset. Developed by Hugging Face, it performs binary sentiment analysis (positive/negative) with high accuracy, achieving 91.3% on the dev set. It offers a smaller and faster alternative to BERT while retaining competitive performance (BERT scores ~92.7%). The model uses an uncased vocabulary and supports PyTorch, TensorFlow, ONNX, and Rust for broad deployment compatibility. With only 67 million parameters, it is ideal for real-time or resource-constrained sentiment analysis applications. Users can easily load and use the model via Hugging Face Transformers for classification tasks. Despite its efficiency, users should be aware of potential biases, as the model has shown inconsistent predictions based on sensitive content (e.g., country names) during evaluation.
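
    A minimal sentiment-analysis sketch with the Transformers pipeline API; the example sentences are illustrative.

        from transformers import pipeline

        classifier = pipeline("sentiment-analysis",
                              model="distilbert-base-uncased-finetuned-sst-2-english")
        results = classifier(["A genuinely delightful film.", "The plot never goes anywhere."])
        print(results)  # list of {"label": "POSITIVE" | "NEGATIVE", "score": float} dicts
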
  • 14
    distilgpt2

    DistilGPT2: Lightweight, distilled GPT-2 for faster text generation

    DistilGPT2 is a smaller, faster, and lighter version of OpenAI’s GPT-2, distilled by Hugging Face using knowledge distillation techniques. With 82 million parameters, it retains most of GPT-2’s performance while significantly reducing size and computational requirements. It was trained on OpenWebText, a replication of OpenAI’s WebText dataset, using the same byte-level BPE tokenizer. The model excels in general-purpose English text generation and is well-suited for applications like autocompletion, creative writing, chatbots, and educational tools. It powers the Write With Transformers app and integrates seamlessly with the Hugging Face Transformers library. Although it performs well on benchmarks like WikiText-103, it is not designed for fact-sensitive or bias-critical use cases. It offers an ideal balance of speed, efficiency, and generative capability for developers and researchers working on lightweight NLP tasks.
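
    A minimal text-generation sketch with the Transformers pipeline API; the prompt and sampling settings are illustrative.

        from transformers import pipeline, set_seed

        generator = pipeline("text-generation", model="distilgpt2")
        set_seed(42)  # make the sampled continuations reproducible
        outputs = generator("The easiest way to speed up inference is",
                            max_new_tokens=40, do_sample=True, num_return_sequences=2)
        for out in outputs:
            print(out["generated_text"])
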
  • 15
    dolphin-2.9.1-yi-1.5-34b

    Uncensored 34B model fine-tuned for conversation, code, and agents

    dolphin-2.9.1-yi-1.5-34b is a 34B-parameter large language model fine-tuned from Yi-1.5-34B by Cognitive Computations, with training led by Eric Hartford and collaborators. Built with Axolotl and using ChatML formatting, it supports 8k sequence lengths via RoPE theta scaling, surpassing its base 4k context limit. The model excels in instruction following, open-ended dialogue, coding, and early agentic behaviors, including function calling. It is intentionally uncensored and optimized for high compliance, with its training data filtered to remove alignment and refusal examples. While powerful and permissive, users are advised to implement their own safeguards before deploying it in public-facing applications.
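
    A rough generation sketch, assuming the Hub repo id cognitivecomputations/dolphin-2.9.1-yi-1.5-34b and that the tokenizer ships the ChatML chat template described above; the prompt is illustrative.

        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_id = "cognitivecomputations/dolphin-2.9.1-yi-1.5-34b"  # assumed Hub repo id
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

        messages = [
            {"role": "system", "content": "You are Dolphin, a helpful assistant."},
            {"role": "user", "content": "Write a Python one-liner that reverses a string."},
        ]
        # apply_chat_template renders the ChatML format the model was fine-tuned on.
        input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                                  return_tensors="pt").to(model.device)
        output = model.generate(input_ids, max_new_tokens=128)
        print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
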
  • 16
    electra-base-discriminator

    Transformer model trained to detect fake vs real tokens efficiently

    electra-base-discriminator is a transformer model developed by Google that uses a novel pretraining method where the model learns to distinguish between real and fake tokens, instead of generating missing words like in BERT. This approach mimics the discriminator in a GAN architecture and significantly improves efficiency, requiring less computational power to achieve strong performance. The base version of ELECTRA contains 110M parameters and is particularly well-suited for tasks like classification, question answering (e.g., SQuAD), and sequence labeling. It can be fine-tuned for various downstream NLP tasks and supports multiple frameworks including PyTorch, TensorFlow, JAX, and Rust. ELECTRA models have demonstrated state-of-the-art performance on several benchmarks while training faster than comparable models.
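
    A minimal replaced-token-detection sketch modeled on the checkpoint's documented usage; the corrupted sentence is illustrative.

        import torch
        from transformers import ElectraForPreTraining, ElectraTokenizerFast

        model_id = "google/electra-base-discriminator"
        discriminator = ElectraForPreTraining.from_pretrained(model_id)
        tokenizer = ElectraTokenizerFast.from_pretrained(model_id)

        corrupted = "The quick brown fox fake over the lazy dog"  # "fake" replaces the original "jumps"
        inputs = tokenizer(corrupted, return_tensors="pt")
        with torch.no_grad():
            logits = discriminator(**inputs).logits  # positive logit = token predicted as replaced

        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
        for token, score in zip(tokens, logits[0].tolist()):
            print(f"{token:>10}  {'replaced' if score > 0 else 'original'}")
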
  • 17
    esm2_t30_150M_UR50D

    Protein language model trained for sequence understanding and tasks

    esm2_t30_150M_UR50D is a 150-million-parameter protein language model from Meta AI's ESM-2 family, trained using a masked language modeling objective on protein sequences. As part of the ESM-2 lineup, this model balances accuracy and resource efficiency, making it suitable for fine-tuning on a wide range of bioinformatics tasks such as protein classification, structure prediction, and token-level annotations. ESM-2 models are built with a transformer architecture and leverage large-scale unlabeled protein data. This particular checkpoint uses 30 layers and is available in both PyTorch and TensorFlow, facilitating integration into various protein modeling pipelines. It is designed to help researchers extract meaningful representations from protein sequences and accelerate downstream discoveries in computational biology.
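
    A minimal masked-residue prediction sketch, assuming the checkpoint is published under the facebook/ namespace on the Hub; the ubiquitin fragment is illustrative.

        from transformers import pipeline

        unmasker = pipeline("fill-mask", model="facebook/esm2_t30_150M_UR50D")
        # A ubiquitin fragment with one residue masked; the model ranks plausible amino acids.
        for pred in unmasker("MQIFVKTLTGKTITLEVEPS<mask>TIENVKAKIQDKEGIPPDQQRL")[:5]:
            print(pred["token_str"], round(pred["score"], 3))
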
  • 18
    esm2_t36_3B_UR50D

    3B parameter ESM-2 model for protein sequence understanding

    esm2_t36_3B_UR50D is a large-scale protein language model from Meta AI’s ESM-2 family, trained using a masked language modeling objective on protein sequences. It features 36 transformer layers and 3 billion parameters, offering high accuracy for protein-related downstream tasks such as structure prediction, mutation effect modeling, or function classification. The model is part of Meta’s ESM-2 series, which improves over previous versions with better performance and scalability. It takes amino acid sequences as input and generates embeddings or masked predictions, enabling fine-tuning for specific biological applications. Larger checkpoints like this one tend to yield better performance but require more compute resources. The model is compatible with PyTorch and TensorFlow, and Meta provides demo notebooks to help with fine-tuning and application. Its capabilities support advanced bioinformatics research and computational biology workflows.
  • 19
    fairface_age_image_detection

    ViT-based model that estimates a person's age group from an image

    fairface_age_image_detection is a Vision Transformer (ViT)-based model fine-tuned to classify the age group of a person in an image. Built on top of google/vit-base-patch16-224-in21k, the model was trained using the FairFace dataset, which provides a diverse set of facial images across multiple age categories. It predicts one of nine age groups, ranging from “0–2” up to “70+”. On evaluation over 10,000 samples, the model achieves an overall accuracy of approximately 59%, with best performance in the “3–9” and “0–2” categories. Though performance in older age brackets like “70+” is lower, it still captures broad demographic trends useful for research or pre-filtering tasks. The model is provided under the Apache 2.0 license and supports deployment via Hugging Face’s Inference API. Users can apply it for age estimation in images where fine-grained precision is not mission-critical.
  • 20
    falcon-40b

    Falcon-40B is a powerful open-source 40B parameter language model

    Falcon-40B is a 40-billion-parameter, causal decoder-only language model developed by the Technology Innovation Institute (TII) and trained on 1 trillion tokens from the RefinedWeb dataset and curated corpora. Designed for high inference efficiency, it incorporates FlashAttention and multiquery attention for faster processing. Falcon-40B outperforms LLaMA, MPT, and other open-source models, making it one of the top-performing public LLMs. It supports English, German, Spanish, and French, with limited capabilities in several other European languages. Although powerful, Falcon-40B is a raw pretrained model and is best used after fine-tuning for specific applications such as summarization, chatbots, or content generation. It is released under the permissive Apache 2.0 license, allowing commercial use. The model requires significant hardware (85–100 GB VRAM) but offers state-of-the-art performance for large-scale NLP research and development.
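
    A rough multi-GPU loading sketch with Hugging Face Transformers; it assumes the tiiuae/falcon-40b Hub repo id and enough GPU memory for the bfloat16 weights.

        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_id = "tiiuae/falcon-40b"
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        # device_map="auto" shards the bfloat16 weights across the available GPUs (roughly 85-100 GB in total).
        model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

        inputs = tokenizer("Large language models are useful for", return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_k=10,
                                pad_token_id=tokenizer.eos_token_id)
        print(tokenizer.decode(output[0], skip_special_tokens=True))
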
  • 21
    fashion-clip

    CLIP model fine-tuned for zero-shot fashion product classification

    FashionCLIP is a domain-adapted CLIP model fine-tuned specifically for the fashion industry, enabling zero-shot classification and retrieval of fashion products. Developed by Patrick John Chia and collaborators, it builds on the CLIP ViT-B/32 architecture and was trained on over 800K image-text pairs from the Farfetch dataset. The model learns to align product images and descriptive text using contrastive learning, enabling it to perform well across various fashion-related tasks without additional supervision. FashionCLIP 2.0, the latest version, uses the laion/CLIP-ViT-B-32-laion2B-s34B-b79K checkpoint for improved accuracy, achieving better F1 scores across multiple benchmarks compared to earlier versions. It supports multilingual fashion queries and works best with clean, product-style images against white backgrounds. The model can be used for product search, recommendation systems, or visual tagging in e-commerce platforms.
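
    A minimal zero-shot product-classification sketch, assuming the Hub repo id patrickjohncyh/fashion-clip; the product image and candidate descriptions are placeholders, and the loading pattern is the same CLIPModel/CLIPProcessor flow shown for the base CLIP entries above.

        import torch
        from PIL import Image
        from transformers import CLIPModel, CLIPProcessor

        model_id = "patrickjohncyh/fashion-clip"  # assumed Hub repo id for FashionCLIP
        model = CLIPModel.from_pretrained(model_id)
        processor = CLIPProcessor.from_pretrained(model_id)

        image = Image.open("product.jpg")  # placeholder product shot, ideally on a white background
        labels = ["a black leather ankle boot", "a red floral summer dress", "a blue denim jacket"]

        inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
        with torch.no_grad():
            probs = model(**inputs).logits_per_image.softmax(dim=-1)
        print(dict(zip(labels, probs[0].tolist())))
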
  • 22
    gemma-7b

    Compact, state-of-the-art LLM by Google for text generation tasks

    Gemma-7B is a lightweight, open-source, decoder-only language model developed by Google, built using the same research and technology behind the Gemini family. With 8.5 billion parameters and an 8192-token context window, it is optimized for English text generation tasks like question answering, summarization, reasoning, and creative writing. Trained on 6 trillion tokens including web documents, code, and mathematical texts, Gemma-7B provides competitive performance across a wide range of NLP benchmarks. The model was trained using JAX and Google's ML Pathways on TPUv5e hardware, and supports deployment on CPUs, GPUs, and via quantization (int8/4bit) for efficient inference. Benchmark evaluations show it outperforms comparably sized open models in tasks measuring factuality, common sense, and code generation. Ethics evaluations demonstrate low levels of toxicity and bias, and Google provides responsible AI guidelines for safe usage.
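
    A minimal text-generation sketch; access to google/gemma-7b is gated, so accept the license on the Hub and authenticate before running. The prompt is illustrative.

        import torch
        from transformers import pipeline

        # google/gemma-7b is gated: accept the license on the Hub and run `huggingface-cli login` first.
        generator = pipeline("text-generation", model="google/gemma-7b",
                             torch_dtype=torch.bfloat16, device_map="auto")
        out = generator("Explain the difference between precision and recall in one sentence.",
                        max_new_tokens=64)
        print(out[0]["generated_text"])
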
  • 23
    granite-timeseries-ttm-r2

    Tiny pre-trained IBM model for multivariate time series forecasting

    granite-timeseries-ttm-r2 is part of IBM’s TinyTimeMixers (TTM) series—compact, pre-trained models for multivariate time series forecasting. Unlike massive foundation models, TTM models are designed to be lightweight yet powerful, with only ~805K parameters, enabling high performance even on CPU or single-GPU machines. The r2 version is pre-trained on ~700M samples (r2.1 expands to ~1B), delivering up to 15% better accuracy than the r1 version. TTM supports both zero-shot and fine-tuned forecasting, handling minutely, hourly, daily, and weekly resolutions. It can integrate exogenous variables, static categorical features, and perform channel-mixing for richer multivariate forecasting. The get_model() utility makes it easy to auto-select the best TTM model for specific context and prediction lengths. These models significantly outperform benchmarks like Chronos, GPT4TS, and Moirai while demanding a fraction of the compute.
  • 24
    grok-1

    Grok-1 is a 314B-parameter open-weight language model by xAI

    Grok-1 is a large-scale language model released by xAI, featuring 314 billion parameters and made available under the Apache 2.0 license. It is designed for text generation and was trained for advanced language understanding and reasoning capabilities. Grok-1 is currently distributed as open weights, with inference support requiring multi-GPU hardware due to its size. The model can be downloaded from Hugging Face and run using the accompanying Python code in the official GitHub repository. Though optimized for large-scale deployments, Grok-1 is intended for developers and researchers interested in high-capacity open models. The release aligns with xAI’s mission to promote openness in AI development while maintaining competitive performance in large language model benchmarks. While specific technical details about its architecture or training data remain limited, Grok-1 represents xAI’s entry into the open-weight LLM space.
  • 25
    jina-embeddings-v3

    Multilingual task-adaptive embeddings for 94 languages and NLP tasks

    jina-embeddings-v3 is a multilingual, multi-task text embedding model developed by Jina AI, designed to generate highly adaptable representations across a wide range of natural language processing tasks. Built on a modified XLM-RoBERTa architecture with Rotary Position Embeddings (RoPE), it supports long inputs up to 8192 tokens. The model includes five task-specific LoRA adapters (query retrieval, passage retrieval, classification, clustering, and text matching) that allow users to optimize embeddings for different applications. Jina Embeddings v3 also supports Matryoshka embeddings, enabling users to select embedding sizes (32–1024) based on performance or resource needs. It performs well across 94 languages, with focused tuning on 30 languages including English, Chinese, Arabic, and Spanish. The model is compatible with Hugging Face Transformers, ONNX, and Sentence-Transformers libraries, and can be fine-tuned via LoRA adapters or fully trained.
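
    A rough embedding sketch, assuming the jinaai/jina-embeddings-v3 repo id and the encode() helper (with task and truncate_dim keywords) exposed via trust_remote_code; the documents and query are illustrative.

        from transformers import AutoModel

        # trust_remote_code loads the repo's custom encode() helper and LoRA-adapter routing.
        model = AutoModel.from_pretrained("jinaai/jina-embeddings-v3", trust_remote_code=True)

        docs = ["Rotary position embeddings extend the usable context length.",
                "El modelo admite consultas en varios idiomas."]
        query = "How does the model handle long inputs?"

        # `task` selects a LoRA adapter; `truncate_dim` picks a Matryoshka embedding size (assumed keyword names).
        doc_emb = model.encode(docs, task="retrieval.passage", truncate_dim=256)
        query_emb = model.encode([query], task="retrieval.query", truncate_dim=256)
        print(query_emb @ doc_emb.T)  # higher score = closer match
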