Alternatives to Devstral

Compare Devstral alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to Devstral in 2026. Compare features, ratings, user reviews, pricing, and more from Devstral competitors and alternatives to make an informed decision.

  • 1
    DeepCoder

    Agentica Project

    DeepCoder is a fully open source code-reasoning and generation model released by Agentica Project in collaboration with Together AI. It is fine-tuned from DeepSeek-R1-Distill-Qwen-14B using distributed reinforcement learning and achieves 60.6% accuracy on LiveCodeBench (an 8% improvement over the base model), matching proprietary models such as o3-mini (2025-01-31, Low) and o1 while using only 14 billion parameters. It was trained over 2.5 weeks on 32 H100 GPUs with a curated dataset of roughly 24,000 coding problems drawn from verified sources (including TACO-Verified, PrimeIntellect SYNTHETIC-1, and LiveCodeBench submissions), each problem requiring a verifiable solution and at least five unit tests to ensure reliability for RL training. To handle long-range context, DeepCoder employs techniques such as iterative context lengthening and overlong filtering.
    Starting Price: Free
  • 2
    DeepSWE

    Agentica Project

    DeepSWE is a fully open source, state-of-the-art coding agent built on top of the Qwen3-32B foundation model and trained exclusively via reinforcement learning (RL), without supervised finetuning or distillation from proprietary models. It is developed using rLLM, Agentica’s open source RL framework for language agents. DeepSWE operates as an agent; it interacts with a simulated development environment (via the R2E-Gym environment) using a suite of tools (file editor, search, shell-execution, submit/finish), enabling it to navigate codebases, edit multiple files, compile/run tests, and iteratively produce patches or complete engineering tasks. DeepSWE exhibits emergent behaviors beyond simple code generation; when presented with bugs or feature requests, the agent reasons about edge cases, seeks existing tests in the repository, proposes patches, writes extra tests for regressions, and dynamically adjusts its “thinking” effort.
    Starting Price: Free
  • 3
    Devstral Small 2
    Devstral Small 2 is the compact, 24 billion-parameter variant of the new coding-focused model family from Mistral AI, released under the permissive Apache 2.0 license to enable both local deployment and API use. Alongside its larger sibling (Devstral 2), this model brings “agentic coding” capabilities to environments with modest compute: it supports a large 256K-token context window, enabling it to understand and make changes across entire codebases. On the standard code-generation benchmark (SWE-Bench Verified), Devstral Small 2 scores around 68.0%, placing it among open-weight models many times its size. Because of its reduced size and efficient design, Devstral Small 2 can run on a single GPU or even CPU-only setups, making it practical for developers, small teams, or hobbyists without access to data-center hardware. Despite its compact footprint, Devstral Small 2 retains key capabilities of larger models; it can reason across multiple files and track dependencies.
    Starting Price: Free
  • 4
    Devstral 2

    Mistral AI

    Devstral 2 is a next-generation, open source agentic AI model tailored for software engineering: it doesn’t just suggest code snippets, it understands and acts across entire codebases, enabling multi-file edits, bug fixes, refactoring, dependency resolution, and context-aware code generation. The Devstral 2 family includes a large 123-billion-parameter model as well as a smaller 24-billion-parameter variant (“Devstral Small 2”), giving teams flexibility; the larger model excels in heavy-duty coding tasks requiring deep context, while the smaller one can run on more modest hardware. With a vast context window of up to 256K tokens, Devstral 2 can reason across extensive repositories, track project history, and maintain a consistent understanding of lengthy files, an advantage for complex, real-world projects. The accompanying Mistral Vibe CLI tracks project metadata, Git statuses, and directory structure to give the model context, making “vibe-coding” more powerful.
    Starting Price: Free
  • 5
    Mistral Small 4
    Mistral Small 4 is an advanced open-source AI model developed by Mistral AI that combines reasoning, coding, and multimodal capabilities into a single system. It unifies the strengths of previous models such as Magistral for reasoning, Pixtral for multimodal processing, and Devstral for agentic coding tasks. The model can handle both text and image inputs, allowing it to perform tasks ranging from conversational chat to visual analysis and document understanding. Built with a mixture-of-experts architecture, Mistral Small 4 delivers efficient performance while scaling to complex workloads. It also features a configurable reasoning parameter that allows users to switch between fast responses and deeper analytical outputs. With a large context window and optimized inference performance, the model supports long-form interactions and complex workflows.
    Starting Price: Free
  • 6
    Mistral Small 3.1
    Mistral Small 3.1 is a state-of-the-art, multimodal, and multilingual AI model released under the Apache 2.0 license. Building upon Mistral Small 3, this enhanced version offers improved text performance and advanced multimodal understanding, and supports an expanded context window of up to 128,000 tokens. It outperforms comparable models like Gemma 3 and GPT-4o Mini, delivering inference speeds of 150 tokens per second. Designed for versatility, Mistral Small 3.1 excels in tasks such as instruction following, conversational assistance, image understanding, and function calling, making it suitable for both enterprise and consumer-grade AI applications. Its lightweight architecture allows it to run efficiently on a single RTX 4090 or a Mac with 32GB RAM, facilitating on-device deployments. It is available for download on Hugging Face, accessible via Mistral AI's developer playground, and integrated into platforms like Google Cloud Vertex AI, with availability on NVIDIA NIM and
    Starting Price: Free
  • 7
    Mistral Large 3
    Mistral Large 3 is a next-generation, open multimodal AI model built with a powerful sparse Mixture-of-Experts architecture featuring 41B active parameters out of 675B total. Trained from scratch on NVIDIA H200 GPUs, it delivers frontier-level reasoning, multilingual performance, and advanced image understanding while remaining fully open-weight under the Apache 2.0 license. The model achieves top-tier results on modern instruction benchmarks, positioning it among the strongest permissively licensed foundation models available today. With native support across vLLM, TensorRT-LLM, and major cloud providers, Mistral Large 3 offers exceptional accessibility and performance efficiency. Its design enables enterprise-grade customization, letting teams fine-tune or adapt the model for domain-specific workflows and proprietary applications. Mistral Large 3 represents a major advancement in open AI, offering frontier intelligence without sacrificing transparency or control.
    Starting Price: Free
  • 8
    Mistral 7B

    Mistral AI

    Mistral 7B is a 7.3-billion-parameter language model that outperforms larger models like Llama 2 13B across various benchmarks. It employs Grouped-Query Attention (GQA) for faster inference and Sliding Window Attention (SWA) to efficiently handle longer sequences. Released under the Apache 2.0 license, Mistral 7B is accessible for deployment across diverse platforms, including local environments and major cloud services. Additionally, a fine-tuned version, Mistral 7B Instruct, demonstrates enhanced performance in instruction-following tasks, surpassing models like Llama 2 13B Chat.
    Starting Price: Free
  • 9
    Mistral NeMo

    Mistral AI

    Mistral NeMo is a state-of-the-art 12B model built by Mistral AI in collaboration with NVIDIA, presented as the company's new best small model. It offers a large context window of up to 128k tokens, and its reasoning, world knowledge, and coding accuracy are state-of-the-art in its size category. As it relies on standard architecture, Mistral NeMo is easy to use and a drop-in replacement in any system using Mistral 7B. Pre-trained base and instruction-tuned checkpoints are released under the Apache 2.0 license to promote adoption for researchers and enterprises. Mistral NeMo was trained with quantization awareness, enabling FP8 inference without any performance loss. The model is designed for global, multilingual applications and is trained on function calling. Compared to Mistral 7B, it is much better at following precise instructions, reasoning, and handling multi-turn conversations.
    Starting Price: Free
  • 10
    Voxtral

    Mistral AI

    Voxtral models are frontier open source speech‑understanding systems available in two sizes—a 24 B variant for production‑scale applications and a 3 B variant for local and edge deployments, both released under the Apache 2.0 license. They combine high‑accuracy transcription with native semantic understanding, supporting long‑form context (up to 32 K tokens), built‑in Q&A and structured summarization, automatic language detection across major languages, and direct function‑calling to trigger backend workflows from voice. Retaining the text capabilities of their Mistral Small 3.1 backbone, Voxtral handles audio up to 30 minutes for transcription or 40 minutes for understanding and outperforms leading open source and proprietary models on benchmarks such as LibriSpeech, Mozilla Common Voice, and FLEURS. Accessible via download on Hugging Face, API endpoint, or private on‑premises deployment, Voxtral also offers domain‑specific fine‑tuning and advanced enterprise features.
  • 11
    Pixtral Large

    Mistral AI

    Pixtral Large is a 124-billion-parameter open-weight multimodal model developed by Mistral AI, building upon their Mistral Large 2 architecture. It integrates a 123-billion-parameter multimodal decoder with a 1-billion-parameter vision encoder, enabling advanced understanding of documents, charts, and natural images while maintaining leading text comprehension capabilities. With a context window of 128,000 tokens, Pixtral Large can process at least 30 high-resolution images simultaneously. The model has demonstrated state-of-the-art performance on benchmarks such as MathVista, DocVQA, and VQAv2, surpassing models like GPT-4o and Gemini-1.5 Pro. Pixtral Large is available under the Mistral Research License for research and educational use, and under the Mistral Commercial License for commercial applications.
    Starting Price: Free
  • 12
    Ministral 3

    Mistral AI

    Mistral 3 is the latest generation of open-weight AI models from Mistral AI, offering a full family of models, from small, edge-optimized versions to a flagship, large-scale multimodal model. The lineup includes three compact “Ministral 3” models (3B, 8B, and 14B parameters) designed for efficiency and deployment on constrained hardware (even laptops, drones, or edge devices), plus the powerful “Mistral Large 3,” a sparse mixture-of-experts model with 675 billion total parameters (41 billion active). The models support multimodal and multilingual tasks, not only text, but also image understanding, and have demonstrated best-in-class performance on general prompts, multilingual conversations, and multimodal inputs. The base and instruction-fine-tuned versions are released under the Apache 2.0 license, enabling broad customization and integration in enterprise and open source projects.
    Starting Price: Free
  • 13
    Solar Mini

    Upstage AI

    Solar Mini is a pre‑trained large language model that delivers GPT‑3.5‑comparable responses with 2.5× faster inference while staying under 30 billion parameters. It achieved first place on the Hugging Face Open LLM Leaderboard in December 2023 by combining a 32‑layer Llama 2 architecture, initialized with high‑quality Mistral 7B weights, with an innovative “depth up‑scaling” (DUS) approach that deepens the model efficiently without adding complex modules. After DUS, continued pretraining restores and enhances performance, and instruction tuning in a QA format, especially for Korean, refines its ability to follow user prompts, while alignment tuning ensures its outputs meet human or advanced AI preferences. Solar Mini outperforms competitors such as Llama 2, Mistral 7B, Ko‑Alpaca, and KULLM across a variety of benchmarks, proving that compact size need not sacrifice capability.
    Starting Price: $0.1 per 1M tokens
  • 14
    Ministral 3B

    Mistral AI

    Mistral AI introduced two state-of-the-art models for on-device computing and edge use cases, named "les Ministraux": Ministral 3B and Ministral 8B. These models set a new frontier in knowledge, commonsense reasoning, function-calling, and efficiency in the sub-10B category. They can be used or tuned for various applications, from orchestrating agentic workflows to creating specialist task workers. Both models support up to 128k context length (currently 32k on vLLM), and Ministral 8B features a special interleaved sliding-window attention pattern for faster and memory-efficient inference. These models were built to provide a compute-efficient and low-latency solution for scenarios such as on-device translation, internet-less smart assistants, local analytics, and autonomous robotics. Used in conjunction with larger language models like Mistral Large, les Ministraux also serve as efficient intermediaries for function-calling in multi-step agentic workflows.
    Starting Price: Free
  • 15
    Codestral Embed
    Codestral Embed is Mistral AI's first embedding model, specialized for code, optimized for high-performance code retrieval and semantic understanding. It significantly outperforms leading code embedders in the market today, such as Voyage Code 3, Cohere Embed v4.0, and OpenAI’s large embedding model. Codestral Embed can output embeddings with different dimensions and precisions; for instance, with a dimension of 256 and int8 precision, it still performs better than any model from competitors. The dimensions of the embeddings are ordered by relevance, allowing users to choose the first n dimensions for a smooth trade-off between quality and cost. It excels in retrieval use cases on real-world code data, particularly in benchmarks like SWE-Bench, which is based on real-world GitHub issues and corresponding fixes, and Text2Code (GitHub), relevant for providing context for code completion or editing.
  • 16
    LFM2.5

    Liquid AI

    Liquid AI’s LFM2.5 is the next generation of on-device AI foundation models designed to deliver high-performance, efficient AI inference on edge devices such as phones, laptops, vehicles, IoT systems, and embedded hardware without relying on cloud compute. It extends the previous LFM2 architecture by significantly increasing the pretraining scale and reinforcement learning stages, yielding a family of hybrid models around 1.2 billion parameters that balance instruction following, reasoning, and multimodal capabilities for real-world agentic use cases. The LFM2.5 family includes Base (for fine-tuning and customization), Instruct (general-purpose instruction-tuned), Japanese-optimized, Vision-Language, and Audio-Language variants, all optimized for fast, on-device inference under tight memory constraints and available as open-weight models deployable via frameworks like llama.cpp, MLX, vLLM, and ONNX.
    Starting Price: Free
  • 17
    Mistral Large 2
    Mistral AI has launched Mistral Large 2, an advanced AI model designed to excel in code generation, multilingual capabilities, and complex reasoning tasks. The model features a 128k context window, supporting dozens of languages including English, French, Spanish, and Arabic, as well as over 80 programming languages. Mistral Large 2 is tailored for high-throughput single-node inference, making it ideal for large-context applications. Its improved performance on benchmarks like MMLU and its enhanced code generation and reasoning abilities ensure accuracy and efficiency. The model also incorporates better function calling and retrieval, supporting complex business applications.
    Starting Price: Free
  • 18
    OpenPipe

    OpenPipe

    OpenPipe provides fine-tuning for developers. Keep your datasets, models, and evaluations all in one place. Train new models with the click of a button. Automatically record LLM requests and responses. Create datasets from your captured data. Train multiple base models on the same dataset. We serve your model on our managed endpoints that scale to millions of requests. Write evaluations and compare model outputs side by side. Change a couple of lines of code, and you're good to go. Simply replace your Python or JavaScript OpenAI SDK and add an OpenPipe API key. Make your data searchable with custom tags. Small specialized models cost much less to run than large multipurpose LLMs. Replace prompts with models in minutes, not weeks. Fine-tuned Mistral and Llama 2 models consistently outperform GPT-4-1106-Turbo, at a fraction of the cost. We're open-source, and so are many of the base models we use. Own your own weights when you fine-tune Mistral and Llama 2, and download them at any time.
    Starting Price: $1.20 per 1M tokens
  • 19
    NativeMind

    NativeMind

    NativeMind is an open source, on-device AI assistant that runs entirely in your browser via Ollama integration, ensuring absolute privacy by never sending data to the cloud. Everything, from model inference to prompt processing, occurs locally, so there’s no syncing, logging, or data leakage. Users can load and switch between powerful open models such as DeepSeek, Qwen, Llama, Gemma, and Mistral instantly, without additional setup, and leverage native browser features for streamlined workflows. NativeMind offers clean, concise webpage summarization; persistent, context-aware chat across multiple tabs; local web search that retrieves and answers queries directly within the page; and immersive, format-preserving translation of entire pages. Built for speed and security, the extension is fully auditable and community-backed, delivering enterprise-grade performance for real-world use cases without vendor lock-in or hidden telemetry.
    Starting Price: Free
  • 20
    bolt.diy

    bolt.diy

    bolt.diy is an open-source platform that enables developers to easily create, run, edit, and deploy full-stack web applications with a variety of large language models (LLMs). It supports a wide range of models, including OpenAI, Anthropic, Ollama, OpenRouter, Gemini, LMStudio, Mistral, xAI, HuggingFace, DeepSeek, and Groq. The platform offers seamless integration through the Vercel AI SDK, allowing users to customize and extend their applications with the LLMs of their choice. With its intuitive interface, bolt.diy is designed to simplify AI development workflows, making it a great tool for both experimentation and production-ready applications.
  • 21
    Qwen2.5-Max
    Qwen2.5-Max is a large-scale Mixture-of-Experts (MoE) model developed by the Qwen team, pretrained on over 20 trillion tokens and further refined through Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). In evaluations, it outperforms models like DeepSeek V3 in benchmarks such as Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond, while also demonstrating competitive results in other assessments, including MMLU-Pro. Qwen2.5-Max is accessible via API through Alibaba Cloud and can be explored interactively on Qwen Chat.
    Starting Price: Free
  • 22
    Falcon Mamba 7B

    Technology Innovation Institute (TII)

    Falcon Mamba 7B is the first open-source State Space Language Model (SSLM), introducing a groundbreaking architecture for Falcon models. Recognized as the top-performing open-source SSLM worldwide by Hugging Face, it sets a new benchmark in AI efficiency. Unlike traditional transformers, SSLMs operate with minimal memory requirements and can generate extended text sequences without additional overhead. Falcon Mamba 7B surpasses leading transformer-based models, including Meta’s Llama 3.1 8B and Mistral’s 7B, showcasing superior performance. This innovation underscores Abu Dhabi’s commitment to advancing AI research and development on a global scale.
    Starting Price: Free
  • 23
    EXAONE Deep
    EXAONE Deep is a series of reasoning-enhanced language models developed by LG AI Research, featuring parameter sizes of 2.4 billion, 7.8 billion, and 32 billion. These models demonstrate superior capabilities in various reasoning tasks, including math and coding benchmarks. Notably, EXAONE Deep 2.4B outperforms other models of comparable size, EXAONE Deep 7.8B surpasses both open-weight models of similar scale and the proprietary reasoning model OpenAI o1-mini, and EXAONE Deep 32B shows competitive performance against leading open-weight models. The repository provides comprehensive documentation covering performance evaluations, quickstart guides for using EXAONE Deep models with Transformers, explanations of quantized EXAONE Deep weights in AWQ and GGUF formats, and instructions for running EXAONE Deep models locally using frameworks like llama.cpp and Ollama.
    Starting Price: Free
  • 24
    Llama 2
    The next generation of our open source large language model. This release includes model weights and starting code for pretrained and fine-tuned Llama language models, ranging from 7B to 70B parameters. Llama 2 pretrained models are trained on 2 trillion tokens and have double the context length of Llama 1. Llama 2 outperforms other open source language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests. Llama 2 was pretrained on publicly available online data sources. The fine-tuned model, Llama-2-chat, leverages publicly available instruction datasets and over 1 million human annotations. We have a broad range of supporters around the world who believe in our open approach to today’s AI: companies that have given early feedback and are excited to build with Llama 2.
    Starting Price: Free
  • 25
    GPT-5.2-Codex
    GPT-5.2-Codex is OpenAI’s most advanced agentic coding model, built for complex, real-world software engineering and defensive cybersecurity work. It is a specialized version of GPT-5.2 optimized for long-horizon coding tasks such as large refactors, migrations, and feature development. The model maintains full context over extended sessions through native context compaction. GPT-5.2-Codex delivers state-of-the-art performance on benchmarks like SWE-Bench Pro and Terminal-Bench 2.0. It operates reliably across large repositories and native Windows environments. Stronger vision capabilities allow it to interpret screenshots, diagrams, and UI designs during development. GPT-5.2-Codex is designed to be a dependable partner for professional engineering workflows.
  • 26
    Microsoft Foundry Models
    Microsoft Foundry Models is a unified model catalog that gives enterprises access to more than 11,000 AI models from Microsoft, OpenAI, Anthropic, Mistral AI, Meta, Cohere, DeepSeek, xAI, and others. It allows teams to explore, test, and deploy models quickly using a task-centric discovery experience and integrated playground. Organizations can fine-tune models with ready-to-use pipelines and evaluate performance using their own datasets for more accurate benchmarking. Foundry Models provides secure, scalable deployment options with serverless and managed compute choices tailored to enterprise needs. With built-in governance, compliance, and Azure’s global security framework, businesses can safely operationalize AI across mission-critical workflows. The platform accelerates innovation by enabling developers to build, iterate, and scale AI solutions from one centralized environment.
  • 27
    NLP Cloud

    NLP Cloud

    Fast and accurate AI models suited for production, served through a highly available inference API that leverages the most advanced NVIDIA GPUs. We selected the best open-source natural language processing (NLP) models from the community and deployed them for you. Upload your in-house custom models or train and fine-tune your own, including GPT-J, from your dashboard, and use them straight away in production without worrying about deployment considerations like RAM usage, high availability, or scalability. You can upload and deploy as many models as you want to production.
    Starting Price: $29 per month
  • 28
    Mixtral 8x7B

    Mistral AI

    Mixtral 8x7B is a high-quality sparse mixture-of-experts (SMoE) model with open weights, licensed under Apache 2.0. Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference. It is the strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs. In particular, it matches or outperforms GPT-3.5 on most standard benchmarks.
    Starting Price: Free
  • 29
    Mistral Small

    Mistral AI

    On September 17, 2024, Mistral AI announced several key updates to enhance the accessibility and performance of their AI offerings. They introduced a free tier on "La Plateforme," their serverless platform for tuning and deploying Mistral models as API endpoints, enabling developers to experiment and prototype at no cost. Additionally, Mistral AI reduced prices across their entire model lineup, with significant cuts such as a 50% reduction for Mistral Nemo and an 80% decrease for Mistral Small and Codestral, making advanced AI more cost-effective for users. The company also unveiled Mistral Small v24.09, a 22-billion-parameter model offering a balance between performance and efficiency, suitable for tasks like translation, summarization, and sentiment analysis. Furthermore, they made Pixtral 12B, a vision-capable model with image understanding capabilities, freely available on "Le Chat," allowing users to analyze and caption images without compromising text-based performance.
    Starting Price: Free
  • 30
    CodeNext

    CodeNext

    CodeNext.ai is an AI-powered coding assistant designed specifically for Xcode developers, offering context-aware code completion and agentic chat functionalities. It supports a wide range of leading AI models, including OpenAI, Azure OpenAI, Google AI, Mistral, Anthropic, Deepseek, Ollama, and more, providing developers with the flexibility to choose and switch between models as needed. It delivers intelligent, real-time code suggestions as you type, enhancing productivity and coding efficiency. Its agentic chat feature allows developers to interact in natural language to write code, fix bugs, refactor, and perform various coding tasks within or beyond the codebase. CodeNext.ai includes custom chat plugins that enable the execution of terminal commands and shortcuts directly within the chat interface, streamlining the development workflow.
    Starting Price: $15 per month
  • 31
    Mistral Medium 3
    Mistral Medium 3 is a powerful AI model designed to deliver state-of-the-art performance at a fraction of the cost of other models. It offers simpler deployment options, allowing for hybrid or on-premises configurations. Mistral Medium 3 excels in professional applications like coding and multimodal understanding, making it ideal for enterprise use. Its low-cost structure makes it highly accessible while maintaining top-tier performance, outperforming many larger models in specific domains.
    Starting Price: Free
  • 32
    Mistral Large

    Mistral AI

    Mistral Large is Mistral AI's flagship language model, designed for advanced text generation and complex multilingual reasoning tasks, including text comprehension, transformation, and code generation. It supports English, French, Spanish, German, and Italian, offering a nuanced understanding of grammar and cultural contexts. With a 32,000-token context window, it can accurately recall information from extensive documents. The model's precise instruction-following and native function-calling capabilities facilitate application development and tech stack modernization. Mistral Large is accessible through Mistral's platform, Azure AI Studio, and Azure Machine Learning, and can be self-deployed for sensitive use cases. Benchmark evaluations indicate that Mistral Large achieves strong results, making it the world's second-ranked model generally available through an API, next to GPT-4.
    Starting Price: Free
  • 33
    Claude Opus 4.5
    Claude Opus 4.5 is Anthropic’s newest flagship model, delivering major improvements in reasoning, coding, agentic workflows, and real-world problem solving. It outperforms previous models and leading competitors on benchmarks such as SWE-bench, multilingual coding tests, and advanced agent evaluations. Opus 4.5 also introduces stronger safety features, including significantly higher resistance to prompt injection and improved alignment across sensitive tasks. Developers gain new controls through the Claude API—like effort parameters, context compaction, and advanced tool use—allowing for more efficient, longer-running agentic workflows. Product updates across Claude, Claude Code, the Chrome extension, and Excel integrations expand how users interact with the model for software engineering, research, and everyday productivity. Overall, Claude Opus 4.5 marks a substantial step forward in capability, reliability, and usability for developers, enterprises, and end users.
  • 34
    Tülu 3
    Tülu 3 is an advanced instruction-following language model developed by the Allen Institute for AI (Ai2), designed to enhance capabilities in areas such as knowledge, reasoning, mathematics, coding, and safety. Built upon the Llama 3 Base, Tülu 3 employs a comprehensive four-stage post-training process: meticulous prompt curation and synthesis, supervised fine-tuning on a diverse set of prompts and completions, preference tuning using both off- and on-policy data, and a novel reinforcement learning approach to bolster specific skills with verifiable rewards. This open-source model distinguishes itself by providing full transparency, including access to training data, code, and evaluation tools, thereby closing the performance gap between open and proprietary fine-tuning methods. Evaluations indicate that Tülu 3 outperforms other open-weight models of similar size, such as Llama 3.1-Instruct and Qwen2.5-Instruct, across various benchmarks.
    Starting Price: Free
  • 35
    Mistral Medium 3.1
    Mistral Medium 3.1 is the latest frontier-class multimodal foundation model released in August 2025, designed to deliver advanced reasoning, coding, and multimodal capabilities while dramatically reducing deployment complexity and costs. It builds on the highly efficient architecture of Mistral Medium 3, renowned for offering state-of-the-art performance at up to 8-times lower cost than leading large models, enhancing tone consistency, responsiveness, and accuracy across diverse tasks and modalities. The model supports deployment across hybrid environments, on-premises systems, and virtual private clouds, and it achieves competitive performance relative to high-end models such as Claude Sonnet 3.7, Llama 4 Maverick, and Cohere Command A. Ideal for professional and enterprise use cases, Mistral Medium 3.1 excels in coding, STEM reasoning, language understanding, and multimodal comprehension, while maintaining broad compatibility with custom workflows and infrastructure.
  • 36
    Magistral

    Magistral

    Mistral AI

    Magistral is Mistral AI’s first reasoning‑focused language model family, released in two sizes: Magistral Small, a 24 B‑parameter open‑weight model under Apache 2.0 (downloadable on Hugging Face), and Magistral Medium, a more capable enterprise version available via Mistral’s API, Le Chat platform, and major cloud marketplaces. Built for domain‑specific, transparent, multilingual reasoning across tasks like math, physics, structured calculations, programmatic logic, decision trees, and rule‑based systems, Magistral produces chain‑of‑thought outputs in the user’s language that you can follow and verify. This launch marks a shift toward compact yet powerful transparent AI reasoning. Magistral Medium is currently available in preview on Le Chat, the API, SageMaker, WatsonX, Azure AI, and Google Cloud Marketplace. Magistral is ideal for general-purpose use requiring longer thought processing and better accuracy than with non-reasoning LLMs.
  • 37
    Leanstral

    Leanstral

    Mistral AI

    Leanstral is an open-source code agent developed by Mistral AI specifically designed to work with the Lean 4 proof assistant. The model focuses on generating code while also formally verifying its correctness against strict mathematical or software specifications. Unlike traditional coding assistants, Leanstral integrates directly with formal proof systems to ensure that generated code satisfies defined logical requirements. Its architecture is optimized for proof engineering tasks and operates efficiently with sparse model parameters. Leanstral is released under the Apache 2.0 license, making it freely accessible for developers, researchers, and organizations to use and customize. The model is designed to operate within real-world formal repositories rather than isolated problem environments. By combining code generation with formal verification, Leanstral aims to reduce the need for manual human review in complex software and mathematical development.
    Starting Price: Free
  • 38
    Mathstral

    Mathstral

    Mistral AI

    As a tribute to Archimedes, whose 2311th anniversary we’re celebrating this year, we are proud to release our first Mathstral model, a specific 7B model designed for math reasoning and scientific discovery. The model has a 32k context window and is published under the Apache 2.0 license. We’re contributing Mathstral to the science community to bolster efforts in advanced mathematical problems requiring complex, multi-step logical reasoning. The Mathstral release is part of our broader effort to support academic projects; it was produced in the context of our collaboration with Project Numina. Akin to Isaac Newton in his time, Mathstral stands on the shoulders of Mistral 7B and specializes in STEM subjects. It achieves state-of-the-art reasoning capacities in its size category across various industry-standard benchmarks. In particular, it achieves 56.6% on MATH and 63.47% on MMLU.
    Starting Price: Free
  • 39
    Mistral OCR 3

    Mistral OCR 3

    Mistral AI

    Mistral OCR 3 is the third-generation optical character recognition model from Mistral AI designed to achieve a new frontier in accuracy and efficiency for document processing by extracting text, embedded images, and structure from a wide range of documents with exceptional fidelity. It delivers breakthrough performance with a 74% overall win rate over the previous generation on forms, scanned documents, complex tables, and handwriting, outperforming both enterprise document processing solutions and AI-native OCR tools. OCR 3 supports output in clean text, Markdown, or structured JSON with HTML table reconstruction to preserve layout, enabling downstream systems and workflows to understand both content and structure. It powers the Document AI Playground in Mistral AI Studio for drag-and-drop parsing of PDFs and images and integrates via API for developers to automate document extraction workflows.
    Starting Price: $14.99 per month
  • 40
    Kimi K2

    Kimi K2

    Moonshot AI

    Kimi K2 is a state-of-the-art open source large language model series built on a mixture-of-experts (MoE) architecture, featuring 1 trillion total parameters and 32 billion activated parameters for task-specific efficiency. Trained with the Muon optimizer on over 15.5 trillion tokens and stabilized by MuonClip’s attention-logit clamping, it delivers exceptional performance in frontier knowledge, reasoning, mathematics, coding, and general agentic workflows. Moonshot AI provides two variants, Kimi-K2-Base for research-level fine-tuning and Kimi-K2-Instruct pre-trained for immediate chat and tool-driven interactions, enabling both custom development and drop-in agentic capabilities. Benchmarks show it outperforms leading open source peers and rivals top proprietary models in coding tasks and complex task breakdowns. It also offers a 128K-token context length, tool-calling API compatibility, and support for industry-standard inference engines.
    Starting Price: Free
  • 41
    Solar Pro 2

    Solar Pro 2

    Upstage AI

    Solar Pro 2 is Upstage’s latest frontier‑scale large language model, designed to power complex tasks and agent‑like workflows across domains such as finance, healthcare, and legal. Packaged in a compact 31 billion‑parameter architecture, it delivers top‑tier multilingual performance, especially in Korean, where it outperforms much larger models on benchmarks like Ko‑MMLU, Hae‑Rae, and Ko‑IFEval, while also excelling in English and Japanese. Beyond superior language understanding and generation, Solar Pro 2 offers next‑level intelligence through an advanced Reasoning Mode that significantly boosts multi‑step task accuracy on challenges ranging from general reasoning (MMLU, MMLU‑Pro, HumanEval) to complex mathematics (Math500, AIME) and software engineering (SWE‑Bench Agentless), achieving problem‑solving efficiency comparable to or exceeding that of models twice its size. Enhanced tool‑use capabilities enable the model to interact seamlessly with external APIs and data sources.
    Starting Price: $0.1 per 1M tokens
  • 42
    GLM-4.7

    GLM-4.7

    Zhipu AI

    GLM-4.7 is an advanced large language model designed to significantly elevate coding, reasoning, and agentic task performance. It delivers major improvements over GLM-4.6 in multilingual coding, terminal-based tasks, and real-world software engineering benchmarks such as SWE-bench and Terminal Bench. GLM-4.7 supports “thinking before acting,” enabling more stable, accurate, and controllable behavior in complex coding and agent workflows. The model also introduces strong gains in UI and frontend generation, producing cleaner webpages, better layouts, and more polished slides. Enhanced tool-using capabilities allow GLM-4.7 to perform more effectively in web browsing, automation, and agent benchmarks. Its reasoning and mathematical performance has improved substantially, showing strong results on advanced evaluation suites. GLM-4.7 is available via Z.ai, API platforms, coding agents, and local deployment for flexible adoption.
    Starting Price: Free
  • 43
    Ministral 8B

    Ministral 8B

    Mistral AI

    Mistral AI has introduced two advanced models for on-device computing and edge applications, named "les Ministraux": Ministral 3B and Ministral 8B. These models excel in knowledge, commonsense reasoning, function-calling, and efficiency within the sub-10B parameter range. They support up to 128k context length and are designed for various applications, including on-device translation, offline smart assistants, local analytics, and autonomous robotics. Ministral 8B features an interleaved sliding-window attention pattern for faster and more memory-efficient inference. Both models can function as intermediaries in multi-step agentic workflows, handling tasks like input parsing, task routing, and API calls based on user intent with low latency and cost. Benchmark evaluations indicate that les Ministraux consistently outperform comparable models across multiple tasks. As of October 16, 2024, both models are available, with Ministral 8B priced at $0.1 per million tokens.
    Starting Price: Free
  • 44
    xPrivo

    xPrivo

    xPrivo

    A free, open-source AI chat alternative to ChatGPT and Perplexity that prioritizes your privacy and anonymity. No account required – not even for PRO features. All chats are stored locally on your device and never logged or used for training.
    Key Features:
    - 100% Anonymous | Zero personal data collection
    - EU-hosted models - GDPR-compliant servers running Mistral 3, DeepSeek V3.2, and other powerful open-source models behind the default xprivo model
    - Web search with sources. Get fact-checked, current information
    - Self-hostable. Run it on your own infrastructure or use the hosted version
    - BYOK support. Connect your own API keys from OpenAI, Anthropic, Grok, etc.
    - Local-first. Your chat history never leaves your device
    - Open source. Fully auditable code on GitHub
    - Use it with Ollama to chat with your local models fully offline
    Perfect for privacy-conscious users who want powerful AI assistance without compromising their anonymity.
  • 45
    Falcon-40B

    Falcon-40B

    Technology Innovation Institute (TII)

    Falcon-40B is a 40B-parameter causal decoder-only model built by TII and trained on 1,000B tokens of RefinedWeb enhanced with curated corpora. It is made available under the Apache 2.0 license. Why use Falcon-40B? It is the best open-source model currently available. Falcon-40B outperforms LLaMA, StableLM, RedPajama, MPT, etc. See the OpenLLM Leaderboard. It features an architecture optimized for inference, with FlashAttention and multiquery. It is made available under a permissive Apache 2.0 license allowing for commercial use, without any royalties or restrictions. ⚠️ This is a raw, pretrained model, which should be further finetuned for most use cases. If you are looking for a version better suited to taking generic instructions in a chat format, we recommend taking a look at Falcon-40B-Instruct.
    Starting Price: Free
  • 46
    Trooper.AI

    Trooper.AI

    Trooper.AI

    Trooper.AI lets you rent private, bare-metal GPU servers for AI training, inference, and experimentation — ready in minutes. Instantly deploy OpenWebUI, ComfyUI, Jupyter Notebook, Ubuntu Desktop, Ollama, and more with one click. No shared GPUs, no containers, full root access included. All servers are EU-hosted, GDPR and EU AI Act compliant, and operated from Germany. Trooper.AI is built on up-cycled high-end hardware, combining strong performance with sustainability. Pause or freeze servers anytime to save costs and pay only for what you use. Choose from a wide range of GPUs, from V100 and RTX 3090 to RTX 4090 and RTX Pro 6000 Blackwell, backed by fast NVMe storage, persistent machine state, automatic backups, and simple UI and API management. Trooper.AI is the smallest hyperscaler in Europe — built for developers who want performance, privacy, and full control without cloud complexity.
    Starting Price: €149/month
  • 47
    Chinchilla

    Chinchilla

    Google DeepMind

    Chinchilla is a large language model. Chinchilla uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher.
  • 48
    Claude Opus 4.6
    Claude Opus 4.6 is Anthropic’s flagship AI model designed to push the boundaries of reasoning, coding, and real-world problem solving. It delivers significant performance gains over previous versions and competing models across key benchmarks. Opus 4.6 excels on SWE-bench, multilingual coding evaluations, and advanced agent-based tests. The model is built to support complex, long-running agentic workflows with greater efficiency. Enhanced safety measures improve resistance to prompt injection and strengthen alignment on sensitive tasks. Developers benefit from new API controls such as effort parameters, context compaction, and advanced tool usage. These improvements make Opus 4.6 more powerful, reliable, and versatile across use cases.
  • 49
    Phi-4-reasoning
    Phi-4-reasoning is a 14-billion-parameter transformer-based language model optimized for complex reasoning tasks, including math, coding, algorithmic problem solving, and planning. Trained via supervised fine-tuning of Phi-4 on carefully curated "teachable" prompts and reasoning demonstrations generated using o3-mini, it generates detailed reasoning chains that effectively leverage inference-time compute. The Phi-4-reasoning-plus variant adds outcome-based reinforcement learning to produce longer reasoning traces. Phi-4-reasoning outperforms significantly larger open-weight models such as DeepSeek-R1-Distill-Llama-70B and approaches the performance levels of the full DeepSeek-R1 model across a wide range of reasoning tasks. The smaller Phi-4-mini-reasoning variant is designed for environments with constrained computing or latency; fine-tuned with synthetic data generated by DeepSeek-R1, it provides high-quality, step-by-step problem solving.
  • 50
    LongLLaMA

    LongLLaMA

    LongLLaMA

    This repository contains the research preview of LongLLaMA, a large language model capable of handling long contexts of 256k tokens or even more. LongLLaMA is built upon the foundation of OpenLLaMA and fine-tuned using the Focused Transformer (FoT) method. LongLLaMA Code is built upon the foundation of Code Llama. We release a smaller 3B base variant (not instruction tuned) of the LongLLaMA model under a permissive license (Apache 2.0) and inference code supporting longer contexts on Hugging Face. Our model weights can serve as a drop-in replacement for LLaMA in existing implementations (for short contexts up to 2048 tokens). Additionally, we provide evaluation results and comparisons against the original OpenLLaMA models.
    Starting Price: Free