Alternatives to DecentAI
Compare DecentAI alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to DecentAI in 2026. Compare features, ratings, user reviews, pricing, and more from DecentAI competitors and alternatives in order to make an informed decision for your business.
-
1
Vertex AI
Google
Build, deploy, and scale machine learning (ML) models faster, with fully managed ML tools for any use case. Through Vertex AI Workbench, Vertex AI is natively integrated with BigQuery, Dataproc, and Spark. You can use BigQuery ML to create and execute machine learning models in BigQuery using standard SQL queries on existing business intelligence tools and spreadsheets, or you can export datasets from BigQuery directly into Vertex AI Workbench and run your models from there. Use Vertex Data Labeling to generate highly accurate labels for your data collection. Vertex AI Agent Builder enables developers to create and deploy enterprise-grade generative AI applications. It offers both no-code and code-first approaches, allowing users to build AI agents using natural language instructions or by leveraging frameworks like LangChain and LlamaIndex. -
2
DecenterAds
DecenterAds
DecenterAds is the demand side platform for direct advertisers and agencies. DecenterAds, a cutting-edge infrastructure that can be customised for programmatic companies, is designed for efficient RTB trading. The product connects advertisers and publishers worldwide at all possible locations according to IAB Standards. DecenterAds offers a cost-effective solution that provides powerful billing support and transparent, real-time reporting. The platform offers Machine Learning Technology to build a digital advertising business within the programmatic ecosystem. -
3
fullmoon
fullmoon
Fullmoon is a free, open source application that enables users to interact with large language models directly on their devices, ensuring privacy and offline accessibility. Optimized for Apple silicon, it operates seamlessly across iOS, iPadOS, macOS, and visionOS platforms. Users can personalize the app by adjusting themes, fonts, and system prompts, and it integrates with Apple's Shortcuts for enhanced functionality. Fullmoon supports models like Llama-3.2-1B-Instruct-4bit and Llama-3.2-3B-Instruct-4bit, facilitating efficient on-device AI interactions without the need for an internet connection. Starting Price: Free -
4
CloudSight API
CloudSight
Image recognition technology that provides true understanding of your digital media. With our on-device computer vision model, users can expect an average response time of less than 250ms. This is more than 4x faster than using our API and does not require an internet connection. Users can recognize objects in a space by simply scanning their phone around a room, eliminating the need to take individual pictures. This feature is unique to our on-device model. By removing the need for data to leave the end-user device, privacy concerns are virtually eliminated. While our API takes every precaution possible to protect your privacy and data, our on-device model raises the bar on security substantially. Send CloudSight your visual content, and our API will generate a natural language description in response. Filter and categorize images, monitor for inappropriate content, and automatically assign labels for all of your digital media. -
5
Azure AI Custom Vision
Microsoft
Create a custom computer vision model in minutes. Customize and embed state-of-the-art computer vision image analysis for specific domains with AI Custom Vision, part of Azure AI Services. Build frictionless customer experiences, optimize manufacturing processes, accelerate digital marketing campaigns, and more. No machine learning expertise is required. Set your model to perceive a particular object for your use case. Easily build your image identifier model using the simple interface. Start training your computer vision model by simply uploading and labeling a few images. The model tests itself on these and continually improves precision through a feedback loop as you add images. To speed development, use customizable, built-in models for retail, manufacturing, and food. See how Minsur, one of the world's largest tin mines, uses AI Custom Vision for sustainable mining. Rely on enterprise-grade security and privacy for your data and any trained models. Starting Price: $2 per 1,000 transactions -
6
Ailiverse NeuCore
Ailiverse
Build & scale with ease. With NeuCore you can develop, train and deploy your computer vision model in a few minutes and scale it to millions. A one-stop platform that manages the model lifecycle, including development, training, deployment, and maintenance. Advanced data encryption is applied to protect your information at all stages of the process, from training to inference. Fully integrable vision AI models fit into your existing workflows and systems, or even edge devices easily. Seamless scalability accommodates your growing business needs and evolving business requirements. Divides an image into segments of different objects within the image. Extracts text from images, making it machine-readable. This model also works on handwriting. With NeuCore, building computer vision models is as easy as drag-and-drop and one-click. For more customization, advanced users can access provided code scripts and follow tutorial videos. -
7
Pipeshift
Pipeshift
Pipeshift is a modular orchestration platform designed to facilitate the building, deployment, and scaling of open source AI components, including embeddings, vector databases, large language models, vision models, and audio models, across any cloud environment or on-premises infrastructure. The platform offers end-to-end orchestration, ensuring seamless integration and management of AI workloads, and is 100% cloud-agnostic, providing flexibility in deployment. With enterprise-grade security, Pipeshift addresses the needs of DevOps and MLOps teams aiming to establish production pipelines in-house, moving beyond experimental API providers that may lack privacy considerations. Key features include an enterprise MLOps console for managing various AI workloads such as fine-tuning, distillation, and deployment; multi-cloud orchestration with built-in auto-scalers, load balancers, and schedulers for AI models; and Kubernetes cluster management. -
8
Moondream
Moondream
Moondream is an open source vision language model designed for efficient image understanding across various devices, including servers, PCs, mobile phones, and edge devices. It offers two primary variants, Moondream 2B, a 1.9-billion-parameter model providing robust performance for general-purpose tasks, and Moondream 0.5B, a compact 500-million-parameter model optimized for resource-constrained hardware. Both models support quantization formats like fp16, int8, and int4, allowing for reduced memory usage without significant performance loss. Moondream's capabilities include generating detailed image captions, answering visual queries, performing object detection, and pinpointing specific items within images. Its design emphasizes versatility and accessibility, enabling deployment across a wide range of platforms. Starting Price: Free -
9
IBM Maximo Visual Inspection
IBM
IBM Maximo Visual Inspection puts the power of computer vision AI capabilities into the hands of your quality control and inspection teams. It makes computer vision, deep learning, and automation more accessible to your technicians as it’s an intuitive toolset for labeling, training, and deploying artificial intelligence vision models. Built for easy and rapid deployment, simply train your model using our drag-and-drop visual user interface or import a custom model, and you’re ready to activate when and where you need it using mobile and edge devices. With IBM Maximo Visual Inspection, you can create your own detect and correct solution, with self-learning machine algorithms.
-
10
Palmyra LLM
Writer
Palmyra is a suite of Large Language Models (LLMs) engineered for precise, dependable performance in enterprise applications. These models excel in tasks such as question-answering, image analysis, and support for over 30 languages, with fine-tuning available for industries like healthcare and finance. Notably, Palmyra models have achieved top rankings in benchmarks like Stanford HELM and PubMedQA, and Palmyra-Fin is the first model to pass the CFA Level III exam. Writer ensures data privacy by not using client data to train or modify their models, adopting a zero data retention policy. The Palmyra family includes specialized models such as Palmyra X 004, featuring tool-calling capabilities; Palmyra Med, tailored for healthcare; Palmyra Fin, designed for finance; and Palmyra Vision, which offers advanced image and video processing. These models are available through Writer's full-stack generative AI platform, which integrates graph-based Retrieval Augmented Generation (RAG). Starting Price: $18 per month -
11
Rupert AI
Rupert AI
Rupert AI envisions a world where marketing is not just about reaching audiences but engaging them in the most personalized and effective way. Our AI-driven solutions are designed to make this vision a reality for businesses of all sizes. Key Features - AI model training: You can train your vision model, an object, style or a character. - AI workflows: Multiple AI workflows for marketing and creative material creation. Benefits of AI Model Training - Custom Solutions: Train models to recognize specific objects, styles, or characters that match your needs. - Higher Accuracy: Get better results tailored to your unique requirements. - Versatility: Useful for different industries like design, marketing, and gaming. - Faster Prototyping: Quickly test new ideas and concepts. - Brand Differentiation: Build unique visual styles and assets that stand out. Starting Price: $10/month -
12
Eyewey
Eyewey
Train your own models, get access to pre-trained computer vision models and app templates, and learn how to create AI apps or solve a business problem using computer vision in a couple of hours. Start creating your own dataset for detection by adding images of the object you need to train on. You can add up to 5000 images per dataset. After images are added to your dataset, they are pushed automatically into training. Once the model has finished training, you will be notified. You can then download your model to use for detection. You can also integrate your model into our pre-existing app templates for quick coding. Our mobile app, which is available on both Android and iOS, uses computer vision to help people with complete blindness in their day-to-day lives. It is capable of alerting users to hazardous objects or signs, detecting common objects, recognizing text as well as currencies, and understanding basic scenarios through deep learning. Starting Price: $6.67 per month -
13
Roboflow
Roboflow
Roboflow has everything you need to build and deploy computer vision models. Connect Roboflow at any step in your pipeline with APIs and SDKs, or use the end-to-end interface to automate the entire process from image to inference. Whether you’re in need of data labeling, model training, or model deployment, Roboflow gives you building blocks to bring custom computer vision solutions to your business. Starting Price: $250/month -
14
Ray2
Luma AI
Ray2 is a large-scale video generative model capable of creating realistic visuals with natural, coherent motion. It has a strong understanding of text instructions and can take images and video as input. Ray2 exhibits advanced capabilities as a result of being trained on Luma’s new multi-modal architecture scaled to 10x compute of Ray1. Ray2 marks the beginning of a new generation of video models capable of producing fast coherent motion, ultra-realistic details, and logical event sequences. This increases the success rate of usable generations and makes videos generated by Ray2 substantially more production-ready. Text-to-video generation is available in Ray2 now, with image-to-video, video-to-video, and editing capabilities coming soon. Ray2 brings a whole new level of motion fidelity. Smooth, cinematic, and jaw-dropping, transform your vision into reality. Tell your story with stunning, cinematic visuals. Ray2 lets you craft breathtaking scenes with precise camera movements. Starting Price: $9.99 per month -
15
Qwen2.5-VL
Alibaba
Qwen2.5-VL is the latest vision-language model from the Qwen series, representing a significant advancement over its predecessor, Qwen2-VL. This model excels in visual understanding, capable of recognizing a wide array of objects, including text, charts, icons, graphics, and layouts within images. It functions as a visual agent, capable of reasoning and dynamically directing tools, enabling applications such as computer and phone usage. Qwen2.5-VL can comprehend videos exceeding one hour in length and can pinpoint relevant segments within them. Additionally, it accurately localizes objects in images by generating bounding boxes or points and provides stable JSON outputs for coordinates and attributes. The model also supports structured outputs for data like scanned invoices, forms, and tables, benefiting sectors such as finance and commerce. Available in base and instruct versions across 3B, 7B, and 72B sizes, Qwen2.5-VL is accessible through platforms like Hugging Face and ModelScope. Starting Price: Free -
16
GPT-4o mini
OpenAI
A small model with superior textual intelligence and multimodal reasoning. GPT-4o mini enables a broad range of tasks with its low cost and latency, such as applications that chain or parallelize multiple model calls (e.g., calling multiple APIs), pass a large volume of context to the model (e.g., full code base or conversation history), or interact with customers through fast, real-time text responses (e.g., customer support chatbots). Today, GPT-4o mini supports text and vision in the API, with support for text, image, video and audio inputs and outputs coming in the future. The model has a context window of 128K tokens, supports up to 16K output tokens per request, and has knowledge up to October 2023. Thanks to the improved tokenizer shared with GPT-4o, handling non-English text is now even more cost effective. -
17
Mistral Small
Mistral AI
On September 17, 2024, Mistral AI announced several key updates to enhance the accessibility and performance of their AI offerings. They introduced a free tier on "La Plateforme," their serverless platform for tuning and deploying Mistral models as API endpoints, enabling developers to experiment and prototype at no cost. Additionally, Mistral AI reduced prices across their entire model lineup, with significant cuts such as a 50% reduction for Mistral Nemo and an 80% decrease for Mistral Small and Codestral, making advanced AI more cost-effective for users. The company also unveiled Mistral Small v24.09, a 22-billion-parameter model offering a balance between performance and efficiency, suitable for tasks like translation, summarization, and sentiment analysis. Furthermore, they made Pixtral 12B, a vision-capable model with image understanding capabilities, freely available on "Le Chat," allowing users to analyze and caption images without compromising text-based performance. Starting Price: Free -
18
PaliGemma 2
Google
PaliGemma 2, the next evolution in tunable vision-language models, builds upon the performant Gemma 2 models, adding the power of vision and making it easier than ever to fine-tune for exceptional performance. With PaliGemma 2, these models can see, understand, and interact with visual input, opening up a world of new possibilities. It offers scalable performance with multiple model sizes (3B, 10B, 28B parameters) and resolutions (224px, 448px, 896px). PaliGemma 2 generates detailed, contextually relevant captions for images, going beyond simple object identification to describe actions, emotions, and the overall narrative of the scene. Our research demonstrates leading performance in chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation, as detailed in the technical report. Upgrading to PaliGemma 2 is a breeze for existing PaliGemma users. -
19
GPT-4o
OpenAI
GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction. It accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. Starting Price: $5.00 / 1M tokens -
20
Claude Haiku 3
Anthropic
Claude Haiku 3 is the fastest and most affordable model in its intelligence class. With state-of-the-art vision capabilities and strong performance on industry benchmarks, Haiku is a versatile solution for a wide range of enterprise applications. The model is now available alongside Sonnet and Opus in the Claude API and on claude.ai for our Claude Pro subscribers. -
21
AI Verse
AI Verse
When real-life data capture is challenging, we generate diverse, fully labeled image datasets. Our procedural technology ensures the highest quality, unbiased, labeled synthetic datasets that will improve your computer vision model’s accuracy. AI Verse empowers users with full control over scene parameters, ensuring you can fine-tune the environments for unlimited image generation, giving you an edge in the competitive landscape of computer vision development. -
22
Manot
Manot
Your insight management platform for computer vision model performance. Pinpoint precisely where, how, and why models fail, bridging the gap between product managers and engineers through actionable insights. Manot provides an automated and continuous feedback loop for product managers to effectively communicate with engineering teams. Manot's simple user interface allows both technical and non-technical team members to benefit from the platform. Manot is designed with product managers in mind. Our platform provides actionable insights in the form of images pinpointing how, where, and why your model will perform poorly. -
23
Florence-2
Microsoft
Florence-2-large is an advanced vision foundation model developed by Microsoft, capable of handling a wide variety of vision and vision-language tasks, such as captioning, object detection, segmentation, and OCR. Built with a sequence-to-sequence architecture, it uses the FLD-5B dataset containing over 5 billion annotations and 126 million images to master multi-task learning. Florence-2-large excels in both zero-shot and fine-tuned settings, providing high-quality results with minimal training. The model supports tasks including detailed captioning, object detection, and dense region captioning, and can process images with text prompts to generate relevant responses. It offers great flexibility by handling diverse vision-related tasks through prompt-based approaches, making it a competitive tool in AI-powered visual tasks. The model is available on Hugging Face with pre-trained weights, enabling users to quickly get started with image processing and task execution. Starting Price: Free -
24
Hive Data
Hive
Create training datasets for computer vision models with our fully managed solution. We believe that data labeling is the most important factor in building effective deep learning models. We are committed to being the field's leading data labeling platform and helping companies take full advantage of AI's capabilities. Organize your media with discrete categories. Identify items of interest with one or many bounding boxes. Like bounding boxes, but with additional precision. Annotate objects with accurate width, depth, and height. Classify each pixel of an image. Mark individual points in an image. Annotate straight lines in an image. Measure yaw, pitch, and roll of an item of interest. Annotate timestamps in video and audio content. Annotate freeform lines in an image. Starting Price: $25 per 1,000 annotations -
25
Qwen2-VL
Alibaba
Qwen2-VL is the latest version of the vision language models based on Qwen2 in the Qwen model families. Compared with Qwen-VL, Qwen2-VL has the capabilities of: SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc. Understanding videos of 20 min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc. Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions. Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images. Starting Price: Free -
26
Azure AI Content Safety
Microsoft
Azure AI Content Safety is a content moderation platform that uses AI to keep your content safe. Create better online experiences for everyone with powerful AI models that detect offensive or inappropriate content in text and images quickly and efficiently. Language models analyze multilingual text, in both short and long form, with an understanding of context and semantics. Vision models perform image recognition and detect objects in images using state-of-the-art Florence technology. AI content classifiers identify sexual, violent, hate, and self-harm content with high levels of granularity. Content moderation severity scores indicate the level of content risk on a scale of low to high. -
27
Azure AI Services
Microsoft
Build cutting-edge, market-ready AI applications with out-of-the-box and customizable APIs and models. Quickly infuse generative AI into production workloads using studios, SDKs, and APIs. Gain a competitive edge by building AI apps powered by foundation models, including those from OpenAI, Meta, and Microsoft. Detect and mitigate harmful use with built-in responsible AI, enterprise-grade Azure security, and responsible AI tooling. Build your own copilot and generative AI applications with cutting-edge language and vision models. Retrieve the most relevant data using keyword, vector, and hybrid search. Monitor text and images to detect offensive or inappropriate content. Translate documents and text in real time across more than 100 languages. -
28
AskUI
AskUI
AskUI is an innovative platform that enables AI agents to visually perceive and interact with any computer interface, facilitating seamless automation across various operating systems and applications. Leveraging advanced vision models, AskUI's PTA-1 prompt-to-action model allows users to execute AI-driven actions on Windows, macOS, Linux, and mobile devices without the need for jailbreaking. This technology is particularly beneficial for tasks such as desktop and mobile automation, visual testing, and document or data processing. By integrating with tools like Jira, Jenkins, GitLab, and Docker, AskUI enhances workflow efficiency and reduces the burden on developers. Companies like Deutsche Bahn have reported significant improvements in internal processes, citing over a 90% increase in efficiency through the use of AskUI's test automation capabilities. -
29
Cloneable
Cloneable
Cloneable packs sophisticated logic into an incredibly easy-to-use, no-code builder to develop custom, deep-tech applications compatible with any device. Cloneable integrates deep tech with your unique business logic, so you can create and deploy tailored apps to any edge device. Apps can be built in minutes, making it perfect for non-technical audiences to make instant process changes and for engineers who want to rapidly develop and iterate on complex field tools. Launch, update and test your AI and computer vision models on any device (phone, IoT, cloud, robot). Apps are instantly deployable from the Cloneable builder. Bring your own model or build from one of our templates to move any data collection process to the edge. Cloneable was built with unlimited flexibility, so you can count, measure, inspect, and track assets across any location. Intelligent apps can digitize manual processes, scale human expertise, increase transparency, improve auditability, and much more. -
30
LLaVA
LLaVA
LLaVA (Large Language-and-Vision Assistant) is an innovative multimodal model that integrates a vision encoder with the Vicuna language model to facilitate comprehensive visual and language understanding. Through end-to-end training, LLaVA exhibits impressive chat capabilities, emulating the multimodal functionalities of models like GPT-4. Notably, LLaVA-1.5 has achieved state-of-the-art performance across 11 benchmarks, utilizing publicly available data and completing training in approximately one day on a single 8-A100 node, surpassing methods that rely on billion-scale datasets. The development of LLaVA involved the creation of a multimodal instruction-following dataset, generated using language-only GPT-4. This dataset comprises 158,000 unique language-image instruction-following samples, including conversations, detailed descriptions, and complex reasoning tasks. This data has been instrumental in training LLaVA to perform a wide array of visual and language tasks effectively. Starting Price: Free -
31
Pixtral Large
Mistral AI
Pixtral Large is a 124-billion-parameter open-weight multimodal model developed by Mistral AI, building upon their Mistral Large 2 architecture. It integrates a 123-billion-parameter multimodal decoder with a 1-billion-parameter vision encoder, enabling advanced understanding of documents, charts, and natural images while maintaining leading text comprehension capabilities. With a context window of 128,000 tokens, Pixtral Large can process at least 30 high-resolution images simultaneously. The model has demonstrated state-of-the-art performance on benchmarks such as MathVista, DocVQA, and VQAv2, surpassing models like GPT-4o and Gemini-1.5 Pro. Pixtral Large is available under the Mistral Research License for research and educational use, and under the Mistral Commercial License for commercial applications. Starting Price: Free -
32
Aya
Cohere AI
Aya is a new state-of-the-art, open-source, massively multilingual, generative large language research model (LLM) covering 101 different languages, more than double the number of languages covered by existing open-source models. Aya helps researchers unlock the powerful potential of LLMs for dozens of languages and cultures largely ignored by most advanced models on the market today. We are open-sourcing both the Aya model and the largest multilingual instruction fine-tuned dataset to date, with a size of 513 million covering 114 languages. This data collection includes rare annotations from native and fluent speakers all around the world, ensuring that AI technology can effectively serve a broad global audience that has had limited access to date. -
33
DeepSeek-VL
DeepSeek
DeepSeek-VL is an open source Vision-Language (VL) model designed for real-world vision and language understanding applications. Our approach is structured around three key dimensions: We strive to ensure our data is diverse, scalable, and extensively covers real-world scenarios, including web screenshots, PDFs, OCR, charts, and knowledge-based content, aiming for a comprehensive representation of practical contexts. Further, we create a use case taxonomy from real user scenarios and construct an instruction tuning dataset accordingly. The fine-tuning with this dataset substantially improves the model's user experience in practical applications. Considering efficiency and the demands of most real-world scenarios, DeepSeek-VL incorporates a hybrid vision encoder that efficiently processes high-resolution images (1024 x 1024), while maintaining a relatively low computational overhead. Starting Price: Free -
34
GeoSpy
GeoSpy
GeoSpy is an AI-powered platform that transforms pixels into actionable location intelligence by converting low-context photo data into precise GPS location predictions without relying on EXIF data. Trusted by over 1,000 organizations worldwide, GeoSpy offers global coverage, deploying its services in over 120 countries. The platform processes over 200,000 images daily and can scale to billions, providing fast, secure, and accurate geolocation services. GeoSpy Pro, designed for government and law enforcement agencies, integrates advanced AI location models to deliver meter-level accuracy through state-of-the-art computer vision models in an easy-to-use interface. Additionally, GeoSpy has introduced SuperBolt, a new AI model that enhances visual place recognition, offering improved accuracy in geolocation predictions. -
35
Arturo
Arturo
We are on a mission to empower people by providing clarity around the past, present and future of property. With coverage across the United States and Australia, we gather, synchronize and analyze imagery and other data surrounding properties. By using computer vision models that deliver intelligence at scale, we optimize how carriers operate and protect the assets that policyholders value most. With intelligent insurance, you don’t have to provide a lot of information about a house you are yet to be familiar with. Intelligent Insurance has been working with Arturo, and their roof condition model reveals that your new home shows evidence of staining and streaking, which is highly predictive of claim frequency and severity. -
36
SmolVLM
Hugging Face
SmolVLM-Instruct is a compact, AI-powered multimodal model that combines the capabilities of vision and language processing, designed to handle tasks like image captioning, visual question answering, and multimodal storytelling. It works with both text and image inputs, providing highly efficient results while being optimized for smaller, resource-constrained environments. Built with SmolLM2 as its text decoder and SigLIP as its image encoder, the model offers improved performance for tasks that require integration of both textual and visual information. SmolVLM-Instruct can be fine-tuned for specific applications, offering businesses and developers a versatile tool for creating intelligent, interactive systems that require multimodal inputs. Starting Price: Free -
37
Casafy AI
Casafy AI
Casafy AI is the world's first property search engine that analyzes visual data to instantly identify opportunities for buyers and sellers. It allows users to find properties that match their exact needs by analyzing visual data. Deploy AI agents to find your target properties in minutes, not months. Turn street-level data into actionable property intelligence. Transform weeks of manual property scouting into hours with our AI-powered search engine that identifies opportunities across entire metropolitan areas. Leverage our advanced computer vision to automatically detect property conditions, maintenance needs, and investment opportunities from street-level imagery. Convert visual data into business opportunities with precise property matching that helps you identify and prioritize high-potential leads. Our vision models analyze properties in real time, detecting specific criteria that match your requirements. -
38
Bild AI
Bild AI
Bild AI is an innovative platform that leverages artificial intelligence to streamline the traditionally manual and error-prone process of interpreting construction blueprints. By ingesting blueprint files, Bild AI applies advanced computer vision models and large language models to extract detailed material quantities and cost estimates for components such as flooring, doors, and hardware. This automation enables builders to generate accurate bids more efficiently, allowing them to bid on up to ten times more projects with increased confidence in the precision of their estimates. Beyond estimation, Bild AI assists in ensuring code compliance by identifying potential issues before blueprint submission, thereby facilitating smoother permitting processes. The platform also enhances blueprint accuracy by detecting inconsistencies and validating adherence to relevant standards and regulations. -
39
Magma
Microsoft
Magma is a cutting-edge multimodal foundation model developed by Microsoft, designed to understand and act in both digital and physical environments. The model excels at interpreting visual and textual inputs, allowing it to perform tasks such as interacting with user interfaces or manipulating real-world objects. Magma builds on the foundation models paradigm by leveraging diverse datasets to improve its ability to generalize to new tasks and environments. It represents a significant leap toward developing AI agents capable of handling a broad range of general-purpose tasks, bridging the gap between digital and physical actions. -
40
Reducto
Reducto
Reducto is a document-ingestion API that enables organizations to convert complex, unstructured documents, such as PDFs, images, and spreadsheets, into clean, structured outputs ready for large language model workflows and production pipelines. Its parsing engine reads documents as a human would, capturing layout, structure, tables, figures, and text regions with high accuracy; an “Agentic OCR” layer then reviews and corrects outputs in real time, enabling reliable results even in challenging edge cases. The platform enables automatic splitting of multi-document files or lengthy forms into individually useful units, using layout-aware heuristics to streamline pipelines without manual preprocessing. Once split, Reducto supports schema-level extraction of structured data, such as invoice fields, onboarding forms, or financial disclosures, so that the right information lands exactly where it is needed. The technology first applies layout-aware vision models to break down visual structure. Starting Price: $0.015 per credit -
41
Falcon 2
Technology Innovation Institute (TII)
Falcon 2 11B is an open-source, multilingual, and multimodal AI model, uniquely equipped with vision-to-language capabilities. It surpasses Meta’s Llama 3 8B and delivers performance on par with Google’s Gemma 7B, as independently confirmed by the Hugging Face Leaderboard. Looking ahead, the next phase of development will integrate a 'Mixture of Experts' approach to further enhance Falcon 2’s capabilities, pushing the boundaries of AI innovation. Starting Price: Free -
42
GPT-4V (Vision)
OpenAI
GPT-4 with vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs provided by the user, and is a capability that OpenAI has made broadly available. Incorporating additional modalities (such as image inputs) into large language models (LLMs) is viewed by some as a key frontier in artificial intelligence research and development. Multimodal LLMs offer the possibility of expanding the impact of language-only systems with novel interfaces and capabilities, enabling them to solve new tasks and provide novel experiences for their users. OpenAI's system card for GPT-4V analyzes the model's safety properties. That safety work builds on the work done for GPT-4 and goes deeper into the evaluations, preparation, and mitigation work done specifically for image inputs. -
43
Qwen3.5
Alibaba
Qwen3.5 is a next-generation open-weight multimodal large language model designed to power native vision-language agents. The flagship release, Qwen3.5-397B-A17B, combines a hybrid linear attention architecture with sparse mixture-of-experts, activating only 17 billion parameters per forward pass out of 397 billion total to maximize efficiency. It delivers strong benchmark performance across reasoning, coding, multilingual understanding, visual reasoning, and agent-based tasks. The model expands language support from 119 to 201 languages and dialects while introducing a 1M-token context window in its hosted version, Qwen3.5-Plus. Built for multimodal tasks, it processes text, images, and video with advanced spatial reasoning and tool integration. Qwen3.5 also incorporates scalable reinforcement learning environments to improve general agent capabilities. Designed for developers and enterprises, it enables efficient, tool-augmented, multimodal AI workflows. Starting Price: Free -
44
Black.ai
Black.ai
Respond to events and make better decisions with the help of AI and your existing IP camera infrastructure. Cameras are almost exclusively used for security and surveillance purposes. We add cutting-edge Machine Vision models to unlock a high-impact resource available to your team daily. We help you to improve operations for your staff and customers without compromising privacy. No facial recognition or long-term tracking, no exceptions. Fewer people in the loop. Relying on staff to compile and watch footage is invasive and unscalable. We help you to review only the things that matter and only at the right time. Black.ai creates a privacy layer that sits between security cameras and operations teams, so you can build a better experience for people without breaching their trust. Black.ai interfaces with your existing cameras using parallel streaming protocols. Our system is installed without additional infrastructure cost or any risk of obstructing operations. -
45
Hero
Hero
Hero helps you identify, price, and list items for sale in seconds, on Hero and on other marketplaces. Auto-generate the listing title, description, condition, and photos. Our advanced vision models enable real-time item scanning and pricing: simply hover your phone over an item. Selling your stuff online should be easy and effortless. Listing an item can take hours between taking photos, writing descriptions, setting a price, and going back and forth with buyers. Hero makes selling your stuff as easy as pie. Sign up for the waitlist to be among the first to sell stuff faster. -
46
Doppel
Doppel
Detect phishing scams on websites, social media, mobile app stores, gaming platforms, paid ads, the dark web, digital marketplaces, and more. Identify the highest-impact phishing attacks, counterfeits, and more with next-gen natural language and computer vision models. Track enforcements with an auto-generated audit trail through our no-code UI that works out of the box. Stop adversaries before they scam your customers and team. Scan millions of websites, social media accounts, mobile apps, paid ads, and more. Use AI to categorize brand infringement and phishing scams. Automatically remove threats as they are detected. Doppel's system has integrations with domain registrars, social media, app stores, digital marketplaces, the dark web, and countless platforms across the Internet. This gives you comprehensive visibility and automated protection against external threats. -
47
Rosepetal AI
Rosepetal AI
Rosepetal AI is an innovative technology company specializing in advanced artificial vision and deep-learning solutions designed specifically for industrial quality control. Our platform integrates dataset handling, automated labelling, and training of adaptive neural networks, enabling real-time defect detection without requiring advanced technical expertise. This intuitive, no-code SaaS solution democratizes access to sophisticated AI, significantly enhancing efficiency, reducing waste, and driving operational excellence across multiple industries such as automotive, food processing, pharmaceuticals, plastics, and electronics. The unique strength of Rosepetal AI lies in its dynamic adaptability and scalability. Our system allows industrial companies to quickly deploy robust AI models directly onto their production lines, continuously adjusting to new product variations and emerging defects. This capability ensures consistent quality and minimizes downtime. Starting Price: €250 -
48
Strong Analytics
Strong Analytics
Our platforms provide a trusted foundation upon which to design, build, and deploy custom machine learning and artificial intelligence solutions. Build next-best-action applications that learn, adapt, and optimize using reinforcement-learning-based algorithms. Use custom, continuously improving deep learning vision models to solve your unique challenges. Predict the future using state-of-the-art forecasts. Enable smarter decisions throughout your organization with cloud-based tools to monitor and analyze. The process of taking a modern machine learning application from research and ad-hoc code to a robust, scalable platform remains a key challenge for experienced data science and engineering teams. Strong ML simplifies this process with a complete suite of tools to manage, deploy, and monitor your machine learning applications. -
49
SimplyIcon
SimplyIcon
Create Windows icon files (.ICO format) by dragging and dropping images onto this program. I wrote this program because I needed a simple tool to generate icons and couldn't find any decent free ones. It automatically generates down-sampled 32x32, 24x24, and 16x16 levels. It also generates the 128x128 level if your source image is 128x128 or larger. -
50
PhotoMax
ExplorerMax
Smart photo organizing. Back up photos and videos and mark your favorites. Filter items by timeline and categorize photos into albums. Simple drawing, custom cropping, quick rotating, instant preview, and a clear comparison pane. Preview images in HEIC. Convert HEIC to JPG/PNG. Register ExplorerMax to batch convert photos and remove the watermark. Windows Explorer (File Explorer) has done a decent job ever since the tree view was introduced. ExplorerMax is a complete and smarter alternative that brings badly needed features that Windows Explorer does not provide. It offers all of the standard features that you expect from the default Windows Explorer. In the default Windows Explorer, if you open too many folders at the same time, picking out a specific folder becomes very annoying because they all pile up near the taskbar. With the tabbed browsing feature, similar to tabbed browsers like Google Chrome or Mozilla Firefox, you can access all your folders from one pane. Starting Price: $5.95 per month