Alternatives to Hume AI
Compare Hume AI alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to Hume AI in 2025. Compare features, ratings, user reviews, pricing, and more from Hume AI competitors and alternatives in order to make an informed decision for your business.
-
1
Google Cloud Speech-to-Text
Google
Google Cloud’s Speech API processes more than 1 billion voice minutes per month with close to human levels of understanding for many commonly spoken languages. Powered by the best of Google's AI research and technology, Google Cloud's Speech-to-Text API helps you accurately transcribe speech into text in 73 languages and 137 different local variants. Leverage Google’s most advanced deep learning neural network algorithms for automatic speech recognition (ASR) and deploy ASR wherever you need it, whether in the cloud with the API, on-premises with Speech-to-Text On-Prem, or locally on any device with Speech On-Device. -
2
Google AI Studio
Google
Google AI Studio is a comprehensive, web-based development environment that democratizes access to Google's cutting-edge AI models, notably the Gemini family, enabling a broad spectrum of users to explore and build innovative applications. This platform facilitates rapid prototyping by providing an intuitive interface for prompt engineering, allowing developers to meticulously craft and refine their interactions with AI. Beyond basic experimentation, AI Studio supports the seamless integration of AI capabilities into diverse projects, from simple chatbots to complex data analysis tools. Users can rigorously test different prompts, observe model behaviors, and iteratively refine their AI-driven solutions within a collaborative and user-friendly environment. This empowers developers to push the boundaries of AI application development, fostering creativity and accelerating the realization of AI-powered solutions. -
3
Speechmatics
Speechmatics
Best-in-Market Speech-to-Text & Voice AI for Enterprises. Speechmatics delivers industry-leading Speech-to-Text and Voice AI for enterprises needing unrivaled accuracy, security, and flexibility. Our enterprise-grade APIs provide real-time and batch transcription with exceptional precision across the widest range of languages, dialects, and accents. Powered by Foundational Speech Technology, Speechmatics supports mission-critical voice applications in media, contact centers, finance, healthcare, and more. With on-prem, cloud, and hybrid deployment, businesses maintain full control over data security while unlocking voice insights. Trusted by global leaders, Speechmatics is the top choice for best-in-class transcription and voice intelligence.
🔹 Unmatched Accuracy – Superior transcription across languages & accents
🔹 Flexible Deployment – Cloud, on-prem, and hybrid
🔹 Enterprise-Grade Security – Full data control
🔹 Real-Time & Batch Processing – Scalable transcription
Starting Price: $0 per month
-
4
CallFinder
CallFinder
CallFinder speech analytics software automates outdated, manual QA processes to save time and provide immediate insights so you can make data-driven decisions. CallFinder automatically transcribes and scores recorded calls, identifying key metrics you can use to improve every aspect of your business. We deliver a highly scalable Software as a Service (SaaS) solution to contact centers and small and medium sized businesses across a wide range of industries. We like to think of ourselves as the speech analytics experts because, well, that’s all we do. We’re all about delivering a truly different software experience. You never get something that’s out-of-the-box. On purpose. Our Managed Client Services support is a differentiator that none of our speech analytics competitors offer. Your CallFinder Analyst becomes an integral part of your QA team, and you will work with your Analyst on a recurring basis to optimize CallFinder to meet your evolving business needs. -
5
Google Cloud Natural Language API
Google
Get insightful text analysis with machine learning that extracts, analyzes, and stores text. Train high-quality machine learning custom models without a single line of code with AutoML. Apply natural language understanding (NLU) to apps with Natural Language API. Use entity analysis to find and label fields within a document, including emails, chat, and social media, and then sentiment analysis to understand customer opinions to find actionable product and UX insights. Natural Language with speech-to-text API extracts insights from audio. Vision API adds optical character recognition (OCR) for scanned docs. Translation API understands sentiments in multiple languages. Use custom entity extraction to identify domain-specific entities within documents, many of which don’t appear in standard language models, without having to spend time or money on manual analysis. Train your own high-quality machine learning custom models to classify, extract, and detect sentiment.
-
6
Play.ht
Play.ht
AI Powered Text to Voice Generation. Play.ht offers uncanny, high-fidelity AI Voices for any project where you need human-sounding voice overs and performances. Hollywood studios, auto manufacturers, and other large enterprises use Play.ht to create realistic and engaging voiceovers quickly, without the hassle of scheduling and hiring voice talent. Our voices sound natural, expressive, and engaging, just like human voice talent. Play.ht offers API access as well as an online rich-text editor that allows you to generate entire performances with multiple speakers, edit their pacing, and generate unique versions of each paragraph - all within seconds. Join other companies looking to scale up and simplify their voice work by scheduling a live demo today.
Starting Price: $199 per month
-
7
Amazon Rekognition
Amazon
Amazon Rekognition makes it easy to add image and video analysis to your applications using proven, highly scalable, deep learning technology that requires no machine learning expertise to use. With Amazon Rekognition, you can identify objects, people, text, scenes, and activities in images and videos, as well as detect any inappropriate content. Amazon Rekognition also provides highly accurate facial analysis and facial search capabilities that you can use to detect, analyze, and compare faces for a wide variety of user verification, people counting, and public safety use cases. With Amazon Rekognition Custom Labels, you can identify the objects and scenes in images that are specific to your business needs. For example, you can build a model to classify specific machine parts on your assembly line or to detect unhealthy plants. Amazon Rekognition Custom Labels takes care of the heavy lifting of model development for you, so no machine learning experience is required. -
8
Amazon Polly
Amazon
Amazon Polly is a service that turns text into lifelike speech, allowing you to create applications that talk, and build entirely new categories of speech-enabled products. Polly's Text-to-Speech (TTS) service uses advanced deep learning technologies to synthesize natural sounding human speech. With dozens of lifelike voices across a broad set of languages, you can build speech-enabled applications that work in many different countries. In addition to Standard TTS voices, Amazon Polly offers Neural Text-to-Speech (NTTS) voices that deliver advanced improvements in speech quality through a new machine learning approach. Polly’s Neural TTS technology also supports two speaking styles that allow you to better match the delivery style of the speaker to the application: a Newscaster reading style that is tailored to news narration use cases, and a Conversational speaking style that is ideal for two-way communication like telephony applications. -
9
Amazon Lex
Amazon
Amazon Lex is a service for building conversational interfaces into any application using voice and text. Amazon Lex provides the advanced deep learning functionalities of automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognize the intent of the text, to enable you to build applications with highly engaging user experiences and lifelike conversational interactions. With Amazon Lex, the same deep learning technologies that power Amazon Alexa are now available to any developer, enabling you to quickly and easily build sophisticated, natural language, conversational bots (“chatbots”). With Amazon Lex, you can build bots to increase contact center productivity, automate simple tasks, and drive operational efficiencies across the enterprise. As a fully managed service, Amazon Lex scales automatically, so you don’t need to worry about managing infrastructure. -
10
Dialogflow
Google
Dialogflow from Google Cloud is a natural language understanding platform that makes it easy to design and integrate a conversational user interface into your mobile app, web application, device, bot, interactive voice response system, and so on. Using Dialogflow, you can provide new and engaging ways for users to interact with your product. Dialogflow can analyze multiple types of input from your customers, including text or audio inputs (like from a phone or voice recording). It can also respond to your customers in a couple of ways, either through text or with synthetic speech. Dialogflow CX and ES provide virtual agent services for chatbots and contact centers. If you have a contact center that employs human agents, you can use Agent Assist to help your human agents. Agent Assist provides real-time suggestions for human agents while they are in conversations with end-user customers. -
11
Komprehend
Komprehend
Komprehend AI APIs are the most comprehensive set of document classification and NLP APIs for software developers. Our NLP models are trained on more than a billion documents and provide state-of-the-art accuracy on most common NLP use cases such as sentiment analysis and emotion detection. Try our free demo now and see the effectiveness of our Text Analysis API. Maintains high accuracy in the real world, and brings out useful insights from open-ended textual data. Works on a variety of data, ranging from finance to healthcare. Supports private cloud deployments via Docker containers or on-premise deployment ensuring no data leakage. Protects your data and follows the GDPR compliance guidelines to the last word. Understand the social sentiment of your brand, product, or service while monitoring online conversations. Sentiment analysis is contextual mining of text which identifies and extracts subjective information in the source material.
Starting Price: $79 per month
-
12
Dandelion API
SpazioDati
Find mentions of places, people, brands and events in documents and social media. Easily get additional data about the entities. Classify multilingual text into standard, pre-defined taxonomies or build your own custom classification scheme in minutes. Identify whether the expressed opinion in short texts (like product reviews) is positive, negative, or neutral. Automatically identify important, contextually relevant concepts and key-phrases in articles and social media posts. Compare two texts and compute their syntactic and semantic similarity. Understand when two texts are about the same subject. Extract clean article text from newspapers, blogs and other websites. Remove boilerplate and advertising and get the article full text and images.
Starting Price: $49 per month
-
13
Element Human
Element Human
Replace clunky ad testing with real world engagement. Attention and Emotions at the speed and scale of a click. We provide the science, the tools, and the platform to quickly set up, measure and respond to human behaviours at scale, cost-effectively. We believe that the more we understand the subconscious and conscious drivers of human behaviour, the better our predictions, decisions, and interactions will be. We are a group of science, technology and design experts obsessed with enabling everyday devices to observe and measure how people live their lives. Our consent-based platform enables everyday devices to safely capture and respond to the emotional, memory and thought drivers of human behaviour as people interact with digital experiences. Through 7 years and 2.5 billion data points collected across 89 countries and 40 businesses, we developed a proprietary solution that monitors and understands how our digital experiences shape human behaviours.
Starting Price: $2,014.10 per user
-
14
PolygrAI
PolygrAI
PolygrAI is an innovative platform that provides real-time insights into emotional states and potential deception. Performing polygraph examination has never been easier with our desktop application, just click start, choose your video feed source, and see the insights. Our interface allows you to see through words and gain insights into the subconscious. The most important and comprehensive metric, simplified for your convenience. Helping you understand the overall sentiment throughout the entire examination. Categorized with priority, having primary, secondary, and tertiary emotions detected. When choosing a subject person, all others shown in the video feed will be ignored. Our desktop application is packed with other features designed to help you perform better and easier assessments. You can choose the default screen capturing which allows you to use with any other application, or connect a USB camera.
Starting Price: $28/month
-
15
Azure Face API
Microsoft
Embed facial recognition into your apps for a seamless and highly secured user experience. No machine learning expertise is required. Features include: face detection that perceives faces and attributes in an image; person identification that matches an individual in your private repository of up to 1 million people; perceived emotion recognition that detects a range of facial expressions like happiness, contempt, neutrality, and fear; and recognition and grouping of similar faces in images. Recognize faces according to diverse attributes. Add facial recognition to your apps, all through a single API call. Run Face in the cloud or on the edge in containers. Rely on enterprise-grade security and privacy applied to both your data and any trained models. Detect, identify, and analyze faces in images and videos. Build on top of this technology to support various scenarios. Detect one or more human faces along with attributes.
Starting Price: $0.01 per month
-
16
Octave TTS
Hume AI
Hume AI has introduced Octave (Omni-capable Text and Voice Engine), a groundbreaking text-to-speech system that leverages large language model technology to understand and interpret the context of words, enabling it to generate speech with appropriate emotions, rhythm, and cadence. Unlike traditional TTS models that merely read text, Octave acts like a human actor, delivering lines with nuanced expression based on the content. Users can create diverse AI voices by providing descriptive prompts, such as "a sarcastic medieval peasant," allowing for tailored voice generation that aligns with specific character traits or scenarios. Additionally, Octave offers the flexibility to modify the emotional delivery and speaking style through natural language instructions, enabling commands like "sound more enthusiastic" or "whisper fearfully" to fine-tune the output.
Starting Price: $3 per month
-
17
MARS6
CAMB.AI
CAMB.AI's MARS6 is a groundbreaking text-to-speech (TTS) model that has become the first speech model accessible on Amazon Web Services (AWS) Bedrock platform. This integration allows developers to incorporate advanced TTS capabilities into generative AI applications, facilitating the creation of enhanced voice assistants, engaging audiobooks, interactive media, and various audio-centric experiences. MARS6's advanced algorithms enable natural and expressive speech synthesis, setting a new standard for TTS conversion. Developers can access MARS6 directly through the Amazon Bedrock platform, ensuring seamless integration into applications and enhancing user engagement and accessibility. The inclusion of MARS6 in AWS Bedrock's diverse selection of foundation models underscores CAMB.AI's commitment to advancing machine learning and artificial intelligence, providing developers with vital tools to create rich audio experiences supported by AWS's reliable and scalable infrastructure. -
18
GPT-Image-1
OpenAI
OpenAI's Image Generation API, powered by the gpt-image-1 model, enables developers and businesses to integrate high-quality, professional-grade image generation directly into their tools and platforms. This model offers versatility, allowing it to create images across diverse styles, faithfully follow custom guidelines, leverage world knowledge, and accurately render text, unlocking countless practical applications across multiple domains. Leading enterprises and startups across industries, including creative tools, ecommerce, education, enterprise software, and gaming, are already using image generation in their products and experiences. It gives creators the choice and flexibility to experiment with different aesthetic styles. Users can generate and edit images from simple prompts, adjusting styles, adding or removing objects, expanding backgrounds, and more.
Starting Price: $0.19 per image
-
19
FaceReader
Noldus
To gain accurate and reliable data about facial expressions, FaceReader is the most robust automated system that will help you out. Clear insights into the effect of different stimuli on emotions. Very easy-to-use, save valuable time and resources. Easy integration with eye tracking data and physiology data. Many researchers have turned towards using automated facial expression analysis software to better provide an objective assessment of emotions. FaceReader software is fast, flexible, objective, accurate, and easy to use. It immediately analyzes your data (live, video, or still images), saving valuable time. The option to record audio as well as video makes it possible to hear what people have been saying, for example, during human-computer interactions, or while watching stimuli. FaceReader is the most robust automated system for the recognition of a number of specific properties in facial images, including the six basic or universal expressions. -
20
Receptiviti
Receptiviti
Use language to reveal personality traits and drives. Receptiviti maps personalities to the Big Five personality framework. It includes a total of 35 different measures of personality. Understand how people think and behave in social settings by measuring their authenticity, clout, self-focus, affiliation, and more. Understand what is driving a person's behaviour, whether they are driven by the need for achievement and self actualization, domination, reward, avoidance of risk or by engaging in risk-seeking behaviour. Detect abusive or threatening language that expresses prejudice, violence against a particular group on the basis of race, religion, or sexual orientation and more. Determine the author of your text of interest. This tool is especially useful for literary research, cybersecurity, forensics, and social media analysis. -
21
Amazon Nova Sonic
Amazon
Amazon Nova Sonic is a state-of-the-art speech-to-speech model that delivers real-time, human-like voice conversations with industry-leading price performance. It unifies speech understanding and generation into a single model, enabling developers to create natural, expressive conversational AI experiences with low latency. Nova Sonic adapts its responses based on the prosody of input speech, such as pace and timbre, resulting in more natural dialogue. It supports function calling and agentic workflows to interact with external services and APIs, including knowledge grounding with enterprise data using Retrieval-Augmented Generation (RAG). It provides robust speech understanding for American and British English across various speaking styles and acoustic conditions, with additional languages coming soon. Nova Sonic handles user interruptions gracefully without dropping conversational context and is robust to background noise. -
22
Charactr
Charactr
Powered by our state-of-the-art WaveThruVec model, transform text into expressive AI-generated speech with TTS or convert existing or new voice recordings into an AI-generated voice with Voice to Voice conversion. From photo-realistic to pixel art, and everything in between, generate incredible animated and talking virtual characters that can easily be integrated into your app, game, website, or media project with our upcoming Visual and Motion API. Our API includes a state-of-the-art selection of male, female, and unique synthetic character voices that can be used to add natural and expressive speech into your app, game, or project.
-
23
D-ID
D-ID
D-ID is a cutting-edge technology company specializing in generative AI and synthetic media, best known for its innovative Creative Reality Studio. This platform allows users to transform text, images, and audio into photorealistic videos featuring lifelike digital humans with natural facial expressions, speech, and movements. By combining deep learning, computer vision, and advanced AI models, D-ID empowers businesses, educators, and content creators to produce personalized, interactive video content at scale. The Creative Reality Studio enables users to generate talking avatars from static images, making it a popular tool for e-learning, marketing, entertainment, and customer service. Committed to privacy and ethical AI use, D-ID also incorporates facial anonymization technology, ensuring secure and responsible handling of visual data.
Starting Price: $5.90 per month
-
24
Gemini
Google
Gemini is Google's advanced AI chatbot designed to enhance creativity and productivity by engaging in natural language conversations. Accessible via the web and mobile apps, Gemini integrates seamlessly with various Google services, including Docs, Drive, and Gmail, enabling users to draft content, summarize information, and manage tasks efficiently. Its multimodal capabilities allow it to process and generate diverse data types, such as text, images, and audio, providing comprehensive assistance across different contexts. As a continuously learning model, Gemini adapts to user interactions, offering personalized and context-aware responses to meet a wide range of user needs.
Starting Price: Free
-
25
ChatGPT Pro
OpenAI
As AI becomes more advanced, it will solve increasingly complex and critical problems. It also takes significantly more compute to power these capabilities. ChatGPT Pro is a $200 monthly plan that enables scaled access to the best of OpenAI’s models and tools. This plan includes unlimited access to our smartest model, OpenAI o1, as well as to o1-mini, GPT-4o, and Advanced Voice. It also includes o1 pro mode, a version of o1 that uses more compute to think harder and provide even better answers to the hardest problems. In the future, we expect to add more powerful, compute-intensive productivity features to this plan. ChatGPT Pro provides access to a version of our most intelligent model that thinks longer for the most reliable responses. In evaluations from external expert testers, o1 pro mode produces more reliably accurate and comprehensive responses, especially in areas like data science, programming, and case law analysis.
Starting Price: $200/month
-
26
Cohere
Cohere AI
Cohere is an enterprise AI platform that enables developers and businesses to build powerful language-based applications. Specializing in large language models (LLMs), Cohere provides solutions for text generation, summarization, and semantic search. Their model offerings include the Command family for high-performance language tasks and Aya Expanse for multilingual applications across 23 languages. Focused on security and customization, Cohere allows flexible deployment across major cloud providers, private cloud environments, or on-premises setups to meet diverse enterprise needs. The company collaborates with industry leaders like Oracle and Salesforce to integrate generative AI into business applications, improving automation and customer engagement. Additionally, Cohere For AI, their research lab, advances machine learning through open-source projects and a global research community.
Starting Price: Free
-
27
SoundHound
SoundHound AI
We believe every brand should have a voice and every person should be able to interact naturally with the products around them, by simply talking. At SoundHound Inc., we’re working together with our strategic partners to build a more accessible and connected world. We build custom voice assistants for companies wanting to keep their brand, users, and data. Built on the foundation of proprietary Speech-to-Meaning® and Deep Meaning Understanding® technologies, the Houndify platform provides conversational intelligence unmatched by others in the industry. Houndify everything! Voice-enable the world with conversational intelligence. Create a voice AI platform that exceeds human capabilities and brings value and delight via an ecosystem of billions of products enhanced by innovation and monetization opportunities. Headquartered in the heart of Silicon Valley, we are a global company with 9 offices in key markets and teams in 16 countries. -
28
Affect Lab
Affect Lab
Tech-driven consumer insights platform for Insights teams. Map insights across media, digital and shopper touchpoints, deliver customer experiences that resonate emotionally, optimize customer journey for increased conversions, gain emotion, attention, engagement and noticeability insights. Usability testing and analytics platform for UX teams. Measure attention, engagement and emotion across user journeys, test prototypes, mockups, websites, apps and chatbots, identify key elements within the UI that customers notice, deliver emotionally optimized UX and drive conversions. Emotion Insights to create the best customer experiences. Facial Coding APIs to measure emotional response at scale, single face emotion recognition, in-the-wild multi face emotion recognition, recorded video emotion analysis. Test stimuli of various modes and channels like videos, print ads, planograms, package designs, websites, apps, chatbots, etc. -
29
ElevenLabs
ElevenLabs
The most realistic and versatile AI speech software, ever. Eleven brings the most compelling, rich and lifelike voices to creators and publishers seeking the ultimate tools for storytelling. Generate top-quality spoken audio in any voice and style with the most advanced and multipurpose AI speech tool out there. Our deep learning model renders human intonation and inflections with unprecedented fidelity and adjusts delivery based on context. Our AI model is built to grasp the logic and emotions behind words. And rather than generate sentences one-by-one, it’s always mindful of how each utterance ties to preceding and succeeding text. This zoomed-out perspective allows it to intonate longer fragments convincingly and with purpose. And finally you can do this with any voice you want.
Starting Price: $1 per month
-
30
MorphCast
Cynny
MorphCast Emotion AI Interactive Video Platform is the most flexible, easy to use and fast solution to let creatives design highly engaging interactive videos in minutes. In addition to the most updated interaction options, the video content can be triggered by the viewer’s facial expressions while watching it, thanks to our Facial Emotion AI integrated in the platform. MorphCast is a dynamic tool created for professionals. You can download it for free from Microsoft and Mac App Store. You will only pay for the minutes of views of your videos, and the first 2,000 minutes per month are always free. MorphCast also offers you an analytics dashboard to evaluate the performance of your interactive videos. You can measure how your content performs and adjust your audience experience according to their interaction and emotional reaction.
-
31
Deepgram
Deepgram
Deploy accurate speech recognition at scale while continuously improving model performance by labeling data and training from a single console. We deliver state-of-the-art speech recognition and understanding at scale. We do it by providing cutting-edge model training and data-labeling alongside flexible deployment options. Our platform recognizes multiple languages, accents, and words, dynamically tuning to the needs of your business with every training session. The fastest, most accurate, most reliable, most scalable speech transcription, with understanding, rebuilt just for enterprise. We’ve reinvented ASR with 100% deep learning that allows companies to continuously improve accuracy. Stop waiting for the big tech players to improve their software and forcing your developers to manually boost accuracy with keywords in every API call. Start training your speech model and reaping the benefits in weeks, not months or years.
Starting Price: $0
-
32
IBM Watson Tone Analyzer
IBM
The IBM Watson® Tone Analyzer uses linguistic analysis to detect emotional and language tones in written text. Watson Tone Analyzer can analyze tone at both the document and sentence levels. You can use the service to understand how your written communications are perceived and then to improve the tone of your communications. Businesses can use the service to learn the tone of their customers' communications and to respond to each customer appropriately, or to understand and improve their customer conversations. In this tutorial, you will learn how to use IBM Cloud Functions and cognitive and data services to build a serverless back end for a mobile application. Analyze emotions and tones in what people write online, like tweets or reviews. Predict whether they are happy, sad, confident, and more. Enable your chatbot to detect customer tones so you can build dialog strategies to adjust the conversation accordingly.
-
33
Vokaturi
Vokaturi
The Vokaturi software reflects the state of the art in emotion recognition from the human voice. Its algorithms have been designed, and are continually improved, by Paul Boersma, professor of Phonetic Sciences at the University of Amsterdam, who is the main author of the world’s leading speech analysis software Praat. Vokaturi can measure directly from your voice whether you are happy, sad, afraid, angry, or have a neutral state of mind. Currently the open-source version of the software chooses between these five emotions with high accuracy, even if it hears the speaker for the first time. The "plus" version of the software reaches the performance level of a dedicated human listener. As a developer you can easily include the Vokaturi software as a library in your own applications. You can choose between a free open-source license and a paid license. -
34
Behavioral Signals
Behavioral Signals
We are at the forefront of human communication in a groundbreaking era. Driven by cutting-edge AI technology, we go beyond words, diving deep into the intricacies of human expression. Understanding emotions, assessing behaviors, and predicting intent, we unlock the essence of every interaction. Our transformative impact spans various industries, from strengthening security and defense operations to redefining contact centers and empowering financial institutions with invaluable insights. With our innovative approach, we reshape the way connections are made and understood, ushering in a new era of communication. Our core technology is provided via our Behavioral Signals API, which is responsible for predicting low-level and behavioral voice characteristics from audio signals. Applications: Customer Service; Security, Intelligence, and Law Enforcement; Cognitive Health & Mental Health; Digital Companions/Chatbots; Healthcare; Entertainment.
-
35
EyeRecognize
EyeRecognize
Our image and video recognition APIs are proven, highly scalable, and leverage deep learning technology that you can implement within your own applications without prior machine learning expertise. EyeRecognize’s suite of image and video recognition API services allows you to identify objects, people, text, scenes, and activities in images and videos, as well as detect any faces and NSFW content. Face Detection and Analysis: detect all faces in images and video and get attributes such as face location, gender, age, eyes, and even emotion. Text Detection: extract text from images such as license plates, street signs, advertising, and brand names. Identify NSFW ("Not Safe for Work") and other potentially inappropriate content across both image and video. The team behind EyeRecognize has been collectively developing artificial intelligence powered applications for over 40 years and first pioneered the use of machine learning to automate content moderation for social media.
-
36
alwaysAI
alwaysAI
alwaysAI provides developers with a simple and flexible way to build, train, and deploy computer vision applications to a wide variety of IoT devices. Select from a catalog of deep learning models or upload your own. Use our flexible and customizable APIs to quickly enable core computer vision services. Quickly prototype, test, and iterate with a variety of camera-enabled ARM-32, ARM-64, and x86 devices. Identify objects in an image by name or classification. Identify and count objects appearing in a real-time video feed. Follow the same object across a series of frames. Find faces or full bodies in a scene to count or track. Locate and define borders around separate objects. Separate key objects in an image from background visuals. Determine human body poses, detect falls, and recognize emotions. Use our model training toolkit to train an object detection model to identify virtually any object. Create a model tailored to your specific use case. -
37
iMotions
iMotions
The world’s leading human behavior research tool. The iMotions software is your research solution for any kind of lab research. Whether you are working within behavioral science, doing observations, studying human factors, conducting usability testing, or working in a simulation environment, iMotions is your answer. Complete stimuli presentation (images, videos, websites, games, mobile/apps, and VR). Integrates and synchronizes all types of sensors (eye tracking, Facial Expression Analysis, electrodermal activity aka GSR, EEG, ECG, EMG). Includes an API to import, export, and transfer data from other sources. Built-in survey tool to add questionnaires to the data set. Live and post markers for annotations and behavioral coding. Complete study editing and analysis using embedded R-scripting to visualize data. Scene and respondent recordings and replay. Build study design by point and click. Starting Price: $2,900 per year -
38
Orpheus TTS
Canopy Labs
Canopy Labs has introduced Orpheus, a family of state-of-the-art speech large language models (LLMs) designed for human-level speech generation. These models are built on the Llama-3 architecture and are trained on over 100,000 hours of English speech data, enabling them to produce natural intonation, emotion, and rhythm that surpass current state-of-the-art closed-source models. Orpheus supports zero-shot voice cloning, allowing users to replicate voices without prior fine-tuning, and offers guided emotion and intonation control through simple tags. The models achieve low latency, with approximately 200ms streaming latency for real-time applications, reducible to around 100ms with input streaming. Canopy Labs has released both pre-trained and fine-tuned 3B-parameter models under the permissive Apache 2.0 license, with plans to release smaller models of 1B, 400M, and 150M parameters for use on resource-constrained devices. -
39
IBM Watson
IBM
Learn how to operationalize AI in your business. Watson helps you predict and shape future outcomes, automate complex processes, and optimize your employees’ time. Infuse Watson into your apps and workflows to tap into organizational data and put AI to work across multiple departments, from finance to customer care to supply chain. With Watson, you can create better, more personalized experiences for customers, scale the expertise of your best people across the organization, and make smarter decisions based on deep insights from data. Watson products and solutions are grounded in science, human-centered design, and inclusivity. An open, faster, more secure way to move more workloads to cloud and AI. -
40
MeaningCloud
MeaningCloud
MeaningCloud is the easiest, most powerful, and most affordable way to extract the meaning from unstructured content: documents, articles, social conversations, web content, etc. We provide text analytics products to extract the most accurate insights from any content in many languages, available both as SaaS and on-premises. We work with different industries (pharma, finance, media, retail, hospitality, telco, etc.), developing personalized and industry-oriented solutions. Pay only for what you use, without any activation fees or minimum time commitment, and with the most generous free plan on the market. If you don't like it, you can stop using it, just like that. There is no software to install or infrastructure to deploy. All the reliability and scalability of solutions in the cloud, and the possibility of testing it for free. Starting Price: $99 per month -
41
Allganize
Allganize
Allganize's industry-leading AI solutions provide businesses with the best tool to automate customer and employee support. Automate an average of 72% of all monthly support tickets within the first 4 months of implementation. Let our AI automate simple customer requests and free up your agents’ time to handle more complex issues. Employees can ask questions in a conversational way and find answers from multiple document types. A conversational AI chatbot, pre-trained for your websites, automates customer service. Intelligent search extracts accurate answers from any document, instantaneously. Automatically extracts important keywords from any document and categorizes them, providing valuable insights. Understands the context of product reviews written in natural language to automatically detect positive or negative experiences. Assigns predefined categories to customer support conversations to accurately determine user intent. Starting Price: $2 per month -
42
Chirp 3
Google
Google Cloud's Text-to-Speech API introduces Chirp 3, enabling users to create personalized voice models using their own high-quality audio recordings. This feature facilitates the rapid generation of custom voices, which can be utilized to synthesize audio through the Cloud Text-to-Speech API, supporting both streaming and long-form text. Access to this voice cloning capability is restricted to allow-listed users due to safety considerations; interested parties should contact the sales team to be added to the allowed list. Instant Custom Voice creation and synthesis are supported in various languages, including English (US), Spanish (US), and French (Canada), among others. It is available in multiple Google Cloud regions, and supported output formats include LINEAR16, OGG_OPUS, PCM, ALAW, MULAW, and MP3, depending on the API method used. -
43
Zyphra Zonos
Zyphra
Zyphra is excited to announce the release of Zonos-v0.1 beta, featuring two expressive and real-time text-to-speech models with high-fidelity voice cloning. We are releasing our 1.6B transformer and 1.6B hybrid under an Apache 2.0 license. It is difficult to quantitatively measure quality in the audio domain; we find that Zonos’ generation quality matches or exceeds that of leading proprietary TTS model providers. Further, we believe that openly releasing models of this caliber will significantly advance TTS research. Zonos model weights are available on Hugging Face, and sample inference code for the models is available on our GitHub. You can also access Zonos through our model playground and API with simple and competitive flat-rate pricing. For demonstration, we present a number of samples of Zonos versus proprietary models. Starting Price: $0.02 per minute -
44
gpt-realtime
OpenAI
GPT-Realtime is OpenAI’s most advanced, production-ready speech-to-speech model, now accessible through the fully available Realtime API. It delivers remarkably natural, expressive audio with fine-grained control over tone, pace, and accent. The model can comprehend nuanced human audio, including laughter, switch languages mid-sentence, and accurately process alphanumeric details like phone numbers across multiple languages. It significantly improves reasoning and instruction-following (achieving 82.8% on the BigBench Audio benchmark and 30.5% on MultiChallenge) and boasts enhanced function calling, now more reliable, timely, and accurate (scoring 66.5% on ComplexFuncBench). The model supports asynchronous tool invocation so conversations remain fluid even during long-running calls. The Realtime API also offers innovative capabilities such as image input support, SIP phone network integration, remote MCP server connection, and reusable conversation prompts. Starting Price: $20 per month -
45
BlueML
Explorance
Get an in-depth analysis of your open text comments in seconds with Blue Machine Learning (BlueML) solutions. Now you can see what matters most to your students and employees and instantly get more actionable insights to streamline your decisions. Most comment analysis tools use a generic one-size-fits-all approach, usually based on customer experience machine learning models. However, the employee and student journeys are made up of specific components around experience and learning. With BlueML, you can leverage three specialized models that will accurately consume and analyze comments from each area along the student and employee journeys, giving you context-specific categorization. Get an accurate view of the overall sentiments in employee and student comments (very negative, negative, neutral, positive, very positive, ambiguous). Gain insights about what emotions employees and students have expressed in their comments. -
46
Google Cloud Text-to-Speech
Google
Convert text into natural-sounding speech using an API powered by Google’s AI technologies. Deploy Google’s groundbreaking technologies to generate speech with humanlike intonation. Built based on DeepMind’s speech synthesis expertise, the API delivers voices that are near human quality. Choose from a set of 220+ voices across 40+ languages and variants, including Mandarin, Hindi, Spanish, Arabic, Russian, and more. Pick the voice that works best for your user and application. Create a unique voice to represent your brand across all your customer touchpoints, instead of using a common voice shared with other organizations. Train a custom voice model using your own audio recordings to create a unique and more natural sounding voice for your organization. You can define and choose the voice profile that suits your organization and quickly adjust to changes in voice needs without needing to record new phrases. -
47
Murf AI
Murf AI
Murf API is an advanced text-to-speech (TTS) solution that transforms written text into natural, lifelike voiceovers with remarkable accuracy and ease. It empowers developers and businesses with a suite of sophisticated features, including pitch and speed modulation, audio duration adjustments, customizable pauses, and an extensive pronunciation library. With 133+ AI voices in 20+ languages, including regional accents, Murf API enables businesses to create localized and accessible audio experiences for global audiences. The API supports a variety of audio formats: MP3, WAV, FLAC, ALAW, ULAW, and Base64. Murf API features a transparent, self-serve pricing model with flexible plans, robust security measures, and comprehensive documentation, ensuring effortless integration with chatbots, IVR systems, websites, and mobile apps. Starting Price: $9/one-time -
48
Novita AI
novita.ai
Explore the full spectrum of AI APIs tailored for image, video, audio, and LLM applications. Novita AI is designed to elevate your AI-driven business at the pace of technology, offering model hosting and training solutions. Access 100+ APIs, including AI image generation & editing with 10,000+ models, and training APIs for custom models. Enjoy the cheapest pay-as-you-go pricing, freeing you from GPU maintenance hassles while building your own products. Generate images in 2s from 10,000+ models with a single click. Models are kept up to date with Civitai and Hugging Face. A wide variety of products are built on the Novita API. You can empower your own products with a quick Novita API integration. Starting Price: $0.0015 per image -
49
OpenAI Realtime API
OpenAI
The OpenAI Realtime API is a newly introduced API, announced in 2024, that allows developers to create applications that facilitate real-time, low-latency interactions, such as speech-to-speech conversations. This API is designed for use cases like customer support agents, AI voice assistants, and language learning apps. Unlike previous implementations that required multiple models for speech recognition and text-to-speech conversion, the Realtime API handles these processes seamlessly in one call, enabling applications to handle voice interactions much faster and with more natural flow. -
50
NeuralSpace
NeuralSpace
Leverage NeuralSpace enterprise-grade APIs to unlock the full potential of speech & text AI for 100+ languages. Reduce time spent on manual tasks by up to 50% with Intelligent Document Processing. Extract, understand, and categorise data from any document, regardless of quality, layout, or file type. This frees your team from manual tasks to focus on what matters most. Make your products globally accessible with advanced speech and text AI. Train and deploy top-tier large language models on the NeuralSpace platform. Our user-friendly, low-code APIs ensure effortless integration. We provide the tools; you bring your vision to life.