GLM-4-Voice | End-to-End Chinese-English Conversational Model
Capable of understanding text, audio, vision, video
The free, Open Source alternative to OpenAI, Claude and others
TEN, a voice agent framework to create conversational AI.
StreamSpeech is a seamless model for offline speech recognition
Apache OpenNLP
High-performance neural network inference framework for mobile
Repo of Qwen2-Audio chat & pretrained large audio language model
Voice Recognition to Text Tool
Speech recognition for your site
JavaScript OCR and text extraction for images and PDFs
A framework to enable multimodal models to operate a computer
AzioSpeech Recognition and Translation
Visual Causal Flow
Claude Code skill that removes signs of AI-generated writing from text
Qwen3-omni is a natively end-to-end, omni-modal LLM
Framework for building real-time voice and multimodal AI agents
A very simple framework for state-of-the-art NLP
Advanced NLP with spaCy: A free online course
C#/.NET binding of llama.cpp, including LLaMA/GPT model inference
NLP Cloud serves high-performance pre-trained or custom models for NER
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
ITTT is a free tool designed to scan and extract text from images.
Transcribe on your own
Screenshots, word marking, OCR, AI, translation software