Showing 3062 open source projects for "speech to text in java"

  • 1
    Vosk Speech Recognition Toolkit

    Offline speech recognition API for Android, iOS, Raspberry Pi

    ..., reconfigurable vocabulary and speaker identification. Speech recognition bindings are implemented for various programming languages like Python, Java, Node.JS, C#, C++, Rust, Go and others. Vosk supplies speech recognition for chatbots, smart home appliances, and virtual assistants. It can also create subtitles for movies, and transcription for lectures and interviews. Vosk scales from small devices like Raspberry Pi or Android smartphones to big clusters.
    Downloads: 79 This Week
  • 2
    commonmark-java

    Java library for parsing and rendering CommonMark (Markdown)

    Java library for parsing and rendering Markdown text according to the CommonMark specification (and some extensions). Provides classes for parsing input to an abstract syntax tree of nodes (AST), visiting and manipulating nodes, and rendering to HTML. It started out as a port of commonmark.js, but has since evolved into a full library with a nice API.
    Downloads: 2 This Week
  • 3
    sherpa-onnx

    Speech-to-text, text-to-speech, and speaker recognition

    Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter.
    Downloads: 34 This Week
  • 4
    Whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

    Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented...
    Downloads: 136 This Week
  • 5
    Piper TTS

    A fast, local neural text to speech system

    Piper is a fast, local neural text-to-speech (TTS) system developed by the Rhasspy team. Optimized for devices like the Raspberry Pi 4, Piper enables high-quality speech synthesis without relying on cloud services, making it ideal for privacy-conscious applications. It utilizes ONNX models trained with VITS to deliver natural-sounding voices across various languages and accents. Piper is particularly suited for offline voice assistants and embedded systems.
    Downloads: 49 This Week
  • 6
    TTS Voice Wizard

    Speech to Text to Speech, sends text as OSC messages

    Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System) Use TTS Voice Wizard's accessibility features to improve your VRChat experience (it works outside of VRChat too!) You can convert your Speech-to-Text and back to Speech through various Speech Recognition and Text-to-Speech methods. You can send what you say as OSC messages to VRChat to be displayed on your avatar using KillFrenzyAvatarText or VRChats...
    Downloads: 19 This Week
  • 7
    ChatGPT Desktop Application

    🔮 ChatGPT Desktop Application (Mac, Windows and Linux)

    ChatGPT Desktop Application (Mac, Windows and Linux)
    Downloads: 69 This Week
  • 8
    Koodo Reader

    A modern ebook manager and reader with sync and backup

    Koodo Reader is an all-in-one ebook reader that can help you better manage and study your ebooks. It's free and open-source. Save your data to Dropbox or Webdav. Customize the source folder and synchronize among multiple devices using OneDrive, iCloud, Dropbox, etc. Single-column, two-column, or continuous scrolling layouts. Text-to-speech, translation, progress slider, touch screen support, batch import. Add bookmarks, notes, highlights to your books. Adjust font size, font family, line...
    Downloads: 54 This Week
  • 9
    Termux application

    Terminal emulator application for Android OS extendible

    Termux is an Android terminal application and Linux environment. On first start, a small base system is downloaded; desired packages can then be installed using the apt package manager known from the Debian and Ubuntu Linux distributions. Access the built-in help by long-pressing anywhere on the terminal and selecting the Help menu option to learn more. Allows the app to view information about network connections such as which networks exist and are connected. Allows the app to create network...
    Downloads: 105 This Week
  • 10
    Voice-Pro

    Comprehensive Gradio WebUI for audio processing

    Voice-Pro is a comprehensive Gradio WebUI for transcription, translation, and text-to-speech. It can be installed with one click. It creates a virtual environment using Miniconda that runs completely separate from the Windows system (fully portable). Supports real-time transcription and translation, as well as batch mode.
    Downloads: 21 This Week
  • 11
    SpeechRecognition

    Speech recognition module for Python

    Library for performing speech recognition, with support for several engines and APIs, online and offline. Recognize speech input from the microphone, transcribe an audio file, save audio data to an audio file. Show extended recognition results, calibrate the recognizer energy threshold for ambient noise levels (see recognizer_instance.energy_threshold for details). Listening to a microphone in the background, various other useful recognizer features. The easiest way to install this is using pip...
    Downloads: 18 This Week
  • 12
    PDFsam

    PDFsam, a desktop application to split, merge, mix, rotate PDF files

    PDFsam Basic is our free and open-source desktop application to split, merge, extract pages, rotate and mix PDF files. PDFsam Visual is a powerful tool to visually compose PDF files, reorder pages, delete pages, split, merge, rotate, encrypt, decrypt, extract text, convert to grayscale, crop PDF files. PDFsam Basic is written using JavaFX. Since version 4 it is released as a self-contained application and bundles a jlinked JDK while version 3 requires a Java Runtime Environment 8 with JavaFx...
    Downloads: 66 This Week
  • 13
    Apache NetBeans

    Apache NetBeans

    Apache NetBeans is much more than a text editor. It highlights source code syntactically and semantically, and lets you easily refactor code with a range of handy and powerful tools. Apache NetBeans provides editors, wizards, and templates to help you create applications in Java, PHP and many other languages. Apache NetBeans can be installed on all operating systems that support Java, i.e., Windows, Linux, Mac OSX and BSD. "Write Once, Run Anywhere" applies to NetBeans too.
    Downloads: 57 This Week
  • 14
    AnkiDroid

    Anki flashcards on Android

    Anki flashcards on Android. Your secret trick to achieve superhuman information retention. A semi-official port of the open source Anki spaced repetition flashcard system to Android. Memorize anything with AnkiDroid.
    Downloads: 31 This Week
  • 15
    OpenAI Translator

    Browser extension and cross-platform desktop app based on ChatGPT API

    .... You must press the shortcut key to trigger the translation after selecting a word. It offers three modes: translation, polishing and summarization. Our tool allows for mutual translation, polishing and summarization across 55 different languages. Streaming mode is supported! It allows users to customize their translation text. One-click copying, Text-to-Speech (TTS). Available on all platforms (Windows, macOS, and Linux) for both browsers and Desktop.
    Downloads: 35 This Week
  • 16
    RealtimeSTT

    A robust, efficient, low-latency speech-to-text library

    RealtimeSTT is a Python-based realtime speech-to-text engine emphasizing low latency, wake-word detection, voice activity detection, and automatic speech segmentation. It provides asynchronous callbacks, nanosecond-precision timestamps, and CLI tools, suitable for building voice assistants, meeting transcribers, or live caption systems.
    Downloads: 9 This Week
  • 17
    Nexa SDK

    Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML

    Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), speech-to-text (ASR), and text-to-speech (TTS) capabilities. Additionally, it offers an OpenAI-compatible API server with JSON schema mode for function calling and streaming support, and a user-friendly Streamlit UI. Users can run Nexa SDK on any device with a Python environment, and GPU acceleration is supported, including CUDA, Metal, and ROCm...
    Downloads: 20 This Week
  • 18
    Stanford CoreNLP

    Stanford CoreNLP, a Java suite of core NLP tools

    CoreNLP is your one-stop shop for natural language processing in Java! CoreNLP enables users to derive linguistic annotations for text, including token and sentence boundaries, parts of speech, named entities, numeric and time values, dependency and constituency parses, coreference, sentiment, quote attributions, and relations. CoreNLP currently supports 6 languages: Arabic, Chinese, English, French, German, and Spanish. The centerpiece of CoreNLP is the pipeline. Pipelines take in raw text...
    Downloads: 6 This Week
  • 19
    Translate-Subtitle-File

    Subtitle Creation Assistant

    ... providers, 2. You can configure your own API key to use your own account's free quota, such as Tencent's free translation quota of 5 million characters per month or IBM's 500-minute speech-to-text free quota. (The tern.best domain name has expired and I don't want to renew it.) Azure speech-to-text and the DeepL free version currently have problems, so it is expected that they do not work; please wait for the next version for a fix. Machine translation of subtitle files: use machine translation to process files.
    Downloads: 16 This Week
  • 20
    Whishper

    Transcribe any audio to text, translate and edit subtitles 100% locally

    Open-source, local-first audio transcription and subtitling suite with a simple web UI. Thanks to open-source technologies, Whishper can run 100% offline. Your data never leaves your computer. Whishper allows you to translate your transcriptions to and from more than 60 languages thanks to Argos Translate and LibreTranslate. Download the transcriptions in many formats (json, txt, vtt, srt). Easily edit your subtitles right in the Web-UI.
    Downloads: 18 This Week
  • 21
    Coqui TTS

    A deep learning toolkit for Text-to-Speech, battle-tested in research

    TTS is a library for advanced Text-to-Speech generation. It's built on the latest research and was designed to achieve the best trade-off among ease of training, speed, and quality. TTS comes with pre-trained models and tools for measuring dataset quality, and is already used in 20+ languages for products and research projects. High-performance Deep Learning models for Text2Speech tasks. Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech). Speaker Encoder to compute speaker embeddings...
    Downloads: 25 This Week
  • 22
    Kafdrop

    Kafka Web UI

    Kafdrop is a web UI for viewing Kafka topics and browsing consumer groups. The tool displays information such as brokers, topics, partitions, and consumers, and lets you view messages. This project is a reboot of Kafdrop 2.x, dragged kicking and screaming into the world of Java 17+, Kafka 2.x, Helm and Kubernetes. It's a lightweight application that runs on Spring Boot and is dead-easy to configure, supporting SASL and TLS-secured brokers.
    Downloads: 37 This Week
  • 23
    Parlant

    The behavior guidance framework for customer-facing LLM agents

    Parlant is a behavior guidance framework for customer-facing LLM agents.
    Downloads: 11 This Week
  • 24
    elevenlabs-api

    elevenlabs-api is an open source Java wrapper around the ElevenLabs

    Elevenlabs-api is an open-source Java wrapper around the ElevenLabs Voice Synthesis and Cloning Web API. Compiled JARs are available via the Releases tab. To access your ElevenLabs API key, head to the official website; you can view your xi-API-key under the 'Profile' tab. To set up your ElevenLabs API key, you must register it with the ElevenLabsAPI Java API. For the security of any public repository, you should store your API key in an environment variable, or external from your...
    Downloads: 21 This Week
  • 25
    Editor.js

    A block-style editor with clean JSON output

    Editor.js is an open-source text editor offering a variety of features to help users create and format content efficiently. It has a modern, block-style interface that allows users to easily add and arrange different types of content, such as text, images, lists, quotes, etc. Each Block is provided via a separate plugin making Editor.js extremely flexible. Editor.js outputs clean JSON data instead of heavy HTML markup. Use it in the Web, iOS, Android, AMP, Instant Articles, speech readers...
    Downloads: 18 This Week