GLM-4-Voice is an open-source speech-enabled model from ZhipuAI, extending the GLM-4 family into the audio domain. It integrates advanced voice recognition and generation with the multimodal reasoning capabilities of GLM-4, enabling smooth natural interaction via spoken input and output. The model supports real-time speech-to-text transcription, spoken dialogue understanding, and text-to-speech synthesis, making it suitable for conversational AI, virtual assistants, and accessibility applications. GLM-4-Voice builds upon the bilingual strengths of the GLM architecture, supporting both Chinese and English, and is designed to handle long-form conversations with context retention. The repository provides model weights, inference demos, and setup instructions for deploying speech-enabled AI systems.

Features

  • Real-time speech-to-text transcription with bilingual support
  • Natural text-to-speech generation for human-like voice output
  • Built on GLM-4 architecture with multimodal reasoning capabilities
  • Supports Chinese and English voice interaction
  • Provides inference demos and fine-tuning options
  • Quantized versions available for efficient deployment on limited hardware

Project Samples

Project Activity

See All Activity >

License

Apache License V2.0

Follow GLM-4-Voice

GLM-4-Voice Web Site

Other Useful Business Software
Fully Managed MySQL, PostgreSQL, and SQL Server Icon
Fully Managed MySQL, PostgreSQL, and SQL Server

Automatic backups, patching, replication, and failover. Focus on your app, not your database.

Cloud SQL handles your database ops end to end, so you can focus on your app.
Try Free
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of GLM-4-Voice!

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Python

Related Categories

Python Large Language Models (LLM), Python AI Models

Registered

2025-10-04