| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| CIEMPIESS_Spanish_Models_581h.zip | 2019-08-24 | 159.6 MB | |
| README.txt | 2019-08-23 | 4.0 kB | |
| LICENSE.txt | 2019-08-23 | 35.1 kB | |
| Totals: 3 Items | | 159.6 MB | 1 |
-------------------------------------------------------------------------------------------------
The CIEMPIESS Spanish Models
PocketSphinx Acoustic Models in Spanish made out of 581 hours of audio
by Dr. Carlos Daniel Hernández Mena
-------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------
PRESENTATION
-------------------------------------------------------------------------------------------------
The CIEMPIESS Spanish Models are acoustic models designed to work with PocketSphinx. The 581
hours of audio recordings used to train the models come from many LDC datasets (including
all the CIEMPIESS corpora except CIEMPIESS-TEST) and from other sources collected by the social
service program "Desarrollo de Tecnologías del Habla" and the CIEMPIESS-UNAM project, both of
which belong to the "Universidad Nacional Autónoma de México" (UNAM) in Mexico City.
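
Since the models are meant to be loaded by PocketSphinx, a decoding call could be assembled as
in the sketch below. It assumes a PocketSphinx release that ships the pocketsphinx_continuous
tool with its -hmm, -lm, -dict and -infile options. The file and folder names inside the model
directory are hypothetical placeholders, not the actual layout of the zip archive; adjust them
after unpacking. The command is only built and printed, not executed.

```python
import shlex
from pathlib import Path


def build_decode_command(model_dir, wav_path):
    """Build an argument list for pocketsphinx_continuous (nothing is executed)."""
    model_dir = Path(model_dir)
    return [
        "pocketsphinx_continuous",
        # The three names below are hypothetical; check the unpacked archive.
        "-hmm", str(model_dir / "acoustic_model"),
        "-lm", str(model_dir / "language_model.lm.bin"),
        "-dict", str(model_dir / "dictionary.dic"),
        # Audio file to transcribe.
        "-infile", str(wav_path),
    ]


if __name__ == "__main__":
    cmd = build_decode_command("CIEMPIESS_Spanish_Models_581h", "utterance.wav")
    # Print a shell-quoted command line that can be copied into a terminal.
    print(shlex.join(cmd))
```

Building the arguments as a list keeps paths with spaces intact if the command is later passed
to subprocess.run.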
-------------------------------------------------------------------------------------------------
MODEL CHARACTERISTICS
-------------------------------------------------------------------------------------------------
- Most of the audio files used in the training stage contain clean speech. The training corpus
mixes read and spontaneous speech in many accents of Spanish, including accents from Mexico,
Spain and Latin America.
- The acoustic models are Continuous and Context Dependent (CD). They were trained with
10,000 senones.
- The audio format of the training files is Microsoft WAV, 16 kHz, 16-bit, mono.
- The pronouncing dictionary contains more than 285,000 words.
- The phonetic alphabet used in the pronouncing dictionary is called Mexbet. For more
information about Mexbet see www.ciempiess.org
- The phonetic transcriptions used in the pronouncing dictionary were made using a G2P-tool
called "fonetica3 library". For more information see www.ciempiess.org
- The text used for the language model comes from many sources, including Wikipedia,
transcribed interviews and newspapers.
- The language model was created using SRILM.
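
Acoustic models generally work best on audio in the same format as their training data, so it
can be useful to check input files against the format listed above. The following is a minimal
sketch using Python's standard wave module; the function names and the file name are
illustrative assumptions, not part of the model distribution.

```python
import os
import tempfile
import wave


def matches_training_format(path):
    """Return True if the WAV file is 16 kHz, 16-bit, mono."""
    with wave.open(str(path), "rb") as wav:
        return (
            wav.getframerate() == 16000   # 16 kHz sample rate
            and wav.getsampwidth() == 2   # 2 bytes per sample = 16-bit
            and wav.getnchannels() == 1   # mono
        )


def write_silence(path, rate=16000, seconds=1):
    """Write a silent 16-bit mono WAV file (used here only as test input)."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.wav")  # hypothetical file name
        write_silence(sample)
        print(matches_training_format(sample))
```

Files that fail the check would need to be resampled or converted with an audio tool before
decoding.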
-------------------------------------------------------------------------------------------------
TERMS OF USE
-------------------------------------------------------------------------------------------------
The CIEMPIESS Spanish Models by Carlos Daniel Hernández Mena are free software; you can
redistribute them and/or modify them under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 3 of the License, or (at your option)
any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU General Public License for more details.
The CIEMPIESS Spanish Models were created in May 2019.
-------------------------------------------------------------------------------------------------
ACKNOWLEDGEMENTS
-------------------------------------------------------------------------------------------------
The author would like to thank Alejandro V. Mena, Elena Vera and Angélica Gutiérrez for their
support of the social service program "Desarrollo de Tecnologías del Habla." The author also
thanks the social service students for all their hard work.
-------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------
For more information and documentation see the CIEMPIESS-UNAM Project website at:
http://www.ciempiess.org/
-------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------