
Research And System Realization Of Tibetan Continuous Speech Recognition Technology

Posted on: 2016-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Xu
Full Text: PDF
GTID: 2208330470466819
Subject: Computer application technology
Abstract/Summary:
The purpose of speech recognition is to convert speech signals into the corresponding text sequence or into command information that a computer can understand, and thereby to enable human-computer interaction. Speech recognition draws on many core technologies, such as the Gaussian mixture model (GMM), the hidden Markov model (HMM), Mel-frequency cepstral coefficients (MFCCs), the n-gram language model, discriminative training, and adaptive training. This paper focuses on HMM-based Tibetan continuous speech recognition.

Tibetan belongs to the Tibetan branch of the Tibeto-Burman group of the Sino-Tibetan family. It has three dialect areas: U-Tsang, Kham, and Amdo. Tibetan is written in a multi-syllable alphabetic script. Each syllable is composed of several phonemes, so Tibetan has complicated rules for word formation and pronunciation. Based on these characteristics, we first built monophone-based HMMs and then compared their recognition rates with those of triphone models. The phoneme recognition rate rose from 68.71% with monophone-based HMMs to 69.39% with triphone-based HMMs, and the syllable recognition rate rose from 23.44% to 42.07%. These results indicate that context-dependent models, which take coarticulation into account, describe speech better. Additionally, because the Tibetan speech corpus keeps growing, we introduced a seed-model approach for training high-precision acoustic models.

Given the lack of a large Tibetan corpus, we also studied a cross-language speech recognition method from English to Tibetan based on the sparse auto-encoder. Articulatory features (AFs) are viewed as universal speech attributes for cross-language speech recognition. They are usually detected with a bank of multi-layer perceptrons (MLPs) in a supervised manner.
In this paper, we propose applying the sparse auto-encoder to detect AF-based speech attributes in a semi-supervised manner for cross-language speech recognition. Experimental results on Tibetan monophone recognition showed that the sparse auto-encoder detects AF-based speech attributes more accurately than MLPs and yields higher phone recognition rates.

Finally, we pre-trained context-dependent triphone models offline and ported the prepared pronunciation dictionary and triphone list to a Linux system. We then developed a continuous speech recognition system for Tibetan by extending the HTK tools with Qt.
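
To make the monophone versus triphone distinction above concrete, the following is a minimal sketch of how a monophone sequence can be expanded into context-dependent labels written in the `left-phone+right` notation commonly used with HTK. It is an illustration only, not the procedure used in the thesis: the function name `to_triphones`, the `sil` boundary symbol, and the phone symbols in the usage example are hypothetical placeholders rather than the thesis's Tibetan phone set.

```python
from typing import List


def to_triphones(phones: List[str], boundary: str = "sil") -> List[str]:
    """Expand monophones into l-p+r style context-dependent labels.

    Assumption: the boundary symbol (silence) stays context-free and
    does not act as left or right context for its neighbours.
    """
    labels = []
    for i, phone in enumerate(phones):
        if phone == boundary:
            labels.append(phone)
            continue
        # Pick up the neighbouring phones unless they are boundaries.
        left = phones[i - 1] if i > 0 and phones[i - 1] != boundary else None
        right = (
            phones[i + 1]
            if i + 1 < len(phones) and phones[i + 1] != boundary
            else None
        )
        label = phone
        if left is not None:
            label = f"{left}-{label}"
        if right is not None:
            label = f"{label}+{right}"
        labels.append(label)
    return labels


# Hypothetical phone symbols, used only to show the label format.
print(to_triphones(["sil", "k", "a", "t", "sil"]))
```

Each label ties a phone to its neighbours, which is how a context-dependent model can capture coarticulation. The number of distinct labels grows quickly with the phone inventory, which is one reason a triphone list is typically prepared alongside the pronunciation dictionary.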
Keywords/Search Tags: Speech recognition, Hidden Markov Model, Tibetan, Articulatory features, acoustic model, cross-language