Research On Speech Keyword Spotting Technology For Mongolian

Posted on: 2014-01-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Fei
GTID: 1228330398996422
Subject: Computer application technology
Abstract/Summary:
In recent years, with the development of computer multimedia technology, the volume of Mongolian speech data has grown rapidly in fields such as education, film, and culture. These data are valuable national cultural resources, and their effective retrieval and classification has become a hot topic in Mongolian information processing. Speech keyword spotting is a technology that, given a user query, finds the most similar voice clips in a designated speech dataset. In this thesis, we study several technologies that are specific to applying speech keyword spotting to the Mongolian language: Mongolian Large Vocabulary Continuous Speech Recognition (LVCSR), Mongolian speech keyword spotting based on lattices and confusion networks, and grapheme-to-phoneme conversion for Mongolian. The technologies discussed in this thesis can not only promote prosperity and development in minority areas, but are also of great importance in maintaining the national security and stability of those areas. The main contributions of our research are as follows:

1. Mongolian is an agglutinative language: a very large number of words can be produced from a root by attaching suffixes, which makes Mongolian LVCSR very difficult. To overcome this difficulty, we propose a segmentation-based LVCSR approach, which recognizes Mongolian words according to the characteristics of Mongolian word-formation rules. We detail the basic principles of Mongolian speech recognition and rebuild the corresponding acoustic model and language model for the segmentation-based LVCSR approach. Experimental results show that the segmentation-based method effectively solves the problem of recognizing a very large number of Mongolian words. Moreover, correcting the pronunciation of the ending suffixes before training the acoustic models greatly improves recognition accuracy. The approach proposed for Mongolian can serve as a reference for speech recognition and detection research on other agglutinative languages.

2. We apply keyword spotting based on lattices and confusion networks to the Mongolian keyword spotting task for the first time, and improve the In-Vocabulary spotting method by taking the word-formation rules of Mongolian into account. First, we describe posterior probability estimation, keyword search, and the calculation of confidence measures in the Mongolian keyword spotting method based on word lattices. Second, we introduce a Mongolian keyword spotting method based on word confusion networks, together with the indexing, keyword search, and confirmation scheme it uses. Finally, we propose an improved In-Vocabulary spotting method based on the word-formation rules of Mongolian. Experimental results show that the method based on word confusion networks outperforms the method based on word lattices in all respects, and that the improved In-Vocabulary spotting method effectively increases system performance.

3. To detect the large number of Out-of-Vocabulary words, we propose a Mongolian keyword spotting method based on phoneme confusion networks. When a speech file is decoded into phonemes, it generally cannot be recognized with high accuracy. Worse, many phonemes appear that do not even obey the prosody. To improve system precision and recall, we propose a new confidence calculation algorithm based on a phoneme confusion matrix, which achieves satisfactory results. First, we introduce the index building approach for phoneme confusion networks. Second, we describe the phoneme confusion matrix. Third, we present the phoneme search and confirmation approach in phoneme confusion networks. Fourth, we propose a framework for a Mongolian keyword spotting system. Finally, we give a detailed comparison of experimental results. Experimental results show that Mongolian Out-of-Vocabulary words can be effectively recognized by our spotting method based on phoneme confusion networks, and that the confidence calculation method based on the phoneme confusion matrix greatly improves overall system performance.

4. We propose a Mongolian grapheme-to-phoneme (G2P) conversion method. To detect an Out-of-Vocabulary word, the word must first be represented as a phoneme string and then searched for as an ordered sequence of characters. A Mongolian G2P system is essential to this process. The written form and the pronunciation of Mongolian are not in one-to-one correspondence, because vowels and consonants may be added, dropped, or mutated. This makes Mongolian G2P conversion difficult. To overcome this difficulty, we propose both a rule-based Mongolian G2P conversion method and a statistical Mongolian G2P conversion method (Joint Sequence Model). Experimental results show that the statistical method is significantly better than the rule-based one. The Mongolian G2P conversion system based on the Joint Sequence Model has a word error rate of 16.32% and a phoneme error rate of 3.37%, which satisfies most application requirements.
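The abstract does not say how the word error rate and phoneme error rate of the G2P system were computed. As a minimal sketch, assuming the conventional definitions (a word counts as wrong if its predicted phoneme string differs from the reference in any way, and the phoneme error rate is the total edit distance divided by the total number of reference phonemes), the two figures could be obtained as follows. The function names and the toy phoneme symbols are hypothetical placeholders, not taken from the thesis.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between a reference and a hypothesis phoneme sequence."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference phoneme deleted
                            curr[j - 1] + 1,      # extra phoneme inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def g2p_error_rates(
    pairs: List[Tuple[Sequence[str], Sequence[str]]]
) -> Tuple[float, float]:
    """Return (word error rate, phoneme error rate) over (reference, hypothesis) pairs."""
    total_phonemes = sum(len(ref) for ref, _ in pairs)
    if not pairs or total_phonemes == 0:
        raise ValueError("need at least one non-empty reference pronunciation")
    wrong_words = 0
    phoneme_errors = 0
    for ref, hyp in pairs:
        dist = edit_distance(ref, hyp)
        phoneme_errors += dist
        if dist > 0:
            wrong_words += 1  # any phoneme error makes the whole word wrong
    return wrong_words / len(pairs), phoneme_errors / total_phonemes


# Toy data: placeholder symbols, not real Mongolian phonemes.
toy_pairs = [
    (["a", "b", "c"], ["a", "b", "c"]),
    (["d", "e"], ["d", "x"]),
    (["f", "g", "h", "i", "j"], ["f", "g", "h", "i", "j"]),
]
wer, per = g2p_error_rates(toy_pairs)
```

In the toy data, one of the three words contains an error and one of the ten reference phonemes is wrong. This shows why a word error rate is typically several times higher than the phoneme error rate measured on the same system.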
Keywords/Search Tags: Mongolian, keyword spotting, speech recognition, stem, ending suffix, lattice, confusion network, confidence measures, grapheme-to-phoneme conversion (G2P), Joint Sequence Model