
Research on the Mongolian Language Model Based on Speech Recognition

Posted on: 2008-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: X Ai
Full Text: PDF
GTID: 2178360215991328
Subject: Computer application technology
Abstract/Summary:
A language model (LM) is a mathematical model that describes the inherent regularities of natural language, and it plays an extremely important role in speech recognition. A robust language model can greatly improve the performance of large-vocabulary speech recognition systems (LVSRS). Because language modeling technology is still relatively immature compared with acoustic modeling, further research in language modeling can still yield large performance gains.

Mongolian is an internationally influential language and is widely used in the Inner Mongolia Autonomous Region and in minority areas of Xinjiang. Mongolian speech recognition technology has made some progress in the integration and application of Mongolian information processing systems. Building on these results in Mongolian information processing, this thesis explores how to improve the recognition rate of a Mongolian speech recognition system by improving the performance of the Mongolian language model.

The thesis first describes in detail the selection and processing of a large corpus and builds a Mongolian trigram language model with the language modeling tool HLM. In the experiments, a Mongolian trigram language model smoothed with Good-Turing discounting and Katz back-off, trained on a corpus of almost ten thousand sentences of everyday Mongolian dialogue, reached a recognition rate of 93.05%. With the same acoustic model and test data, this rate is higher than that of systems using a traditional rule-based language model or a Mongolian bigram language model.

The thesis also proposes a Mongolian word clustering algorithm based on an evaluation function and Mongolian morphology, and uses it to build a class-based language model. Finally, after summarizing the advantages and disadvantages of language models based only on words or only on word classes, the thesis studies a hybrid language model. The experimental results indicate that the hybrid language model performs better and has satisfactory complexity.
Keywords/Search Tags:Statistical Language Model, Continuous Mongolian Speech Recognition System, Corpus, Complexity, Data Smoothing, Word Clustering, Hybrid Language Model