Research On Linguistic Model Of Uyghur Continuous Speech Recognition System

Posted on: 2010-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Chen
Full Text: PDF
GTID: 2178360275997993
Subject: Computer application technology
Abstract/Summary:
The chief aim of this work is to investigate the application of linguistic modeling techniques in a Uyghur continuous speech recognition system.

Research on the acoustic model in speech recognition is relatively mature, and the acoustic model has limited capacity in processing the speech signal. Much non-acoustic information, such as syntax, semantics, and context, has not yet been put to good use in speech recognition. There is therefore little room left for further gains from acoustic modeling, whereas the linguistic model still has considerable room for improvement. For this reason, this paper takes linguistic modeling as its main research direction and discusses the application of the linguistic model in a continuous speech recognition system.

Firstly, drawing on the characteristics of Uyghur, it proposes a new principle for collecting a corpus based on a Confusable Set. From the original recognition results, all phonemes involved in insertion, deletion, and substitution errors are computed, and suitable sentences are then selected to add to the corpus.

Secondly, it uses CMU_Cam_Toolkit to process the new corpus, selects 5500 high-frequency words (those occurring more than 10 times) to construct a dictionary, and trains a trigram statistical linguistic model.

Thirdly, it compares four smoothing methods by their effect on perplexity and selects the good_turing method for data smoothing to optimize the model.

Finally, it uses the new trigram linguistic model in place of the original bigram model in the Uyghur continuous speech recognition system developed by the Key Laboratory of Multilingual Information Technology of Xinjiang University in 2008.

The experimental results indicate that the new linguistic model improves the performance of the Uyghur continuous speech recognition system: the sentence recognition rate rose from 68.98% to 73.29%, and the word recognition rate rose from 94.65% to 96.27%.
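
The error-counting step behind the Confusable Set can be illustrated with a small sketch. It is not the thesis's implementation: it assumes the reference and recognized utterances are available as phoneme lists, aligns them with a standard edit-distance (Levenshtein) alignment, and tallies which phonemes take part in substitution, deletion, and insertion errors. The function names `align_errors` and `confusable_counts`, the choice to credit a substitution to the reference phoneme, and the phoneme symbols in the usage note are all hypothetical.

```python
from collections import Counter


def align_errors(ref, hyp):
    """Align two phoneme sequences by edit distance and collect the errors.

    Returns a dict with three Counters:
      "sub": (reference phoneme, recognized phoneme) pairs
      "del": reference phonemes missing from the recognition result
      "ins": recognized phonemes that have no reference counterpart
    """
    n, m = len(ref), len(hyp)
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Walk back from the end of both sequences to recover one optimal alignment.
    errors = {"sub": Counter(), "del": Counter(), "ins": Counter()}
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and ref[i - 1] != hyp[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + int(mismatch):
            if mismatch:
                errors["sub"][(ref[i - 1], hyp[j - 1])] += 1
            i -= 1
            j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            errors["del"][ref[i - 1]] += 1
            i -= 1
        else:
            errors["ins"][hyp[j - 1]] += 1
            j -= 1
    return errors


def confusable_counts(pairs):
    """Count how often each phoneme is involved in any error.

    Assumption: a substitution is credited to the reference phoneme.
    """
    counts = Counter()
    for ref, hyp in pairs:
        errors = align_errors(ref, hyp)
        for (ref_ph, _hyp_ph), k in errors["sub"].items():
            counts[ref_ph] += k
        counts.update(errors["del"])
        counts.update(errors["ins"])
    return counts
```

For example, `confusable_counts([(["s", "a", "l", "a", "m"], ["s", "e", "l", "a", "m", "u"])])` counts one error for `a` (substituted by `e`) and one for `u` (inserted). Under this sketch, the most frequently counted phonemes would be the ones to target when choosing additional corpus sentences.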
Keywords/Search Tags:Uyghur, continuous speech recognition, Confusable Set, linguistic model, CMU_Cam_Toolki