
Research on DBN-Based Continuous Speech Recognition

Posted on: 2011-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Xue
Full Text: PDF
GTID: 2198330332478675
Subject: Military Intelligence
Abstract/Summary:
The hidden Markov model (HMM) is a simple and efficient statistical model that has been applied successfully to speech recognition. However, some of its assumptions do not hold for real speech, which makes it difficult for an HMM to describe the dynamics within speech. The dynamic Bayesian network (DBN) is a methodology that combines graph theory, probability theory, and pattern recognition. Owing to its interpretability, factorization, extensibility, and powerful algorithms for inference and learning, a DBN can describe the dynamics within speech and is well suited to modeling temporal processes. DBNs have therefore become an active research topic in speech signal processing. Building on a study of inference and learning algorithms for DBNs, this dissertation proposes four improved DBN-based models for continuous speech recognition. The contributions of this thesis are summarized as follows.

(1) The phone-based DBN uses the phone, a coarse-grained unit, and therefore discriminates poorly, which increases the number of insertion errors. To address this, a subphone-based DBN is proposed. First, each phone is partitioned into subphones, and a subphone variable and a subphone transition variable are added to the phone-based DBN. Then the resulting changes in the dependencies among the variables are analyzed and specified. The subphone-based DBN can thus model the detailed multi-stage structure of the dynamic human speech process and represent the dynamics within a phone. Experimental results show that the model performs well in terms of percentage correct and accuracy.

(2) When the vocabulary is large, building the decision tree is computationally complex and the model does not adapt well to the vocabulary. To solve these problems, a subphone-based DBN with a modified control layer is proposed. An ending label is set for each word when the dictionary is built, and the parents of the word transition variable are changed. The end mark reduces the differences caused by words having different numbers of phones, which in turn reduces the complexity of building the decision tree for the word transition variable and the cost of reading parameters during training and decoding. The proposed model speeds up training and decoding to some extent without affecting recognition performance.

(3) A new triphone-based DBN is presented to handle coarticulation in speech. A previous-phone variable and a next-phone variable, which represent context dependence, are introduced into the subphone-based DBN with the modified control layer. Because the number of triphones is large, the triphones are clustered using decision trees based on pronunciation characteristics, so that the triphone parameters can be estimated robustly. The experimental results show improved recognition performance for large-vocabulary speech recognition.

(4) Because model performance degrades when the training and test environments are mismatched, DBN models incorporating a discrete noise variable are presented. To improve robustness and adaptability to data at various SNR levels, a discrete noise variable, which implicitly classifies the training set by noise condition, is added to the DBN framework. The results show that a model trained on data at different SNR levels improves performance.
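As a rough illustration of the idea in contribution (4), the sketch below shows one common way a hidden discrete noise variable can enter an observation model: the likelihood of a feature is summed over the noise classes, and the posterior over noise classes gives an implicit classification of the frame. This is only a minimal sketch under stated assumptions, not the thesis's actual model: it assumes a single scalar feature, one Gaussian per noise class, and a fixed prior over noise classes. The function names and all parameter values are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def _joint_log_terms(x, noise_prior, means, variances):
    """log P(n) + log p(x | n) for each hypothetical noise class n."""
    return [
        math.log(p) + log_gauss(x, m, v)
        for p, m, v in zip(noise_prior, means, variances)
    ]


def log_obs_likelihood(x, noise_prior, means, variances):
    """log p(x) = log sum_n P(n) p(x | n), computed with log-sum-exp."""
    terms = _joint_log_terms(x, noise_prior, means, variances)
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def noise_posterior(x, noise_prior, means, variances):
    """P(n | x): soft, implicit assignment of a frame to a noise class."""
    terms = _joint_log_terms(x, noise_prior, means, variances)
    total = log_obs_likelihood(x, noise_prior, means, variances)
    return [math.exp(t - total) for t in terms]


if __name__ == "__main__":
    # Assumed toy setup: two noise classes (e.g. "high SNR", "low SNR").
    prior = [0.5, 0.5]
    means = [0.0, 3.0]
    variances = [1.0, 1.0]
    print(log_obs_likelihood(2.9, prior, means, variances))
    print(noise_posterior(2.9, prior, means, variances))
```

In a full DBN the noise variable would be one node among the word, phone, and subphone variables, and its distribution would be learned during training rather than fixed by hand as it is here.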
Keywords/Search Tags: Continuous speech recognition, Dynamic Bayesian network, Subphone-based DBN, Subphone-based DBN with the control layer modified, Triphone-based DBN, Discrete noise variable, Hidden Markov model