
Research On Acoustic Modeling For Speech Recognition Based On Deep Neural Networks

Posted on: 2015-09-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Zhou
GTID: 1228330467974885
Subject: Signal and Information Processing

Abstract/Summary:
The ultimate goal of speech recognition is to make communication between humans and machines as convenient as communication between humans. The performance of the acoustic model is directly related to the accuracy of the whole speech recognizer. Hidden Markov models (HMMs) that adopt Gaussian mixture models (GMMs) as the state emission probability density function have played a dominant role in speech acoustic modeling for the last several decades. The GMM-HMM acoustic model has attracted considerable research attention for its complete theory, including discriminative training and adaptation techniques, as well as mature training tools. Recently, speech recognition has made huge progress with the emergence of new learning methods, namely deep learning, in the machine learning field. The context-dependent deep neural network hidden Markov model (DNN-HMM) hybrid acoustic modeling approach is rapidly displacing GMM-HMM and becoming the state-of-the-art acoustic modeling method. Algorithms related to DNN-HMM are also attracting extensive research attention in the speech recognition field. Against this background, this thesis concentrates on deep neural network acoustic modeling and its application to speech recognition.

Firstly, to improve the acoustic modeling ability of DNNs, we investigate both the feature domain and the model domain. In the feature domain, as an indirect acoustic modeling method, the neural network is viewed as a feature pre-processor: discriminative posterior features are extracted and used in the Tandem manner with a GMM-HMM. We propose an improved method for the Tandem system based on competing information. The neural network is trained using competing fragments selected from lattices. System performance improves after combining the competing network with the conventional positive network.
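The Tandem idea above — a neural network producing discriminative posteriors that are appended to the acoustic features before GMM-HMM training — can be sketched as follows. This is a minimal numpy illustration, not the thesis's actual network; the single-layer classifier, the 13-dimensional MFCC input, and the 40-class phone inventory are all invented for the example.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def tandem_features(mfcc, W, b):
    """Map MFCC frames to phone posteriors with a (hypothetical)
    single-layer classifier, then append the log-posteriors to the
    original features for a downstream GMM-HMM."""
    post = softmax(mfcc @ W + b)          # (frames, phones)
    log_post = np.log(post + 1e-10)       # compress dynamic range
    return np.hstack([mfcc, log_post])    # Tandem feature vector

rng = np.random.default_rng(0)
mfcc = rng.standard_normal((5, 13))                 # 5 frames, 13-dim MFCC
W, b = rng.standard_normal((13, 40)), np.zeros(40)  # 40 phone classes
feats = tandem_features(mfcc, W, b)
print(feats.shape)  # (5, 53)
```

In a real Tandem system the posterior features are typically decorrelated (e.g. by PCA) before being handed to the GMM-HMM; that step is omitted here for brevity.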
Then, in the direct acoustic modeling aspect, we propose a new feature fusion method that uses the deep neural network architecture and its learning methods to combine information in an intermediate hidden layer of the DNN. The higher-level representations of the individual raw features in the intermediate layer are combined, and the network continues to learn feature representations in the higher layers. This method exploits the complementary information of multiple feature streams under the deep neural network framework and improves system performance.

Secondly, we examine the training efficiency of deep neural network acoustic models in practical applications, for the sake of the applicability of DNNs to large-scale tasks. The huge amount of training data in real speech recognition tasks, the large number of weight parameters, and the training algorithms make DNN training efficiency the bottleneck in real speech recognition systems. We analyze the learning algorithm of the deep neural network and propose a new joint acoustic modeling approach with multiple deep neural networks (mDNN) to overcome this inefficiency. After clustering the training data, we can train multiple deep neural networks in parallel, each modeling one cluster of the data independently. Compared with the conventional single-DNN modeling method, this acoustic modeling approach significantly improves training efficiency under the cross-entropy training criterion, which is meaningful in many practical research applications.

Finally, we investigate the proposed mDNN joint modeling scheme further. To verify the feasibility of mDNN for acoustic modeling, as well as to alleviate its performance degradation, we use mDNN to perform sequence-level discriminative training. The sequence-level discriminative criterion can be regarded as a joint optimization of the multiple deep neural networks. Based on the mDNN structure, weight update formulas are derived for the MMI objective function.
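The intermediate-layer fusion described above — stream-specific lower layers whose hidden representations are concatenated and then processed by shared upper layers — can be sketched as a forward pass. This is a toy numpy sketch under assumed dimensions (13-dim MFCC and 40-dim filterbank streams, invented layer sizes), not the thesis's actual configuration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def fused_forward(x_mfcc, x_fbank, params):
    """Each feature stream passes through its own lower layer; the
    hidden representations are concatenated at an intermediate layer,
    and shared higher layers continue learning from the fusion."""
    h_mfcc = relu(x_mfcc @ params["W_mfcc"])    # stream-specific layer
    h_fbank = relu(x_fbank @ params["W_fbank"]) # stream-specific layer
    fused = np.hstack([h_mfcc, h_fbank])        # fuse mid-network
    h_shared = relu(fused @ params["W_shared"]) # shared higher layer
    return h_shared @ params["W_out"]           # senone logits

rng = np.random.default_rng(1)
params = {
    "W_mfcc":   rng.standard_normal((13, 64)),
    "W_fbank":  rng.standard_normal((40, 64)),
    "W_shared": rng.standard_normal((128, 256)),
    "W_out":    rng.standard_normal((256, 1000)),
}
logits = fused_forward(rng.standard_normal((8, 13)),
                       rng.standard_normal((8, 40)), params)
print(logits.shape)  # (8, 1000)
```

Because the fusion happens inside the network, backpropagation trains the stream-specific and shared layers jointly, which is what lets the model exploit complementarity between the streams.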
We also provide a partially parallel training method for mDNN sequence-level training. Experimental results reveal that mDNN reaches nearly the same performance as a single DNN after joint sequence training, with 7x and 1.5x speedups for the cross-entropy and MMI criteria respectively, which shows that the multiple deep neural networks acoustic modeling approach is indeed an effective acoustic modeling technique.
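The data-partitioning step that enables the mDNN parallelism above can be sketched as follows: frames are hard-assigned to clusters, and each cluster's subset would then train its own DNN as an independent (parallelizable) job. The nearest-centroid assignment and the cluster count are illustrative stand-ins for whatever clustering the thesis actually uses.

```python
import numpy as np

def assign_clusters(feats, centroids):
    """Hard-assign each frame to its nearest centroid — a stand-in
    for the clustering step that precedes mDNN training."""
    dists = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)             # (frames,) cluster ids

rng = np.random.default_rng(2)
frames = rng.standard_normal((1000, 13))    # toy acoustic frames
centroids = rng.standard_normal((4, 13))    # 4 clusters -> 4 DNNs
cluster_id = assign_clusters(frames, centroids)

# Each subset trains one DNN; in practice the 4 jobs run in parallel,
# and at test time the per-cluster posteriors are combined jointly.
subsets = [frames[cluster_id == k] for k in range(4)]
print(sum(len(s) for s in subsets))  # 1000
```

The speedup comes from each network seeing only a fraction of the data and parameters; sequence-level joint training then ties the separately trained networks back together under one objective.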
Keywords/Search Tags: speech recognition, multiple deep neural networks, competing information, information combination, parallel training, sequence training