Font Size: a A A

Acoustic Model Of Chinese Speech Recognition Based On DNN

Posted on:2016-09-08Degree:MasterType:Thesis
Country:ChinaCandidate:J S ZhuFull Text:PDF
GTID:2308330479489918Subject:Computer technology
Abstract/Summary:PDF Full Text Request
The GMM-HMM method has been the dominant modeling technology in the acoustic model of speech recognition. However, the performance of Chinese continuous speech recognition systems in real scenarios is still unsatis factory. On the other hand, artificial neural network is also used in the acoustic model in the early stage, but in practical applications, its performance is poorer than Gaussian mixture model. Deep learning is a new machine learning technology that has been received widespread attention in academia in recent year. This technology mainly focuses on the modeling and learning problem of deep neural network(DNN) that has multilayer and has strong modeling capability. The technology of deep neural network has been successfully applied to the problems related to speech, text and image data.We firstly construct a GMM-HMM context-dependent baseline speech recognition system with the conbination of a 3-gram language model. We also analyzed the related problems of the training process. And then we construct two types of Chinese speech recognition system. One of them exploits the deep neural networks and hidden Markov model. In this architecture, the DNN is used to estimate state posterior probability of HMM and we use the iteraterly greedy algorithm to train the accoutic model. This algorithm make use of the large amount of unlabeled training data and the pre-training algorithm used in the process can aid in optimization of cost function and reduce generalization error. Another technology we used in our speech recognition system is the Tandem technology which is also based on DNN. This technology uses the DNN to create the new feature and than sends it to the Chinese speech recoginion system.Experimental results show that the speech recognition system based on deep neural network achieved a higher recognition rate, which is better than the traditional context-dependent GMM-HMM model but also required more time to train the system. The deep neural network uses the high-dimentional and multi-framed feature to improve the regnition rate.
Keywords/Search Tags:speech recognition, acoustic model, deep neural networks
PDF Full Text Request
Related items