
Deep Neural Network For Chinese Speech Recognition

Posted on: 2016-02-02  Degree: Master  Type: Thesis
Country: China  Candidate: D L Zhang  Full Text: PDF
GTID: 2308330467496807  Subject: Electronic and communication engineering
Abstract/Summary:
The GMM-HMM acoustic model has been highly successful in speech recognition, but as speech databases grow, the data become more complex and training takes longer and longer. Moreover, the GMM is a shallow model and lacks the capacity to model complex databases, so a more powerful way to build the acoustic model is needed; the deep neural network (DNN) has this capacity. Mel Frequency Cepstral Coefficients (MFCC) are widely used as features in speech recognition, but the dimensionality reduction and decorrelation involved in computing them may lose part of the speech information. The output of the Mel filters, called filter-bank (Fbank) features, can be used as the acoustic features instead; Fbank features contain more speech information for training.
This thesis builds a deep neural network model for a Chinese speech recognition system. The main work is as follows: (1) Setting up the Kaldi environment for the DNN model, installing CUDA, and using a GPU to accelerate training; (2) Training the monophone model, then training and optimizing the triphone model on top of the monophone model, to obtain the baseline triphone model for DNN training; (3) Training and testing with MFCC and Fbank features, optimizing the Fbank features for DNN training, and analyzing and comparing the experimental results.
The results show that the DNN model outperforms the GMM model, with the word error rate (WER) decreasing from 28.02% to 15.97% (12.05% absolute and 43% relative). Experiments on the Kaldi speech recognition system show that the WER decreases from 15.97% to 15.11% (0.86% absolute and 5.38% relative) when Fbank features are used instead of MFCC features. The WER reaches 14.87% when the density of the Fbank filters is increased for training the DNN model, and the best WER, 14.33%, is obtained when the frames are enhanced with the same filters.
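The absolute and relative WER reductions quoted above follow from simple arithmetic, which the short sketch below reproduces. The function name `wer_reduction` is a hypothetical helper introduced here for illustration; only the four WER values come from the abstract.

```python
def wer_reduction(baseline, improved):
    """Return (absolute, relative) WER reduction, both in percentage points / percent.

    Hypothetical helper: `baseline` and `improved` are WER values in percent.
    """
    absolute = baseline - improved            # difference in percentage points
    relative = absolute / baseline * 100.0    # reduction as a share of the baseline
    return absolute, relative


# WER values taken from the abstract.
abs_gmm_dnn, rel_gmm_dnn = wer_reduction(28.02, 15.97)        # GMM -> DNN
abs_mfcc_fbank, rel_mfcc_fbank = wer_reduction(15.97, 15.11)  # MFCC -> Fbank

print(f"GMM -> DNN:    {abs_gmm_dnn:.2f} absolute, {rel_gmm_dnn:.2f}% relative")
print(f"MFCC -> Fbank: {abs_mfcc_fbank:.2f} absolute, {rel_mfcc_fbank:.2f}% relative")
```

The relative figure divides by the baseline WER, so the same absolute gain counts for more as the baseline gets lower.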
In conclusion, the DNN model effectively improves the Chinese speech recognition system; Fbank features are more suitable for the DNN model than MFCC features; and enhancing the density of filters and frames can improve Chinese speech recognition within a given range.
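The relationship between the two feature types can be illustrated with a minimal sketch: MFCCs are commonly obtained by applying a discrete cosine transform (DCT) to the log Mel filter-bank outputs and keeping only the first few coefficients. The sizes used here (40 filters, 13 cepstral coefficients), the random stand-in data, and the helper `dct_ii_matrix` are assumptions for illustration, not the thesis's configuration or Kaldi's implementation, which also applies further steps such as liftering.

```python
import numpy as np


def dct_ii_matrix(num_ceps, num_filters):
    """Rows are the first `num_ceps` orthonormal DCT-II basis vectors (hypothetical helper)."""
    n = np.arange(num_filters)
    k = np.arange(num_ceps)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * num_filters))
    scale = np.full((num_ceps, 1), np.sqrt(2.0 / num_filters))
    scale[0, 0] = np.sqrt(1.0 / num_filters)
    return basis * scale


# Assumed, illustrative sizes; random values stand in for real log Mel energies.
rng = np.random.default_rng(0)
num_frames, num_filters, num_ceps = 5, 40, 13
fbank = rng.normal(size=(num_frames, num_filters))

# MFCC-style features: project each Fbank frame onto the first 13 DCT basis vectors.
D = dct_ii_matrix(num_ceps, num_filters)
mfcc = fbank @ D.T                      # shape (num_frames, num_ceps)

# Projecting back from 13 coefficients cannot recover all 40 filter outputs.
approx = mfcc @ D
residual = np.linalg.norm(fbank - approx)

# With all 40 coefficients the transform is invertible, so nothing is lost.
D_full = dct_ii_matrix(num_filters, num_filters)
lossless = np.allclose(fbank @ D_full.T @ D_full, fbank)

print("MFCC shape:", mfcc.shape)
print("Reconstruction error after truncation:", residual)
print("Full DCT is lossless:", lossless)
```

The nonzero reconstruction error after truncation is the sense in which dimensionality reduction discards information that the Fbank features still carry.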
Keywords/Search Tags: Speech recognition, Hidden Markov Model, Deep Neural Network, Acoustic feature