Deep Neural Network For Chinese Speech Recognition

Posted on:2016-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:D L Zhang

Full Text:PDF

GTID:2308330467496807

Subject:Electronic and communication engineering

Abstract/Summary:

The GMM-HMM acoustic model has been highly successful in speech recognition, but as speech databases grow, the data become more complex and training takes longer and longer. A GMM is also a shallow model and lacks the capacity to model complex data, so a more powerful approach to acoustic modeling is needed; deep neural networks (DNNs) have that capacity. Mel Frequency Cepstral Coefficients (MFCC) are widely used as features in speech recognition, but the dimensionality reduction and decorrelation involved in computing them may lose part of the speech information. The outputs of the Mel filters can instead be used directly as acoustic features, known as filter-bank (Fbank) features, which retain more speech information for training.

This thesis builds a deep neural network model for a Chinese speech recognition system. The main work is as follows:
(1) Setting up the Kaldi environment for DNN modeling, installing CUDA, and using the GPU to accelerate training;
(2) Training a monophone model, then training and optimizing a triphone model based on it, to obtain the baseline triphone model used for DNN training;
(3) Training and testing with MFCC and Fbank features, optimizing the Fbank features for DNN training, and analyzing and comparing the experimental results.

The results show that the DNN model outperforms the GMM model, with the word error rate (WER) decreasing from 28.02% to 15.97% (12.05% absolute, 43% relative). Experiments on the Kaldi speech recognition system show that the WER decreases from 15.97% to 15.11% (0.86% absolute, 5.38% relative) when Fbank features are used instead of MFCC features. The best WER obtained by increasing the density of the Fbank filters used to train the DNN model is 14.87%; the best WER obtained by enhancing the frames while keeping the same filters is 14.33%.

In conclusion, the DNN model effectively improves the Chinese speech recognition system; Fbank features are better suited to the DNN model than MFCC features; and increasing the density of filters and frames improves Chinese speech recognition within a given range.
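
The absolute and relative WER reductions quoted in the abstract can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not code from the thesis; the function name `wer_reduction` is a hypothetical helper introduced here for illustration, and only the WER values themselves come from the abstract.

```python
def wer_reduction(baseline_wer, new_wer):
    """Return (absolute, relative) WER reduction, both in percent.

    absolute = baseline - new            (percentage points)
    relative = absolute / baseline * 100 (percent of the baseline WER)
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    absolute = baseline_wer - new_wer
    relative = 100.0 * absolute / baseline_wer
    return absolute, relative


# WER pairs reported in the abstract (percent).
comparisons = [
    ("GMM -> DNN", 28.02, 15.97),
    ("MFCC -> Fbank (DNN)", 15.97, 15.11),
]

for label, before, after in comparisons:
    absolute, relative = wer_reduction(before, after)
    print(f"{label}: {absolute:.2f} points absolute, {relative:.2f}% relative")
```

The first comparison reproduces the reported 12.05 points absolute and roughly 43% relative. The second gives 0.86 points absolute and a relative reduction of about 5.39%, which the abstract reports as 5.38%.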

Keywords/Search Tags:

Speech recognition, Hidden Markov Model, Deep Neural Network, Acoustic feature

Related items

1	A Study On The Extraction Of Speech Depth In Tibetan Language And Its Speech Recognition
2	Research Of The Speech Recognition Technology Based On HMM
3	Research On Chinese Continuous Speech Recognition In Noisy Environment
4	Research On And Implementation Of Continuous Speech Recognition System
5	Study Of Speech Recognition Algorithm Based On HMM And Neural Network
6	The Study Of Feature Extraction And Acoustic Modeling In Speech Recognition System
7	Research Of Speech Recognition Based On Mixture Feature Extraction And Improved Continuous Hidden Markov Model
8	Research Of Speech Recognition Based On Hidden Markov Model And Neural Network
9	Research On Neural Network-based Acoustic Modeling For Speech Synthesis
10	Study Of Speech Recognition Algorithm Based On HMM And Artificial Neural Network