
Continuous Chinese Speech Recognition Algorithm Based On BN Features And CNN

Posted on: 2018-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Su
GTID: 2428330572964776
Subject: Applied Mathematics
Abstract/Summary:
Speech recognition provides a convenient means of human-computer interaction, and continuous Chinese speech recognition in particular is a very challenging problem. This thesis studies feature extraction methods and acoustic model structures based on deep learning. The main contributions are as follows:

(1) Speech recognition systems are built with the Kaldi speech recognition toolkit. The problems of existing speech recognition algorithms are analyzed and summarized, and the research questions of this thesis are derived from that analysis.

(2) A speech recognition algorithm combining BN (bottleneck) features with a CNN (convolutional neural network) is proposed. Compared with traditional acoustic features, BN features extracted by a tandem DNN (deep neural network) model reflect the dynamic characteristics of the speech signal more precisely and improve recognition accuracy. Building the acoustic model with a CNN instead of a DNN reduces the number of model parameters while maintaining recognition performance. Experimental results show that phoneme accuracy improves from 92.76% to 94.46% on the continuous Chinese database thchs30, and from 98.76% to 99.87% on the digit database digit315 recorded by the speech group.

(3) To further improve recognition performance, a method is proposed that optimizes the BN feature extraction network with a deep belief network (DBN). The tandem DNN model is first pre-trained without supervision as a deep belief network, then fine-tuned with supervision, and the BN features are extracted from the fine-tuned network. This method improves recognition accuracy.

(4) A method is proposed that further optimizes the BN feature extraction network by down-sampling. It down-samples the frame expansion and splicing stage of the tandem DNN model, which reduces model complexity and speeds up training. Compared with the model without down-sampling, the final recognition accuracy is improved.

With these two optimizations of the BN feature extraction network, experimental results show that phoneme accuracy on thchs30 improves from 94.46% to 94.85%. The number of parameters of the BN feature extraction network is reduced from 11.23 million to 10.03 million.
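A tandem (bottleneck) DNN is typically trained to classify phonetic units from a window of spliced acoustic frames. After training, the layers above the narrow bottleneck are discarded, and the bottleneck activations serve as the new features. The minimal pure-Python sketch below illustrates only that extraction step. It rests on several assumptions that the abstract does not state:

* splicing is edge-padded;
* hidden layers use a sigmoid and the bottleneck layer is linear;
* layer sizes are hypothetical;
* random weights stand in for trained ones.

```python
import math
import random

def splice(frames, context):
    """Concatenate each frame with its +/-context neighbours, repeating edge frames."""
    n = len(frames)
    out = []
    for t in range(n):
        row = []
        for off in range(-context, context + 1):
            idx = min(max(t + off, 0), n - 1)  # clamp at the utterance edges
            row.extend(frames[idx])
        out.append(row)
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def dense(x, weights, bias, activation=None):
    """One fully connected layer: y = act(W x + b)."""
    y = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [activation(v) for v in y] if activation else y

def random_layer(n_in, n_out, rng):
    """Hypothetical stand-in for trained weights: n_out rows of n_in weights."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def extract_bn_features(frames, layers, bn_index, context):
    """Forward spliced frames up to the bottleneck layer and return its outputs.
    Layers above bn_index are only needed while training the tandem DNN."""
    feats = []
    for x in splice(frames, context):
        h = x
        for i, (w, b) in enumerate(layers[: bn_index + 1]):
            # Sigmoid hidden layers; the bottleneck layer is left linear (assumption).
            h = dense(h, w, b, None if i == bn_index else sigmoid)
        feats.append(h)
    return feats

rng = random.Random(0)
feat_dim, context = 3, 2
# Hypothetical sizes: spliced input, two hidden layers, 4-unit bottleneck, then layers
# leading to a 5-class output that are unused at extraction time.
sizes = [feat_dim * (2 * context + 1), 8, 8, 4, 8, 5]
layers = [random_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
frames = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)] for _ in range(6)]
bn = extract_bn_features(frames, layers, bn_index=2, context=context)
print(len(bn), len(bn[0]))  # 6 4
```

In a tandem system these per-frame bottleneck vectors would then be the input to the back-end acoustic model (here, the CNN) in place of, or alongside, the raw acoustic features.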
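The parameter saving from replacing the DNN acoustic model with a CNN is usually attributed to weight sharing. A convolution kernel is reused at every time-frequency position, whereas a fully connected layer has a separate weight for every input-output pair. The arithmetic below compares only a first layer. Its sizes are hypothetical and are not the thesis' configuration: 40 filterbank channels, 11 frames, 1024 hidden units, and 128 kernels of 9 x 9.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def conv2d_params(in_channels, out_channels, kernel_h, kernel_w):
    """Weights plus biases of a 2-D convolution; each kernel is shared across all positions."""
    return out_channels * in_channels * kernel_h * kernel_w + out_channels

# Hypothetical input: 40 filterbank channels x 11 spliced frames.
n_mel, n_frames = 40, 11
dnn_first = dense_params(n_mel * n_frames, 1024)
cnn_first = conv2d_params(1, 128, 9, 9)
print(dnn_first, cnn_first)  # 451584 10496
```

A full CNN acoustic model still ends in fully connected layers, so the overall saving is much smaller than this single-layer ratio suggests.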
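Step (3) pre-trains the tandem DNN as a deep belief network, that is, a stack of restricted Boltzmann machines (RBMs) trained greedily one layer at a time. The sketch below uses Bernoulli-Bernoulli RBMs trained with one-step contrastive divergence (CD-1). This is an assumption made for brevity: real-valued speech features are usually modelled with a Gaussian-Bernoulli RBM in the first layer. The layer sizes, learning rate and epoch count are also hypothetical.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

class RBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.w = [[rng.gauss(0.0, 0.01) for _ in range(n_visible)] for _ in range(n_hidden)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(sum(wij * vj for wij, vj in zip(row, v)) + bh)
                for row, bh in zip(self.w, self.b_hid)]

    def visible_probs(self, h):
        return [sigmoid(sum(self.w[i][j] * h[i] for i in range(len(h))) + self.b_vis[j])
                for j in range(len(self.b_vis))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr):
        h0 = self.hidden_probs(v0)                 # positive phase (data-driven)
        v1 = self.visible_probs(self.sample(h0))   # one Gibbs step: reconstruction
        h1 = self.hidden_probs(v1)                 # negative phase
        for i in range(len(h0)):
            for j in range(len(v0)):
                self.w[i][j] += lr * (h0[i] * v0[j] - h1[i] * v1[j])
            self.b_hid[i] += lr * (h0[i] - h1[i])
        for j in range(len(v0)):
            self.b_vis[j] += lr * (v0[j] - v1[j])

def pretrain_dbn(data, layer_sizes, epochs=5, lr=0.1, seed=0):
    """Greedy layer-wise pre-training: each RBM is trained on the hidden
    probabilities produced by the RBM below it."""
    rng = random.Random(seed)
    rbms, inputs = [], data
    for n_vis, n_hid in zip(layer_sizes[:-1], layer_sizes[1:]):
        rbm = RBM(n_vis, n_hid, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v, lr)
        rbms.append(rbm)
        inputs = [rbm.hidden_probs(v) for v in inputs]  # feed the next layer
    return rbms

rng = random.Random(1)
data = [[float(rng.random() < 0.5) for _ in range(6)] for _ in range(20)]  # toy binary data
rbms = pretrain_dbn(data, [6, 5, 3])
print([(len(r.w), len(r.w[0])) for r in rbms])  # [(5, 6), (3, 5)]
```

After pre-training, the RBM weights would initialize the hidden layers of the tandem DNN. That network is then fine-tuned with supervised back-propagation before the bottleneck activations are read out, as the abstract describes.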
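The abstract does not say exactly how the frame splicing is down-sampled. One plausible reading, assumed here, is to keep the same context window but splice only every `step`-th frame. This shrinks the input layer and therefore the first weight matrix. The feature dimension (40), context (+/-10 frames) and hidden size (1024) below are hypothetical.

```python
def splice_offsets(context, step=1):
    """Frame offsets for splicing +/-context frames, keeping every step-th one (assumption)."""
    return list(range(-context, context + 1, step))

def splice(frames, offsets):
    """Concatenate the frames at the given offsets around each frame, clamping at the edges."""
    n = len(frames)
    return [[v for off in offsets for v in frames[min(max(t + off, 0), n - 1)]]
            for t in range(n)]

def first_layer_params(feat_dim, offsets, hidden):
    """Weights plus biases of the first fully connected layer fed by the spliced input."""
    n_in = feat_dim * len(offsets)
    return n_in * hidden + hidden

full = splice_offsets(10)          # 21 offsets: -10 .. 10
down = splice_offsets(10, step=2)  # 11 offsets: -10, -8, .., 10
print(first_layer_params(40, full, 1024), first_layer_params(40, down, 1024))  # 861184 451584
```

These figures cover only a hypothetical first layer. The reduction reported in the abstract (11.23 million to 10.03 million parameters) refers to the whole BN feature extraction network after both optimizations.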
Keywords/Search Tags: BN features, CNN, continuous Chinese speech recognition, Kaldi