Continuous Chinese Speech Recognition Algorithm Based On BN Features And CNN

Posted on: 2018-01-19

Degree: Master

Type: Thesis

Country: China

Candidate: Y W Su

Full Text: PDF

GTID: 2428330572964776

Subject: Applied Mathematics

Abstract/Summary:

Speech recognition provides a convenient means of human-computer interaction, and continuous Chinese speech recognition in particular is a very challenging subject. This thesis studies feature extraction methods and acoustic model structures based on deep learning. The main research work is as follows:

(1) Speech recognition systems are built with the Kaldi speech recognition toolkit. The problems of prior speech recognition algorithms are analyzed and summarized, and the research problems of this thesis are introduced on that basis.

(2) A speech recognition algorithm combining BN (bottleneck) features and a CNN (convolutional neural network) is proposed. Compared with traditional acoustic features, BN features extracted by the tandem DNN (deep neural network) model reflect the dynamic characteristics of the speech signal more precisely and improve recognition accuracy. Building the acoustic model with a CNN instead of a DNN reduces the number of model parameters while maintaining recognition performance. Experimental results show that phoneme accuracy improves from 92.76% to 94.46% on the continuous Chinese database thchs30, and from 98.76% to 99.87% on the digit database digit315 recorded by the speech group.

(3) To further improve recognition performance, a method is proposed to optimize the BN feature extraction network with a deep belief network. The tandem DNN model is pre-trained in an unsupervised manner as a deep belief network, and the BN features are then extracted after supervised fine-tuning. This method improves recognition accuracy.

(4) A method is proposed to further optimize the BN feature extraction network by down-sampling. The method down-samples the frame expansion and splicing stage of the tandem DNN model, which reduces model complexity and speeds up training. Compared with the model without down-sampling, the final recognition accuracy is improved.

With both optimizations applied to the BN feature extraction network, experimental results show that phoneme accuracy improves from 94.46% to 94.85% on thchs30, and the number of parameters of the BN feature extraction network is reduced from 11.23 million to 10.03 million.
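The abstract does not give the splicing configuration, so the following is only a minimal sketch of the general idea behind down-sampling the spliced frame context. The feature dimension, context offsets, hidden-layer width, and the helper names `splice_frames` and `first_layer_weights` are all hypothetical and are not taken from the thesis. The sketch shows how using every second context frame shrinks the network input, and with it the first-layer weight count.

```python
# Illustrative sketch only: all sizes and offsets below are assumptions,
# not the configuration used in the thesis.

FEAT_DIM = 40      # assumed per-frame feature dimension
HIDDEN_DIM = 1024  # assumed width of the first hidden layer


def splice_frames(frames, offsets):
    """Concatenate each frame with its neighbours at the given offsets.

    Indices that fall outside the utterance are clamped, so the first or
    last frame is repeated at the edges.
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        row = []
        for off in offsets:
            idx = min(max(t + off, 0), n - 1)
            row.extend(frames[idx])
        spliced.append(row)
    return spliced


def first_layer_weights(input_dim, hidden_dim):
    """Number of weights in a fully connected input layer (biases ignored)."""
    return input_dim * hidden_dim


# A toy utterance of 10 frames; frame t is filled with the value t.
frames = [[float(t)] * FEAT_DIM for t in range(10)]

full_context = list(range(-4, 5))            # 9 offsets: -4 .. +4
subsampled_context = list(range(-4, 5, 2))   # 5 offsets: -4, -2, 0, 2, 4

spliced_full = splice_frames(frames, full_context)
spliced_sub = splice_frames(frames, subsampled_context)

full_dim = len(spliced_full[0])   # 9 * 40 = 360
sub_dim = len(spliced_sub[0])     # 5 * 40 = 200

print("input dim, full context:       ", full_dim)
print("input dim, subsampled context: ", sub_dim)
print("first-layer weights, full:       ", first_layer_weights(full_dim, HIDDEN_DIM))
print("first-layer weights, subsampled: ", first_layer_weights(sub_dim, HIDDEN_DIM))
```

Under these assumed sizes the input shrinks from 360 to 200 values per frame while still covering the same span of time. The savings here are illustrative and are not meant to reproduce the 11.23 million to 10.03 million reduction reported above.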

Keywords/Search Tags:

BN features, CNN, continuous Chinese speech recognition, Kaldi

Related items

1	Research On Chinese Speech Recognition Based On Kaldi
2	Research Of Speech Recognition Based On Kaldi
3	Research On Speech Recognition Based On Kaldi
4	Chinese Speech Recognition Technology And Its Application In Speech Separation
5	Noise Environment Of Chinese Continuous Speech Recognition Technology Research
6	Syllable-based Method Of Tone Recognition For Chinese Continuous Speech
7	Chinese Continuous Speech Recognition Based On Sphinx
8	Luo Ping Dialect Speech Recognition Research Based On Kaldi
9	Research And Implementation On Speaker-independent Chinese Continuous Digit Speech Recognition System
10	The Implementation And Optimization Of Speech Recognition System Based On Kaldi