Noise Robust Speech Recognition Based On CNN-TDNN And Transfer Learning

Posted on: 2020-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: D Z Wang
Full Text: PDF
GTID: 2438330590457609
Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, acoustic models based on deep neural networks have allowed speech recognition systems to achieve satisfactory accuracy in quiet environments. However, these systems still perform relatively poorly in environments with low signal-to-noise ratios (such as noisy streets and shopping malls), and noise robustness remains a key obstacle to large-scale deployment of speech recognition systems. This thesis therefore analyzes and summarizes existing noise-robust speech recognition methods and focuses on the acoustic modeling part of the back end. The innovations and specific work of the thesis are summarized as follows.

First, DNN-based speech enhancement is adopted as the front end of the noise-robust speech recognition system. A large amount of noisy speech is constructed at multiple signal-to-noise ratios and fed into the model together with the clean speech. The model is trained with unsupervised restricted Boltzmann machine pre-training followed by supervised fine-tuning with the error backpropagation algorithm. Finally, decoding and waveform reconstruction yield a complete, audible speech waveform file.

Second, a convolutional neural network (CNN) and a time delay neural network (TDNN) are combined to construct an acoustic model for noise-robust speech recognition. Semi-orthogonal low-rank matrix factorization is applied to the parameter matrices in the hidden layers of the TDNN, and the CNN-TDNN model is then optimized by combining the TDNN with the CNN and adding time-restricted self-attention layers after the hidden layer. This model serves as the back end of the system.

Finally, a method is proposed that combines DNN speech enhancement with transfer learning to train the noise-robust acoustic model. During training, one model is trained on the enhanced data set (the student model) and another is trained on the clean data set (the teacher model). Transfer learning is then used to let the student model learn the posterior probability distribution of the teacher model, maximizing the mutual information between the two. This improves both the recognition rate and the robustness of the noise-robust speech recognition system.

The experimental results show that the optimized CNN-TDNN model outperforms the deep neural network, convolutional neural network, time delay neural network, and unoptimized CNN-TDNN models, with the average word error rate decreasing by 11.76% compared to the baseline. The experiments also show that the model trained with weight-transfer-based transfer learning is more robust, with the average word error rate in the experimental test decreasing by 0.37%.
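
The teacher-student step described in the abstract can be illustrated with a small sketch. The abstract does not give the exact training objective, so the code below assumes a common stand-in: the frame-level KL divergence between the teacher's and the student's posterior distributions, which is minimized when the student reproduces the teacher's posteriors. The function names, the temperature parameter, and the toy logits are all hypothetical and are not taken from the thesis.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores for one frame into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) for one frame; eps guards against log(0)."""
    return sum(
        t * (math.log(t + eps) - math.log(s + eps))
        for t, s in zip(teacher_probs, student_probs)
    )


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Average frame-level KL between teacher and student posteriors.

    Assumption: the teacher saw clean speech and the student saw the
    enhanced version of the same frames, so the frames line up one to one.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must cover the same frames")
    losses = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(losses) / len(losses)


# Hypothetical toy data: 2 frames, 3 output classes per frame.
teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
student_close = [[1.8, 0.6, -0.9], [0.0, 1.4, 0.4]]
student_far = [[-1.0, 0.5, 2.0], [1.5, 0.1, 0.3]]

loss_same = distillation_loss(teacher, teacher)
loss_close = distillation_loss(teacher, student_close)
loss_far = distillation_loss(teacher, student_far)
```

In this sketch the loss is zero when the student matches the teacher exactly and grows as the student's posteriors drift away, which is the behavior a training loop would exploit when updating only the student model.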
Keywords/Search Tags: Noise robust speech recognition, Speech enhancement, Time delay neural network, Convolutional neural network, Transfer learning