Noise Robust Speech Recognition Based On CNN-TDNN And Transfer Learning

Posted on:2020-06-13

Degree:Master

Type:Thesis

Country:China

Candidate:D Z Wang

GTID:2438330590457609

Subject:Electronic and communication engineering

Abstract/Summary:

In recent years, acoustic models based on deep neural networks (DNNs) have allowed speech recognition systems to achieve satisfactory accuracy in quiet environments. However, these systems still perform relatively poorly in environments with low signal-to-noise ratios (such as noisy streets and shopping malls), and noise robustness remains a key issue preventing large-scale application of speech recognition systems. With this in mind, this thesis analyzes and summarizes existing noise-robust speech recognition methods and focuses on the acoustic modeling part of the back end. The innovations and specific work of the thesis are summarized as follows.

First, DNN-based speech enhancement is adopted as the front end of the noise-robust speech recognition system. A large amount of noisy speech is constructed at multiple signal-to-noise ratios and fed into the model together with the clean speech. The model is pre-trained without supervision using Restricted Boltzmann Machines and then fine-tuned with supervision using the error back-propagation algorithm. Finally, decoding and waveform reconstruction produce a complete, audible speech waveform file.

Second, a Convolutional Neural Network (CNN) and a Time Delay Neural Network (TDNN) are combined to construct an acoustic model for noise-robust speech recognition. Specifically, semi-orthogonal low-rank matrix factorization is applied to the parameter matrices in the hidden layers of the TDNN. The CNN-TDNN model is then optimized by combining this network with the CNN and adding time-restricted self-attention layers after the hidden layers. This model serves as the back end of the system.

Finally, a method is proposed that combines DNN speech enhancement with transfer learning to train the noise-robust acoustic model. During training, one model is trained on the enhanced data set (the student model) and the other on the clean data set (the teacher model). Transfer learning is then used to let the student model learn the posterior probability distribution of the teacher model, maximizing the mutual information between the two. As a result, both the recognition rate and the robustness of the noise-robust speech recognition system are improved.

The experimental results show that the optimized CNN-TDNN model outperforms the deep neural network, the convolutional neural network, the time delay neural network, and the CNN-TDNN model without these optimizations. The average word error rate decreased by 11.76% compared to the baseline. The experiments also show that the model trained with transfer learning based on weight transfer is more robust, with the average word error rate in the experimental tests decreasing by 0.37%.
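Two of the steps described in the abstract can be illustrated with a minimal Python sketch: constructing noisy speech at a chosen signal-to-noise ratio, and measuring how far a student model's frame posteriors are from a teacher model's. This is not the thesis implementation. The function names (`mix_at_snr`, `softmax`, `kl_divergence`) are hypothetical, and the frame-level KL divergence is an assumed, commonly used teacher-student objective; the abstract does not state the exact loss used.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB.

    Assumes both sequences have the same length and non-zero power.
    """
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(x * x for x in noise) / len(noise)
    # SNR_dB = 10 * log10(clean_power / (gain**2 * noise_power)), solved for gain.
    gain = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def softmax(logits):
    """Convert one frame of output logits into a posterior distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher_post, student_post, eps=1e-12):
    """KL(teacher || student) for one frame (assumed distillation objective)."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(teacher_post, student_post)
    )


if __name__ == "__main__":
    # Toy signals standing in for real audio samples.
    clean = [math.sin(0.01 * i) for i in range(1000)]
    noise = [math.cos(0.37 * i) for i in range(1000)]
    for snr_db in (0, 5, 10, 15):
        noisy = mix_at_snr(clean, noise, snr_db)
        print(snr_db, len(noisy))

    # Teacher sees clean speech, student sees enhanced speech (toy logits).
    teacher = softmax([2.0, 0.5, -1.0])
    student = softmax([1.5, 0.7, -0.5])
    print(kl_divergence(teacher, student))
```

In a training loop, a loss of this kind would be averaged over frames and minimized with respect to the student's parameters while the teacher stays fixed.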

Keywords/Search Tags:

Noise robust speech recognition, Speech enhancement, Time delay neural network, Convolutional neural network, Transfer learning

Related items

1	Design And Implementation Of Noise Robust Speech Recognition Algorithm Based On Deep Learning
2	Research On Robust Speech Recognition In Noise Environment
3	Deep Learning For Robust Speech Recognition
4	Research On Fully Convolutional Neural Network Based Speech Enhancement Algorithm In The Time Domain
5	Design And Implementation Of Robust Speech Recognition System Based On Deep Neural Network
6	Research And Implementation Of Lightweight Speech Enhancement Algorithm For Air Control
7	Time Delay Neural Network Based Automatic Speech Recognition
8	Research On Speech Enhancement Algorithm Based On Convolutional Neural Network
9	Research And Implementation Of Speech Enhancement Based On Domain-Adversarial Training Of Neural Networks
10	Noise Robust Speech Recognition Research Based On Regression Deep Neural Network