Research on Robust End-to-End Acoustic Models for Complex Scenes

Posted on: 2021-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: S X Tian
GTID: 2518306197999769
Subject: Electronics and Communications Engineering

Abstract/Summary:
In recent years, speech recognition systems have achieved breakthroughs in noise-free environments, but recognition performance still degrades seriously in noisy environments. Robust speech recognition has therefore attracted wide attention, including robust speech feature extraction, robust speech enhancement, and robust acoustic modeling. The robust acoustic model is the key technology for improving the anti-interference ability of a speech recognition system. However, many technical problems remain in its practical application, such as generalization to unknown noise and model design for complex application scenarios. To address these problems, this thesis builds on end-to-end acoustic models and deep neural networks to study robust end-to-end acoustic models.

The thesis describes the CNN-CTC framework, which combines a convolutional neural network with the connectionist temporal classification algorithm, and verifies its effectiveness. To improve the generalization capability of the model, parameter sharing between clean and noisy speech is proposed, combining teacher-student training with multi-condition training. Furthermore, a deep similarity network is proposed to draw together the features generated from clean and noisy samples for subsequent recognition. Several types of noise and several speech databases are used to evaluate model performance.

The DCASE development set contains background noise from ten different scenes and is used to approximate the complex scenes studied in this thesis. An end-to-end robust acoustic model based on front-end classification is proposed. The front-end model performs acoustic scene classification on noisy speech and routes the speech features to the corresponding robust acoustic model according to the classification result. The classifier comprises three stages: feature design, feature extraction, and classification. In the feature design stage, a noise feature based on long-term energy is proposed and its effectiveness is verified on the 2018 DCASE development set, where the classification accuracy is about 14% higher than that of the baseline model. In the feature extraction and classification stages, a decomposed variational auto-encoder network is proposed to extract noise features from noisy speech, and a feedforward neural network is then built for classification.

Finally, to reduce the cost in time and computing resources, a low-rank transfer learning method is proposed. Combining the idea of neural network compression with transfer learning, two low-rank networks are added on top of the source-domain model to learn the target task, so that the source-domain parameters can be reused.
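The abstract does not give the architecture of the low-rank transfer learning method, so the following is only a minimal sketch of the general idea under stated assumptions: the source-domain weight matrix is kept frozen, and the "two low rank networks" are read here as two small factor matrices whose product forms a trainable correction for the target task. The class name `LowRankAdaptedLayer`, the helper functions, the zero initialization of the second factor, and all dimensions are hypothetical and are not taken from the thesis.

```python
import random


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; stands in for trained or initialized weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LowRankAdaptedLayer:
    """Frozen source-domain weights plus a trainable low-rank correction.

    Assumption (not from the thesis): the output is
        y = W_source x + B (A x)
    where A maps d_in -> rank and B maps rank -> d_out.
    """

    def __init__(self, W_source, rank, rng):
        d_out, d_in = len(W_source), len(W_source[0])
        self.W_source = W_source                 # reused as-is, never updated
        self.A = rand_matrix(rank, d_in, rng)    # first low-rank factor
        # Zero init (an assumed choice): the layer starts out identical
        # to the source-domain layer before any target-task training.
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W_source, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def source_params(self):
        return len(self.W_source) * len(self.W_source[0])


if __name__ == "__main__":
    rng = random.Random(0)
    W = rand_matrix(32, 64, rng)                 # hypothetical source layer
    layer = LowRankAdaptedLayer(W, rank=4, rng=rng)
    print(layer.source_params(), layer.trainable_params())  # 2048 384
```

With these illustrative sizes, only 384 parameters would be trained for the target task while the 2048 source-domain parameters are reused unchanged, which is the kind of saving in time and computing resources the abstract refers to.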
Keywords/Search Tags: connectionist temporal classification, robust acoustic model, neural network, transfer learning