Speech Recognition Front-End Processing Based On Deep Neural Network

Posted on: 2020-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: M Wen
GTID: 2428330578482337
Subject: Computational Mathematics
Abstract/Summary:
With the rapid development of technology, the application scenarios of automatic speech recognition (ASR) are becoming more widespread. In practice, however, the acoustic environment in which speech recognition operates is sometimes extremely complicated, which makes robust ASR important. Deep neural networks (DNNs) have recently been widely adopted in many tasks, such as computer vision and natural language processing. This thesis uses deep neural networks to explore two problems of the speech front end: speech separation and beamforming.

First, for speech separation, we use a mask estimation method based on deep neural networks, introduce different speech features, optimization targets and loss functions, and compare them experimentally. We also apply the deep clustering loss function, which is used in multi-speaker speech separation, to a music noise set and achieve good results: this method improves STOI by about 0.01 and WER by about 1%.

Second, for beamforming, we describe two methods based on deep neural networks: the GEV (generalized eigenvalue) beamforming method, which takes advantage of the mask estimation model from the speech separation part, and the end-to-end beamforming method proposed in this thesis. We determine two important parameters of the two methods through experiments and then verify the effectiveness of both methods. Compared with BeamformIt, a toolbox based on traditional methods, the GEV beamforming method improves WER by about 0.5% and 0.6% on the 2-channel and 6-channel test sets, respectively. Compared with the GEV beamforming method, the end-to-end beamforming method yields a further improvement of 0.2% and 0.2% on the two data sets, respectively.
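
The abstract does not give implementation details, so the following is only a minimal sketch of the commonly used mask-based GEV formulation, not the thesis's own code. It assumes the speech and noise masks have already been produced by some estimator (in the thesis, a DNN); the function names `masked_covariance`, `gev_weights` and `apply_beamformer`, the array layout, and the diagonal-loading constant are all illustrative assumptions.

```python
import numpy as np


def masked_covariance(stft, mask):
    """Mask-weighted spatial covariance matrix for every frequency bin.

    stft: complex array of shape (channels, frames, bins)
    mask: real array in [0, 1] of shape (frames, bins)
    returns: complex array of shape (bins, channels, channels)
    """
    weighted = stft * mask[None, :, :]
    # Sum over frames of mask(t, f) * y(t, f) y(t, f)^H.
    cov = np.einsum("ctf,dtf->fcd", weighted, stft.conj())
    # Normalise by the total mask weight per bin (guarded against zero).
    norm = np.maximum(mask.sum(axis=0), 1e-10)
    return cov / norm[:, None, None]


def gev_weights(cov_speech, cov_noise, eps=1e-6):
    """Per-bin GEV weights: principal generalized eigenvector of
    (cov_speech, cov_noise), which maximises the output SNR.

    eps is an assumed diagonal-loading factor that keeps the noise
    covariance positive definite.
    """
    bins, channels, _ = cov_speech.shape
    weights = np.empty((bins, channels), dtype=complex)
    for f in range(bins):
        load = eps * np.trace(cov_noise[f]).real / channels + 1e-10
        phi_n = cov_noise[f] + load * np.eye(channels)
        # Whitening: phi_n = L L^H turns the generalized problem into
        # an ordinary Hermitian eigenproblem for L^-1 phi_s L^-H.
        chol = np.linalg.cholesky(phi_n)
        chol_inv = np.linalg.inv(chol)
        whitened = chol_inv @ cov_speech[f] @ chol_inv.conj().T
        whitened = 0.5 * (whitened + whitened.conj().T)  # enforce Hermitian
        _, eigvecs = np.linalg.eigh(whitened)  # eigenvalues ascending
        # Map the principal eigenvector back to the original coordinates.
        weights[f] = chol_inv.conj().T @ eigvecs[:, -1]
    return weights


def apply_beamformer(weights, stft):
    """Beamformed single-channel output: out(t, f) = w(f)^H y(t, f)."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

In this sketch the masks only enter through the two covariance estimates, which is why the same mask estimation model can serve both speech separation and beamforming. GEV weights are defined only up to a complex scale per frequency bin, so practical systems usually add a normalisation step after this one; that step is omitted here.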
Keywords/Search Tags:Automatic Speech Recognition, Deep Neural Networks, Speech Front-end, Speech Separation, Beamforming, Deep Learning