Speech Recognition Front-End Processing Based On Deep Neural Network

Posted on:2020-11-11

Degree:Master

Type:Thesis

Country:China

Candidate:M Wen

GTID:2428330578482337

Subject:Computational Mathematics

Abstract/Summary:

With the rapid development of technology, the application scenarios of automatic speech recognition (ASR) are becoming more widespread. In practice, however, the acoustic environment in which speech recognition operates is sometimes extremely complicated, which makes robust ASR important. Recent advances in deep neural network (DNN) theory have led to the wide use of DNNs in many tasks, such as computer vision and natural language processing. This thesis uses deep neural networks to explore two problems of the speech front end: speech separation and beamforming. First, for speech separation, we use mask estimation based on a deep neural network, introduce different speech features, optimization targets, and loss functions, and compare them experimentally. We also apply the deep clustering loss function, which was developed for multi-speaker speech separation, to a music-noise data set and obtain good results: this method improves STOI by about 0.01 and WER by about 1%. Second, for beamforming, we describe two methods based on deep neural networks. One is the GEV beamforming method, which builds on the mask estimation model used for speech separation; the other is the end-to-end beamforming method proposed in this thesis. We determine two important parameters of the two methods through experiments and then verify the effectiveness of both methods. Compared with BeamformIt, a toolbox based on traditional methods, the GEV beamforming method improves WER by about 0.5% on the 2-channel test set and about 0.6% on the 6-channel test set. The end-to-end beamforming method improves on the GEV beamforming method by a further 0.2% on each of the two test sets.
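To make the link between mask estimation and GEV beamforming concrete, here is a minimal NumPy sketch of a commonly used formulation. It is not taken from the thesis. It assumes that a speech mask and a noise mask are available for each time-frequency bin (in the thesis these would come from the DNN), estimates mask-weighted spatial covariance matrices, and takes as beamforming weights the principal generalized eigenvector of the speech and noise covariances, which maximizes the output SNR in each frequency bin. The function names, array layouts, and the diagonal-loading parameter `eps` are illustrative assumptions.

```python
import numpy as np


def spatial_covariance(stft, mask):
    """Mask-weighted spatial covariance for every frequency bin.

    stft: complex array, shape (channels, frames, freqs)
    mask: real array in [0, 1], shape (frames, freqs)
    returns: complex array, shape (freqs, channels, channels)
    """
    weighted = stft * mask[None, :, :]
    # Sum over frames of mask(t, f) * y(t, f) y(t, f)^H
    phi = np.einsum("ctf,dtf->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=0), 1e-10)
    return phi / norm[:, None, None]


def gev_weights(phi_speech, phi_noise, eps=1e-6):
    """Principal generalized eigenvector of (phi_speech, phi_noise) per bin.

    eps is a hypothetical diagonal-loading factor that keeps the noise
    covariance positive definite; it is not a parameter from the thesis.
    returns: complex array, shape (freqs, channels)
    """
    n_freq, n_ch, _ = phi_speech.shape
    weights = np.empty((n_freq, n_ch), dtype=complex)
    for f in range(n_freq):
        load = eps * np.trace(phi_noise[f]).real / n_ch + 1e-12
        noise = phi_noise[f] + load * np.eye(n_ch)
        # Whiten with the Cholesky factor: noise = chol @ chol^H
        chol_inv = np.linalg.inv(np.linalg.cholesky(noise))
        whitened = chol_inv @ phi_speech[f] @ chol_inv.conj().T
        whitened = 0.5 * (whitened + whitened.conj().T)  # enforce Hermitian
        _, vecs = np.linalg.eigh(whitened)  # eigenvalues in ascending order
        # Map the top eigenvector back to the unwhitened domain
        weights[f] = chol_inv.conj().T @ vecs[:, -1]
    return weights


def apply_beamformer(weights, stft):
    """Single-channel output: out(t, f) = w(f)^H y(t, f)."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

In use, one would compute `phi_speech` and `phi_noise` from the multichannel STFT with the estimated speech and noise masks, call `gev_weights`, and pass the result to `apply_beamformer`. Because GEV weights are defined only up to a complex scale in each frequency bin, practical systems usually add a normalization step afterwards, which this sketch omits.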

Keywords/Search Tags:

Automatic Speech Recognition, Deep Neural Networks, Speech Front-end, Speech Separation, Beamforming, Deep Learning

Related items

1	Research On Speech Separation And Recognition Based On Deep Learning
2	Short Speech Speaker Recognition Method Based On Deep Learning And Its Application In Speech Separation
3	Research On Auto-regressive Deep Neural Networks' Based Monaural Speech Separation
4	Speech Separation Based On Deep Learning
5	Single Channel Speech Separation Methods Based On Deep Neural Network
6	Research On Deep Neural Networks Based Models For Speech Recognition
7	The Research Of Key Techniques Of Speech Separation And Speech Recognition
8	Research On Algorithms Of Speech Sentence Recognition Based On Deep Learning
9	Research On Deep Learning Based Far-Field Speech Recognition
10	Research On Speech Preprocessing Of Speech Recognition For Multi-talker Conversations In Complex Acoustic Environments