Far-field Speech Recognition Based On Multi-task Learning Using Phase And Reverberation Information

Posted on:2019-08-11

Degree:Master

Type:Thesis

Country:China

Candidate:D B Li

Full Text:PDF

GTID:2428330626952395

Subject:Computer Science and Technology

Abstract/Summary:

Automatic speech recognition has improved significantly and now performs well in quiet environments. However, near-field speech recognition places high demands on the acoustic environment, and various problems remain in real-world applications. Far-field speech recognition results are still unsatisfactory in real environments, mainly because of environmental factors. In an enclosed space, reverberation is the main cause of poor recognition results, so removing reverberation can greatly improve speech recognition performance. With the wide application of deep learning, deep neural networks have come to play a key role in speech enhancement. This thesis proposes a multi-objective neural network framework that learns clean-speech acoustic features from phase information and amplitude information for far-field speech recognition. Previous studies found that amplitude information performs well in speech recognition, but phase information in speech is often ignored. To address this, the thesis proposes a multi-objective neural network method that combines the two tasks of speech enhancement and feature enhancement to optimize performance. In the experiments, the phase information comprises the conventional group delay feature (MGDCC) and the channel information (PBSFVT) obtained with a phase-domain source separation method. The proposed method is evaluated on the data set provided by the REVERB 2014 Challenge. Compared with a traditional single-task neural network, it reduces the word error rate (WER) from 26.57% to 23.68%. These results indicate that phase information, used as an auxiliary feature in the multi-objective learning framework, can improve recognition results.
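
For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate is commonly computed (word-level edit distance divided by the reference length) and of how the relative improvement implied by the two reported figures can be derived. The function names `word_error_rate` and `relative_reduction` are hypothetical and chosen for illustration; the thesis's actual scoring tool is not stated in the abstract, so this should not be read as the author's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Illustrative implementation using word-level Levenshtein distance;
    an assumption, not the scoring script used in the thesis.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate with respect to a baseline."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Toy sentence pair (invented for illustration): one substitution, one deletion.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # WER figures reported in the abstract.
    print(relative_reduction(26.57, 23.68))
```

With the figures reported above, the absolute reduction is 2.89 percentage points, which corresponds to a relative WER reduction of roughly 10.9% over the single-task baseline.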

Keywords/Search Tags:

Far field, speech recognition, phase feature, multi-objective learning

Related items

1	Research On Emotion Recognition Of Monomodal Speech And Multimodal Speech Vision Based On Transfer Learning
2	Research On Speech Enhancement Technology Based On Multi-Objective Learning And Integration
3	Multi-Objective Learning And Ensembling For Deep Neural Network Based Speech Enhancement
4	Speech Emotion Recognition Based On Deep Learning And Multi-Feature Fusion
5	Research On Feature Fusion Method Of Speech Emotion Recognition Based On Deep Learning
6	Research On Key Techniques Of Speech Emotion Recognition
7	Research On Algorithms Of Speech Emotion Recognition Based On Feature Learning
8	Multi-dimensional Speech Feature Parameter Visualization And Its Application In Speech Recognition Research
9	Research On Speech Emotion Recognition Technology Based On Feature Research And Multi-Output BLSTM
10	Speech Emotion Recognition Based On Multi-feature Combination And Attention Mechanism