Far-field Speech Recognition Based On Multi-task Learning Using Phase And Reverberation Information

Posted on: 2019-08-11 | Degree: Master | Type: Thesis
Country: China | Candidate: D B Li | Full Text: PDF
GTID: 2428330626952395 | Subject: Computer Science and Technology
Abstract/Summary:
Automatic speech recognition has improved significantly and now performs well in quiet environments. However, near-field speech recognition places strict demands on the acoustic environment, and many problems remain in real-world applications. Far-field speech recognition results are still unsatisfactory in real environments, mainly because of environmental factors. In enclosed spaces, reverberation is the main cause of poor recognition results, so removing reverberation can greatly improve speech recognition performance. With the wide adoption of deep learning, deep neural networks have come to play a key role in speech enhancement. In this paper, we propose a method that uses a multi-objective neural network framework to learn clean-speech acoustic features from both phase and amplitude information for far-field speech recognition. Previous studies found that amplitude information performs well in speech recognition, but the phase information in speech is often ignored. To address this, we propose a multi-objective neural network that combines two tasks, speech enhancement and feature enhancement, to optimize recognition performance. In our experiments, the phase information consists only of conventional group delay features (MGDCC) and channel information (PBSFVT) obtained with a phase-domain source separation method. The proposed method was evaluated on the dataset provided by the REVERB 2014 Challenge. Compared with a conventional single-task neural network, it reduces the word error rate (WER) from 26.57% to 23.68%. These results show that phase information, used as an important auxiliary feature in the multi-objective learning framework, can improve recognition results.
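The following is a minimal NumPy sketch, not the thesis's implementation, of the two ingredients the abstract names: a phase-derived feature and a multi-objective training loss. It assumes the plain group delay function (MGDCC additionally applies cepstral smoothing, compression exponents and a DCT, all omitted here), a simple concatenation of log-magnitude and group-delay values as the network input, and a weighted sum of mean-squared errors for the speech-enhancement and feature-enhancement targets. The function names `group_delay`, `input_features` and `multitask_loss` and the default weight of 0.5 are hypothetical.

```python
from typing import Optional

import numpy as np


def group_delay(frame: np.ndarray, n_fft: Optional[int] = None, eps: float = 1e-10) -> np.ndarray:
    """Group delay tau(k) = -d(phase)/d(omega) of one frame, per rfft bin.

    Uses the standard identity tau = Re(X * conj(Y)) / |X|^2 with
    X = DFT(x[n]) and Y = DFT(n * x[n]), which avoids phase unwrapping.
    eps guards against division by zero at spectral nulls.
    """
    n_fft = n_fft or len(frame)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)


def input_features(frame: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Hypothetical network input: amplitude (log-magnitude) plus phase (group delay)."""
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + eps)
    return np.concatenate([log_mag, group_delay(frame)])


def multitask_loss(pred_spec, clean_spec, pred_feat, clean_feat, weight: float = 0.5) -> float:
    """Joint objective: weighted sum of two mean-squared errors.

    pred_spec/clean_spec: output and target of the speech-enhancement task;
    pred_feat/clean_feat: output and target of the feature-enhancement task.
    The weight is an assumed hyperparameter, not a value from the thesis.
    """
    spec_loss = np.mean((np.asarray(pred_spec) - np.asarray(clean_spec)) ** 2)
    feat_loss = np.mean((np.asarray(pred_feat) - np.asarray(clean_feat)) ** 2)
    return float(weight * spec_loss + (1.0 - weight) * feat_loss)


# Sanity check: an impulse delayed by 5 samples has a group delay of 5 at every bin.
impulse = np.zeros(64)
impulse[5] = 1.0
print(np.round(group_delay(impulse)[:4], 6))
```

Computing group delay through the DFT of `n * x[n]` sidesteps the phase-wrapping problem that makes raw phase hard to use as a feature, which is one reason group-delay representations are a common way to bring phase into recognition front ends.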
Keywords/Search Tags: Far-field, speech recognition, phase feature, multi-objective learning