
Research And Verification Of Monaural Speech Segregation Based On Computational Auditory Scene Analysis And Deep Neural Network

Posted on: 2022-02-28  Degree: Master  Type: Thesis
Country: China  Candidate: R F Shen  Full Text: PDF
GTID: 2518306731479764  Subject: Vehicle Engineering
Abstract/Summary:
Human-computer voice interaction has entered people's daily lives with the huge advantage of freeing human hands. In daily communication, humans can easily pick out the signal they want to receive from a noisy environment, but a machine must be provided with an effective method for capturing the target voice signal from a complex acoustic environment. In practical applications, although a monaural signal carries less effective information, it is still used in many fields because it requires less equipment and costs less. The problem of monaural speech separation has therefore become a focus for researchers, and it is the main subject of this thesis. The main research contents are as follows:

(1) Computational Auditory Scene Analysis (CASA) uses computers to perform speech separation based on the auditory perception characteristics of the human ear. However, as the interference energy in the mixed signal increases, pitch tracking during auditory regrouping degrades, which impairs the system's speech separation performance. This thesis proposes a time-frequency unit labeling method based on multi-pitch estimation. Starting from the preliminary unit labels obtained by auditory segmentation and initial grouping, the pitch of the target signal is estimated, and the time-frequency unit labels are corrected accordingly. Then, considering that the interference signal may itself contain harmonics, the pitch in the mixed signal is estimated a second time. Finally, the labels are corrected based on the two pitch estimates and the continuity of the signal. The proposed method improves the accuracy of target pitch estimation in the mixed signal and thereby improves the system's speech separation performance.

(2) Feature selection based on Group Lasso is studied. As the types and dimensionality of speech features grow, selecting appropriate features becomes difficult: a single feature represents speech incompletely, while using many features may lead to the curse of dimensionality. In addition, redundancy between different features not only increases the computational cost but can also reduce the model's accuracy. This thesis therefore analyzes four widely used features, AMS, RASTA-PLP, MFCC, and MRCG, performs feature selection with Group Lasso, and selects the complementary feature group AMS + MFCC + MRCG according to the results.

(3) The effectiveness of the complementary feature group is verified under several conditions: matched or unmatched noise, with or without delta features, and using an IRM or IBM training target. A deep neural network is trained on the selected feature group, on single features, and on other feature groups for comparison. The results show that in all of these conditions the complementary feature group AMS + MFCC + MRCG outperforms any single feature, and the separation performance of the DNN improves significantly.
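The pitch estimation underlying contribution (1) is not specified in detail in the abstract; a common baseline, shown here only as an illustrative sketch and not as the thesis's exact method, is autocorrelation-based pitch estimation on a voiced frame: the lag of the strongest autocorrelation peak within the plausible pitch-period range gives the fundamental frequency.

```python
import numpy as np

def estimate_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the pitch of a voiced frame by autocorrelation.

    The lag of the largest autocorrelation peak inside the plausible
    pitch-period range [fs/fmax, fs/fmin] is taken as the pitch period.
    """
    frame = frame - np.mean(frame)
    # Non-negative-lag half of the full autocorrelation sequence.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)
    lag_max = int(fs / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return fs / lag
```

In a CASA system such a per-frame estimate would feed a pitch-tracking stage; the thesis's contribution is to re-estimate pitch after initial grouping and again on the residual interference, using both estimates to correct the unit labels.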
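The key property of Group Lasso in contribution (2) is that it zeroes out whole feature groups (e.g. all AMS dimensions) rather than individual coefficients, so surviving groups form the selected feature set. A minimal sketch of the group soft-thresholding (proximal) step that produces this behavior follows; the group layout and the parameter `lam` are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    """One proximal step of the group-lasso penalty.

    Each group's coefficient block is shrunk toward zero as a unit:
    if the block's L2 norm falls below `lam`, the whole group is
    zeroed, i.e. that feature type is dropped from the model.
    `groups` maps a group name to an index array into `w`.
    """
    w = w.copy()
    for idx in groups.values():
        norm = np.linalg.norm(w[idx])
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        w[idx] *= scale
    return w
```

After running such a step inside a proximal-gradient loop, the groups whose coefficients remain nonzero (here standing in for AMS, MFCC, MRCG) constitute the selected complementary feature group.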
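The two training targets compared in contribution (3) have standard definitions: the ideal binary mask (IBM) keeps a time-frequency unit when its local SNR exceeds a criterion, while the ideal ratio mask (IRM) is a soft speech-to-mixture energy ratio. A sketch under those standard definitions (the local criterion `lc_db` and the small stabilizing constants are assumptions, not thesis values):

```python
import numpy as np

def ideal_masks(speech_mag, noise_mag, lc_db=0.0):
    """Compute IBM and IRM from clean-speech and noise magnitudes.

    IBM: 1 where local SNR exceeds lc_db (in dB), else 0.
    IRM: square root of speech power over total power, in [0, 1].
    """
    speech_pow = speech_mag ** 2
    noise_pow = noise_mag ** 2
    # Local SNR in dB at each time-frequency unit.
    snr_db = 10.0 * np.log10(speech_pow / (noise_pow + 1e-12) + 1e-12)
    ibm = (snr_db > lc_db).astype(float)
    irm = np.sqrt(speech_pow / (speech_pow + noise_pow + 1e-12))
    return ibm, irm
```

The DNN is trained to predict one of these masks from the input features; at test time the predicted mask is applied to the mixture's time-frequency representation to recover the target speech.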
Keywords/Search Tags:Computational auditory scene analysis, Feature selection, Deep neural network, Monaural speech segregation