
Research on Speech Separation Algorithms Based on Microphone Arrays

Posted on: 2020-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Deng
Full Text: PDF
GTID: 2428330575956400
Subject: Information and Communication Engineering
Abstract/Summary:
Speech is the most convenient and fastest form of human communication. With the advent of the artificial intelligence era, speech has also become the first choice for human-machine interaction. In real life, however, the acoustic background is often complex and degrades speech quality. We often need to extract the speech of interest from a complex noise background while preserving its fidelity as much as possible. Researchers have achieved significant results, but two problems remain: the algorithms are not robust enough, and the perceived quality of the target speech is not high enough. This thesis studies two tasks in depth: extracting a single target speech source from a complex noise background, and separating multiple speakers.

First, the separation of a single target source in a complex noise background is studied. In the presence of noise, especially in low signal-to-noise ratio (SNR) scenarios, the performance of generalized cross-correlation with phase transform (GCC-PHAT) degrades severely, which in turn severely affects the separation performance of generalized cross-correlation non-negative matrix factorization (GCC-NMF). To address this, the thesis proposes a new calibration function, mask-weighted GCC-PHAT (MWGCC-PHAT), and the corresponding mask-weighted GCC-NMF (MWGCC-NMF). Both are based on an ideal binary mask (IBM) learned by a bidirectional long short-term memory (BLSTM) network. Experiments show that MWGCC-NMF can separate low-SNR mixtures on which GCC-PHAT-based separation fails. Compared with GCC-NMF, overall SDR increased by 25.44%, PESQ by 14.75%, OPS by 9.80%, and SNR by 6.38%. These results indicate that MWGCC-PHAT is more robust and performs better.

Second, the separation of multiple speakers is discussed. GCC-NMF cannot separate sources located at mirror-symmetric or approximately symmetric positions, and it is sensitive to position information. To address these defects, a GCC-NMF based on a logistic regression selection strategy is proposed. It exploits the richer spatial information of a circular six-microphone array while keeping the GCC-NMF computation small and flexible. Experimental results on both simulated and real microphone array data show that GCC-NMF with the logistic regression selection strategy outperforms GCC-NMF on the worst-performing microphone pair. The average OPS of GCC-NMF with logistic regression selection exceeds that of the worst-performing microphone pair in the array by up to 27.47. These results indicate that the logistic regression selection strategy greatly improves the spatial robustness and practicality of GCC-NMF.
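The idea of weighting GCC-PHAT with a time-frequency mask can be illustrated with a minimal NumPy sketch. This is not the thesis's MWGCC-PHAT formulation, which the abstract does not specify. It only assumes that a mask with values in [0, 1] multiplies the phase-normalized cross-spectrum before it is projected onto candidate delays. The function names (`stft`, `ideal_binary_mask`, `mask_weighted_gcc_phat`), the 0 dB threshold, and the frame parameters are all hypothetical. The mask here is an oracle IBM computed from known speech and noise spectrograms, whereas the thesis estimates the mask with a BLSTM.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def ideal_binary_mask(speech_spec, noise_spec, threshold_db=0.0):
    """Oracle IBM: 1 where the local SNR exceeds threshold_db, else 0.

    Needs the clean speech and noise spectrograms separately, so it is
    only available in simulation (assumption: 0 dB default threshold).
    """
    eps = 1e-12
    snr_db = 20.0 * np.log10((np.abs(speech_spec) + eps)
                             / (np.abs(noise_spec) + eps))
    return (snr_db > threshold_db).astype(float)

def mask_weighted_gcc_phat(x1, x2, fs, max_tau, mask=None,
                           n_fft=512, hop=256, n_taus=129):
    """Angular spectrum over candidate delays of x2 relative to x1.

    mask (optional) has shape (frames, n_fft // 2 + 1) and down-weights
    time-frequency bins assumed to be dominated by noise.
    """
    X1, X2 = stft(x1, n_fft, hop), stft(x2, n_fft, hop)
    cross = X1 * np.conj(X2)
    phat = cross / (np.abs(cross) + 1e-12)  # PHAT: keep the phase only
    if mask is not None:
        phat = phat * mask                  # hypothetical mask weighting
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    taus = np.linspace(-max_tau, max_tau, n_taus)
    # Steering term that cancels the phase of a pure delay tau.
    steer = np.exp(-2j * np.pi * np.outer(freqs, taus))
    spectrum = np.real(phat.sum(axis=0) @ steer)
    return taus, spectrum
```

The peak of `spectrum` gives a delay estimate, `taus[np.argmax(spectrum)]`. Passing `mask=None` reduces the function to plain GCC-PHAT, which makes it easy to compare the two variants on the same noisy mixture.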
Keywords/Search Tags:speech separation, IBM, MWGCC-NMF, Logistic regression, selection strategy