Study On DOA Estimation For A Target Speaker In Adverse Environments

Posted on: 2022-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: H Wang
Full Text: PDF
GTID: 2518306725490434
Subject: Acoustics
Abstract/Summary:
Microphone arrays are embedded in a growing number of terminal devices and conference systems to improve the performance of speech front-end systems. One of the basic functions of a microphone array is to estimate the direction of arrival (DOA) of the target speaker, and the estimate serves as important prior information in speech enhancement, speech separation, and other acoustic signal processing applications. Conventional signal-processing-based DOA estimation algorithms obtain accurate results in ideal environments with low reverberation and weak noise. However, real acoustic environments often contain high reverberation and strong noise interference, which greatly degrade the robustness of conventional DOA estimation. This thesis focuses on robust DOA estimation of target speakers in adverse environments, and in particular investigates how extracting the direct-path desired speech signal improves DOA estimation.

A common way to obtain accurate DOA estimates under strong reverberation is to select time-frequency bins (TF bins) in the short-time Fourier transform (STFT) domain that are dominated by the direct-path signal of the target speaker. This thesis reviews recent work on the extraction of such TF bins, and uses simulations to discuss and analyze the advantages and disadvantages of DOA estimation algorithms built on different extraction strategies. The simulations show that the accuracy of DOA estimation can be improved by choosing a suitable strategy for extracting the TF bins dominated by the target speaker's direct-path signal.

Inspired by the precedence effect, the thesis observes that the TF bins at the beginning of a speech signal contain a higher proportion of direct-path speech and carry more accurate DOA information about the target speaker. Based on this observation, the thesis proposes a direct-path signal extraction algorithm based on the onsets of speech signals. Combined with a modified weighted prediction error (WPE) algorithm, the extracted direct-path speech signal is used to estimate the target speaker's DOA. Simulations compare the proposed algorithm with other algorithms based on extracting TF bins dominated by the direct-path speech signal, and verify the efficacy of the proposed algorithm.

In practice, strong non-stationary noise is usually present in addition to reverberation. Because conventional signal-processing-based algorithms have difficulty distinguishing speech from noise, they often cannot produce accurate DOA estimates in the presence of strong noise. Deep neural networks, which have achieved great success in image processing and natural language processing, can be used to address this problem. Inspired by the success of the U-net structure in biomedical image segmentation, this thesis proposes a DOA estimation algorithm based on a multi-task U-net structure. The neural network extracts the TF bins dominated by the direct-path desired speech signal in the STFT domain, and a conventional DOA estimation algorithm is then applied to these bins to estimate the target DOA. The performance of the proposed algorithm is verified by simulation and experiment.
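The idea of restricting DOA estimation to TF bins at speech onsets can be illustrated with a small two-microphone sketch. This is not the algorithm proposed in the thesis: it omits dereverberation (no WPE step), uses a simple frame-to-frame power-ratio rule as a stand-in onset detector, and replaces the DOA estimator with a PHAT-weighted delay search. All function names, thresholds, the microphone spacing, and the synthetic test signal are assumptions made for illustration.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def onset_mask(X, ratio=4.0, eps=1e-12):
    """Hypothetical onset rule: keep a TF bin when its power exceeds
    `ratio` times the power of the same frequency bin in the previous frame."""
    p = np.abs(X) ** 2
    mask = np.zeros(p.shape, dtype=bool)
    mask[1:] = p[1:] > ratio * (p[:-1] + eps)
    return mask

def estimate_delay(X1, X2, mask, n_fft, max_lag):
    """PHAT-weighted delay search using only the selected TF bins.
    Returns the delay (in samples) of microphone 2 relative to microphone 1."""
    cross = X1 * np.conj(X2)
    cross = cross / (np.abs(cross) + 1e-12)      # PHAT: keep phase only
    spec = np.where(mask, cross, 0).sum(axis=0)  # pool selected bins per frequency
    k = np.arange(spec.shape[0])
    lags = np.arange(-max_lag, max_lag + 1)
    steer = np.exp(-2j * np.pi * np.outer(lags, k) / n_fft)
    score = np.real(steer @ spec)
    return int(lags[np.argmax(score)])

def delay_to_angle(delay_samples, fs, mic_dist, c=343.0):
    """Far-field conversion from delay to angle (degrees) relative to broadside."""
    s = np.clip(c * delay_samples / (fs * mic_dist), -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))

# Synthetic, anechoic test signal (assumption): four noise bursts separated by silence.
fs, n_fft = 16000, 512
rng = np.random.default_rng(0)
x1 = np.zeros(16000)
for start in (1000, 5000, 9000, 13000):
    x1[start:start + 2000] = rng.standard_normal(2000)
x2 = np.roll(x1, 3)  # microphone 2 receives the signal 3 samples later

X1, X2 = stft(x1, n_fft), stft(x2, n_fft)
mask = onset_mask(X1)
delay = estimate_delay(X1, X2, mask, n_fft, max_lag=8)
angle = delay_to_angle(delay, fs, mic_dist=0.1)
print(delay, round(angle, 1))
```

In a reverberant recording the onset bins would be the ones least contaminated by reflections, which is the motivation given in the abstract; the anechoic signal here only demonstrates the mechanics of masking and delay search.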
Keywords/Search Tags: microphone array, speech source localization, direction of arrival, direct-path signal extraction, deep neural network