
Study on Low-SNR Direction Finding of Speech Sources Based on Microphone Arrays

Posted on: 2011-04-07   Degree: Master   Type: Thesis
Country: China   Candidate: Y H He
GTID: 2208360302498390   Subject: Signal and Information Processing
Abstract/Summary:
In recent years, with the development of speech signal processing technology, direction finding of speech sources has become an active and challenging research topic in microphone-array signal processing. Direction finding with microphone arrays differs from traditional array signal processing. In a one-dimensional direction-finding system, the source directions are generally obtained from the TDOA (time difference of arrival) between two microphones of the array, so the accuracy of time delay estimation is the key to the whole system. Many time delay estimation methods exist, but traditional ones such as CC (cross-correlation) and GCC (generalized cross-correlation) are vulnerable to noise, perform poorly at low SNR, and are mostly suited only to a single source.

To address these shortcomings, this study proposes a new time delay estimation method. The algorithm transforms the speech from the time domain to the time-frequency domain, turning the broadband, multi-source signal into a superposition of many single time-frequency points. Speech is sparse in the time-frequency domain: the energy of each voice is concentrated in a few narrow frequency bands, and the energy of different sources hardly overlaps. Exploiting this property, the method uses the time delay as the classification criterion and performs power-weighted clustering over the frequency points. Because the energy concentrated in these narrow bands is large, it is easy to cluster, and the cluster gives the time delay estimate. Since each source arrives with a different delay, the energy peaks of the different sources also fall at different delays, so a time delay estimate is obtained for every source. In other words, the method is suitable for estimating the time delays of multiple speech sources.
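The clustering step can be illustrated with a minimal Python sketch. It assumes the STFT has already been computed and that each time-frequency point is given as a pair (f, X1·conj(X2)), where X1 and X2 are the two microphones' STFT coefficients and the second microphone lags by τ, so the cross-spectrum phase is 2πfτ. It further assumes frequencies low enough that this phase does not wrap, a far-field source with sin θ = cτ/d for microphone spacing d, and c = 343 m/s. A power-weighted delay histogram with peak picking stands in for the thesis's clustering; the function names, bin count and thresholds are hypothetical illustrations, not the author's implementation.

```python
import cmath
import math


def bin_delay(freq_hz, cross):
    """Delay (s) implied by one time-frequency point, given cross = X1 * conj(X2).
    Assumes mic 2 lags mic 1 by tau, so phase(cross) = 2*pi*f*tau without wrapping."""
    return cmath.phase(cross) / (2.0 * math.pi * freq_hz)


def weighted_delay_histogram(points, max_delay, n_bins, power_threshold):
    """Power-weighted histogram of per-point delays over [-max_delay, max_delay].
    Returns (weight, weighted_delay_sum) per bin."""
    weight = [0.0] * n_bins
    wsum = [0.0] * n_bins
    width = 2.0 * max_delay / n_bins
    for freq_hz, cross in points:
        power = abs(cross)
        if power < power_threshold:  # drop low-energy, noise-dominated points
            continue
        tau = bin_delay(freq_hz, cross)
        if abs(tau) > max_delay:  # impossible for this microphone spacing
            continue
        k = min(int((tau + max_delay) / width), n_bins - 1)
        weight[k] += power
        wsum[k] += power * tau
    return weight, wsum


def estimate_delays(points, max_delay, n_bins=29, power_threshold=0.1, min_weight=1.0):
    """One delay per histogram peak: the power-weighted mean delay of the peak bin."""
    weight, wsum = weighted_delay_histogram(points, max_delay, n_bins, power_threshold)
    delays = []
    for k in range(n_bins):
        left = weight[k - 1] if k > 0 else 0.0
        right = weight[k + 1] if k < n_bins - 1 else 0.0
        if weight[k] >= min_weight and weight[k] > left and weight[k] >= right:
            delays.append(wsum[k] / weight[k])
    return delays


def delay_to_angle(tau, spacing_m, c=343.0):
    """Far-field direction (degrees from broadside) from sin(theta) = c * tau / d."""
    s = max(-1.0, min(1.0, c * tau / spacing_m))
    return math.degrees(math.asin(s))
```

Because each peak's delay is the power-weighted mean of the points in its bin, strong narrow-band speech points dominate the estimate, while points below the energy threshold (mostly noise at low SNR) never enter the histogram. Two sources whose energy occupies different frequency points therefore produce two separate peaks.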
In addition, because the energy of each speech source is concentrated in certain narrow bands, the SNR of the frequency points within those bands remains high even when the overall SNR is very low, so the source energy can still be clustered and its time delay estimated. The method therefore estimates time delays correctly at low SNR. In practice, microphone arrays with a large aperture are generally used to obtain high angular resolution. As the frequency increases, however, the phase of the cross-power spectrum becomes ambiguous, so the time delay is no longer uniquely determined and spurious peaks appear in the clusters. This thesis introduces a method of tagging frequency points successively to eliminate the spurious peaks caused by phase ambiguity, so that the peaks correspond one-to-one to the speech sources.

The thesis describes the principles and steps of the algorithm and simulates it with synthetic signals to analyze the influence of various parameters. Simulation results show that, with an appropriate energy threshold, the algorithm gives relatively accurate time delay estimates for multiple speech sources even at low SNR, and from these the directions of the sources. Final experiments on multi-source speech recorded in an experimental environment show that the algorithm is also feasible in real noisy environments.
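Phase ambiguity, and one possible form of successive tagging, can be sketched in Python. With spacing d, physical delays lie in [-d/c, d/c], and the cross-spectrum phase 2πfτ can wrap once f exceeds c/(2d) (about 343 Hz for d = 0.5 m), so a single time-frequency point admits several candidate delays spaced 1/f apart. The abstract does not spell out the tagging rule, so the rule used here is an assumption: points are visited from low to high frequency, each takes the candidate nearest an existing source tag (seeded, for example, from unambiguous low-frequency clusters), and that tag is refined by a power-weighted running mean; points with no candidate within tolerance stay untagged. All names and parameters are hypothetical.

```python
import cmath
import math


def candidate_delays(freq_hz, cross, max_delay):
    """All delays tau in [-max_delay, max_delay] whose phase 2*pi*f*tau matches
    the cross-spectrum phase modulo 2*pi (the phase-ambiguity candidates)."""
    base = cmath.phase(cross) / (2.0 * math.pi * freq_hz)
    period = 1.0 / freq_hz  # a 2*pi phase wrap equals 1/f seconds of delay
    k_lo = math.ceil((-max_delay - base) / period)
    k_hi = math.floor((max_delay - base) / period)
    return [base + k * period for k in range(k_lo, k_hi + 1)]


def tag_points(points, initial_tags, max_delay, tol):
    """Hypothetical successive tagging of (freq_hz, cross) points.

    Visits points from low to high frequency; each point is assigned to the
    source tag closest to one of its candidate delays (within tol), and that
    tag is refined by a power-weighted running mean. Returns per-point labels
    (freq_hz, tag index or None, chosen delay or None) and the final tags.
    """
    sums = list(initial_tags)            # weighted delay sums per tag
    weights = [1.0] * len(initial_tags)  # seed weight 1.0 is an arbitrary choice
    labels = []
    for freq_hz, cross in sorted(points, key=lambda p: p[0]):
        tags = [s / w for s, w in zip(sums, weights)]
        best = None  # (error, tag index, candidate delay)
        for tau in candidate_delays(freq_hz, cross, max_delay):
            for idx, ref in enumerate(tags):
                err = abs(tau - ref)
                if err <= tol and (best is None or err < best[0]):
                    best = (err, idx, tau)
        if best is None:
            labels.append((freq_hz, None, None))  # untagged: treated as spurious
            continue
        _, idx, tau = best
        power = abs(cross)
        sums[idx] += power * tau
        weights[idx] += power
        labels.append((freq_hz, idx, tau))
    return labels, [s / w for s, w in zip(sums, weights)]
```

For example, with d = 0.5 m, a point at 2 kHz from a source delayed by 0.4 ms has six candidate delays in [-d/c, d/c], and only one of them lies near a tagged source; in this sketch, keeping just that candidate is what suppresses the spurious peaks the other five would create.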
Keywords/Search Tags: microphone arrays, direction of arrival of multiple speech sources, time delay estimation, weighted clustering analysis, direction disambiguation