Single-channel Speech Separation Based on Instantaneous Frequency

Posted on: 2011-03-28
Degree: Ph.D.
Type: Dissertation
University: Carnegie Mellon University
Candidate: Gu, Lingyun
Full Text: PDF
GTID: 1448390002956307
Subject: Computer Science
Abstract/Summary:
While automatic speech recognition has become useful and convenient in daily life, as well as an important enabler for other modern technologies, speech recognition accuracy is still far from sufficient to guarantee stable performance. It can be severely degraded when speech is subjected to additive noise. Though speech may encounter many types of noise, the work described in this dissertation concerns one of the most difficult problems in robust speech recognition: corruption by an interfering speech signal with only a single channel of information. This problem is especially difficult because the acoustical characteristics of the desired speech signal are easily confused with those of the interfering masking signal, and because useful information pertaining to the location of the sound sources is not available with only a single channel.

The goal of this dissertation is to recover the target component of speech mixed with interfering speech, and to improve the recognition accuracy that is obtained using the recovered speech signal. While we will accomplish this by combining several types of temporal features, the major novel approach will be to exploit instantaneous frequency to reveal the underlying harmonic structures of a complex auditory scene. The proposed algorithm extracts instantaneous frequency from each narrow-band frequency channel using short-time Fourier analysis. Pair-wise cross-channel correlations based on instantaneous frequency are obtained for each time frame, and clusters of frequency components that are believed to belong to a common source are initially identified on the basis of their mutual cross-correlation. In the dissertation, several methods for obtaining better estimates of instantaneous frequency are discussed. Conventional and graph-cut algorithms are demonstrated to efficiently collect the pattern used to identify the underlying harmonic structures. As a complementary means to boost the final performance, a computationally efficient test for voicing is proposed. Speaker identification and pitch detection are also presented to further refine the final performance.

An estimate of the target signal is ultimately obtained by reconstruction using inverse short-time Fourier analysis based on selected components of the combined signals. The recognition accuracy obtained in situations of speech-on-speech masking is assessed and compared to the corresponding performance of speech recognition systems using previous approaches.
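
The abstract gives no implementation details, so the following is only a minimal sketch of the two steps it names: per-channel instantaneous frequency from short-time Fourier analysis, and pair-wise cross-channel correlation of the resulting trajectories. It is not the dissertation's algorithm. The phase-advance (phase-vocoder style) estimator, the Pearson correlation over the whole signal rather than per time frame, the function names, the frame and hop sizes, and the synthetic two-source test signal are all assumptions chosen for illustration.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def instantaneous_frequency(spec, fs, n_fft=256, hop=64):
    """Per-bin instantaneous frequency (Hz) from frame-to-frame phase advance."""
    phase = np.angle(spec)
    k = np.arange(spec.shape[1])
    expected = 2 * np.pi * k * hop / n_fft   # advance of a tone at the bin centre
    deviation = np.diff(phase, axis=0) - expected
    deviation = (deviation + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
    return k * fs / n_fft + deviation * fs / (2 * np.pi * hop)

def cross_channel_correlation(inst_freq, bins):
    """Pearson correlation between the IF trajectories of the chosen bins."""
    return np.corrcoef(inst_freq[:, bins].T)

def vibrato_phase(t, f0, depth, rate):
    """Phase of a tone whose frequency is f0 + depth * sin(2*pi*rate*t)."""
    return 2 * np.pi * f0 * t - (depth / rate) * np.cos(2 * np.pi * rate * t)

# Hypothetical mixture: a "target" with two harmonics sharing one slow
# frequency modulation, plus an "interferer" modulated at a different rate.
fs = 8000
t = np.arange(fs) / fs
target_phase = vibrato_phase(t, f0=200.0, depth=5.0, rate=4.0)
target = np.sin(target_phase) + np.sin(2 * target_phase)
interferer = np.sin(vibrato_phase(t, f0=310.0, depth=5.0, rate=7.0))
mixture = target + interferer

inst_freq = instantaneous_frequency(stft(mixture), fs)
# Bins nearest 200 Hz, 400 Hz (target harmonics) and 310 Hz (interferer).
bins = [6, 13, 10]
corr = cross_channel_correlation(inst_freq, bins)
print(np.round(corr, 2))
```

In this toy case the two target harmonics share a common modulation, so their instantaneous-frequency trajectories correlate strongly, while the interferer's channel does not. That is the kind of cue the abstract describes for grouping frequency components by source. Real speech mixtures have overlapping harmonics and unvoiced segments that this sketch does not address.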
Keywords/Search Tags:Speech, Instantaneous frequency, Channel, Performance, Using