
Monaural Speech Segregation Based On Computational Auditory Scene Analysis

Posted on: 2013-03-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L H Zhao
GTID: 1228330377951720
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
In real-world environments, speech signals are usually corrupted by background noise from various sources. Noise can dramatically degrade speech quality and intelligibility, seriously hampering real-world applications of speech technology. Extracting the target speech from a noisy mixture is therefore a bottleneck problem that impedes the development of speech technology. Depending on the number of microphones available to the system, the task is called multi-channel, binaural, or monaural speech segregation; of these, monaural speech segregation is the most difficult.
This dissertation focuses on monaural speech segregation. Within the framework of computational auditory scene analysis (CASA), we study auditory segmentation, auditory grouping, and auditory feature classification in depth and, on this basis, propose several approaches to improve monaural speech segregation. The main contributions of this dissertation are as follows.
1. We propose an auditory segmentation approach based on combined cues and regional energy distribution. In auditory segmentation, contiguous time-frequency (T-F) units that originate from a single source are merged into auditory segments. In our approach, T-F units in the high-frequency range are merged into segments using a combination of auditory cues (cross-channel correlation, temporal continuity, and onset/offset). In addition, the regional energy distribution of the mixture is used to indicate how reliable each auditory cue is in the high-frequency range, and segmentation is performed accordingly. Experimental results show that the proposed approach generates more reliable segments and thereby improves the performance of the segregation system.
2. We propose an auditory grouping approach based on the energy distribution across frequency channels. In auditory grouping, auditory segments from the same source are grouped into auditory streams, which correspond to the target speech or the noise. In our approach, auditory segments are first grouped into streams according to periodicity and amplitude modulation (AM) principles. Then, based on the energy distribution of the mixture across frequency channels, high-frequency T-F units that are heavily corrupted by noise are located and removed from the target stream. Experimental results show that the proposed approach removes more noise-dominant T-F units from the target stream and thereby achieves better segregation results.
3. We propose an auditory organization approach based on combined cues and energy distribution. In auditory organization, the T-F units of the mixture are allocated to auditory streams, which correspond to the target speech or the noise. Auditory organization consists mainly of two stages: auditory segmentation and auditory grouping. We integrate the proposed segmentation and grouping approaches to improve auditory organization. Experimental results show that the segregation system based on this organization approach outperforms previous systems, especially in the high-frequency range.
4. We propose a monaural speech segregation approach based on harmonic and energy features. The method casts speech segregation as a sound classification problem in the T-F domain. First, an energy feature is used to supplement the harmonic features. Then, for T-F units with pronounced harmonicity and high energy, the feature vectors are replicated during classifier training so that the classifier better models this type of unit. Experimental results show that the proposed method yields better segregation results than the previous approach.
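The segmentation step in contribution 1 relies on cross-channel correlation and temporal continuity. The sketch below shows one simplified way these two cues can turn T-F units into segments: units whose autocorrelation closely matches the next channel's are marked, and marked units that touch across adjacent channels or adjacent frames are merged. It is not the dissertation's method. The threshold of 0.95, the 4-neighbour merging rule, and all function names are hypothetical, and the onset/offset cue and the energy-based reliability weighting are not reproduced.

```python
import math
from collections import deque

def normalized_corr(a, b):
    """Zero-mean normalized correlation between two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den > 0 else 0.0

def cross_channel_corr(acf):
    """acf[c][m] is the autocorrelation vector of the unit in channel c, frame m.
    Returns corr[c][m], the correlation between channels c and c+1 in frame m
    (the top channel has no upper neighbour and gets 0.0)."""
    n_ch, n_fr = len(acf), len(acf[0])
    corr = [[0.0] * n_fr for _ in range(n_ch)]
    for c in range(n_ch - 1):
        for m in range(n_fr):
            corr[c][m] = normalized_corr(acf[c][m], acf[c + 1][m])
    return corr

def form_segments(corr, threshold=0.95):
    """Mark units with corr > threshold (hypothetical value), then merge marked
    units that are adjacent in channel or in time into labelled segments.
    Label 0 means the unit belongs to no segment."""
    n_ch, n_fr = len(corr), len(corr[0])
    labels = [[0] * n_fr for _ in range(n_ch)]
    next_label = 1
    for c0 in range(n_ch):
        for m0 in range(n_fr):
            if corr[c0][m0] <= threshold or labels[c0][m0]:
                continue
            labels[c0][m0] = next_label
            queue = deque([(c0, m0)])
            while queue:  # breadth-first flood fill over marked neighbours
                c, m = queue.popleft()
                for dc, dm in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nc, nm = c + dc, m + dm
                    if (0 <= nc < n_ch and 0 <= nm < n_fr
                            and not labels[nc][nm] and corr[nc][nm] > threshold):
                        labels[nc][nm] = next_label
                        queue.append((nc, nm))
            next_label += 1
    return labels
```

In a real system `acf` would come from a gammatone filterbank followed by frame-wise autocorrelation; here it is just nested lists, so the segment logic can be inspected on toy inputs.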
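The first stage of contribution 2 groups segments into streams by periodicity. A common simplified form of this principle labels a T-F unit as agreeing with the target pitch when its normalized autocorrelation at the pitch lag is high, then assigns a segment to the target stream when most of its voiced units agree. The sketch below assumes that rule. The threshold 0.85, the majority vote, and the function names are illustrative assumptions. The AM cue and the energy-based removal of noise-dominant high-frequency units are not shown.

```python
def unit_agrees_with_pitch(acf, pitch_lag, theta=0.85):
    """acf[0] is the zero-lag value (unit energy). The unit agrees with the
    pitch when acf[pitch_lag] / acf[0] exceeds theta (hypothetical threshold)."""
    if acf[0] <= 0:
        return False
    return acf[pitch_lag] / acf[0] > theta

def group_segments(segments, acf, pitch_lags, theta=0.85):
    """segments: dict seg_id -> list of (channel, frame) units.
    acf[c][m]: autocorrelation vector of unit (c, m).
    pitch_lags[m]: target pitch lag in samples, or None for unvoiced frames.
    Returns the set of seg_ids assigned to the target stream by majority vote
    over the segment's voiced units."""
    target = set()
    for seg_id, units in segments.items():
        voiced = [(c, m) for c, m in units if pitch_lags[m] is not None]
        if not voiced:
            continue  # no pitch information, so periodicity cannot decide
        agree = sum(unit_agrees_with_pitch(acf[c][m], pitch_lags[m], theta)
                    for c, m in voiced)
        if agree > len(voiced) / 2:
            target.add(seg_id)
    return target
```

Segments left out of the returned set would belong to the noise stream. A pruning step like the one described in contribution 2 would then act on the target stream's high-frequency units.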
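Contribution 4 states only that an energy feature supplements the harmonic features and that units with pronounced harmonicity and high energy are replicated during classifier training. The sketch below illustrates that data-preparation step under stated assumptions. The `TFSample` structure, the harmonicity measure, both thresholds, and the number of extra copies are hypothetical, and the classifier itself is left unspecified because the abstract does not name one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TFSample:
    features: tuple       # feature vector of one T-F unit
    label: int            # 1 = target-dominant, 0 = noise-dominant
    harmonicity: float    # hypothetical salience, e.g. ACF at pitch lag / ACF at lag 0
    energy_db: float      # unit energy in dB

def build_features(harmonic_feats, energy_db):
    """Append the unit energy to the harmonic features."""
    return tuple(harmonic_feats) + (energy_db,)

def replicate_salient(samples, harm_thresh=0.85, energy_thresh_db=-20.0,
                      extra_copies=2):
    """Return a training list in which every sample with strong harmonicity AND
    high energy appears 1 + extra_copies times; all others appear once."""
    out = []
    for s in samples:
        salient = s.harmonicity > harm_thresh and s.energy_db > energy_thresh_db
        out.extend([s] * (1 + extra_copies if salient else 1))
    return out
```

For classifiers trained by minimizing a summed loss, duplicating a sample k times has roughly the same effect as giving it weight k, which is why replication pushes the decision boundary toward modelling these salient units more accurately.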
Keywords/Search Tags: Speech segregation, computational auditory scene analysis, auditory segmentation, auditory grouping, auditory cues, energy distribution, high frequency range, sound classification, harmonic and energy features, classifier training