
Single-channel Speech Separation Based On Computational Auditory Scene Analysis

Posted on: 2015-08-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Hu
Full Text: PDF
GTID: 1488304310496424
Subject: Signal and Information Processing

Abstract/Summary:
Single-channel speech separation (SCSS) aims to separate the underlying sources when only a single recording of their linear mixture is available. Computational auditory scene analysis (CASA) is a relatively new approach to this problem. In practice, CASA achieves separation by seeking perceptually motivated features in the observed signal, and it makes few assumptions about the characteristics of the intrusions.

Current CASA research falls into two branches: (1) data-driven CASA and (2) model-based CASA. A major distinction between the two is that the former corresponds to rapid, automatic (bottom-up) processing, whereas the latter is associated with generally slow, consciously deliberative (top-down) processing. In a natural environment, the speed of a response is critical to an organism's survival. This suggests that speech separation is mainly a feed-forward process, since there is little time for iterative feedback. This thesis therefore concentrates on data-driven CASA and investigates several aspects of it in depth. The main contributions and contents of this dissertation are as follows:

1. Because the short-duration amplitude modulation spectrum (AMS) has low resolution, we propose a co-channel speech separation system based on the reassignment method. The method uses a variable low-pass filter to extract band-wise amplitude modulation (AM) signals; the reassignment method is then applied to alleviate the time-frequency resolution trade-off of the short-time Fourier transform (STFT). It increases the energy concentration of speech components by re-allocating the energy distribution in the joint time-frequency plane. Systematic evaluation shows that co-channel speech separation based on the reassigned AMS outperforms standard methods.
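Band-wise AM extraction of the kind described in contribution 1 can be sketched as follows. This is a minimal illustration only, assuming SciPy: it uses a fixed Butterworth sub-band filter plus Hilbert envelopes and a fixed AM low-pass cutoff, and does not reproduce the thesis's variable low-pass filter or the reassignment step; all filter parameters are illustrative.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def band_am_signals(x, fs, bands, am_cutoff=64.0):
    """Extract one amplitude-modulation (AM) envelope per frequency band.

    x: 1-D signal, fs: sample rate in Hz, bands: list of (lo, hi) Hz edges.
    Returns an array of shape (len(bands), len(x)).
    """
    ams = []
    for lo, hi in bands:
        # Isolate the sub-band with a zero-phase band-pass filter.
        sos_bp = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        sub = sosfiltfilt(sos_bp, x)
        # Hilbert envelope of the sub-band signal.
        env = np.abs(hilbert(sub))
        # Smooth the envelope to keep only the AM frequencies of interest.
        sos_lp = butter(4, am_cutoff, btype="low", fs=fs, output="sos")
        ams.append(sosfiltfilt(sos_lp, env))
    return np.array(ams)
```

For an amplitude-modulated tone, the recovered envelope should closely track the true modulator inside the analysis band.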
2. Inspired by the Schroeder histogram, Goldstein's optimum pitch theory, and Meddis's correlogram, we propose a new multi-pitch detection method based on the "Gaussgram", which is derived from the correlogram by using Gaussian functions with different bandwidths. The Gaussgram suppresses sub-harmonics, so pitch detection based on it makes far fewer half-F0 errors. To further penalize the half-F0 tendency in Viterbi decoding, we use the dominant pitch tracks to remove their respective sub-harmonic contours. Comparative results show that our method produces far fewer half-F0/double-F0 errors than the state-of-the-art technique.

3. Since classification-based CASA systems do not generalize well to new SNR (signal-to-noise ratio) conditions, we propose binarizing the output of each multi-layer perceptron (MLP) with a new adaptive threshold. The threshold is estimated by a histogram-fitting module and effectively compensates for the SNR mismatch between the training and test phases. Experimental results on TIMIT utterances show that with the new threshold, the MLP-based CASA system has an improved SNR generalization ability.
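The Gaussgram of contribution 2 is specific to this thesis, but the half-F0 (octave-down) problem it targets can be illustrated with a generic correlogram-style estimator. The sub-harmonic check below is a simple hypothetical heuristic, not the proposed method: if the half-period lag scores almost as well as the best lag, the estimator prefers the higher octave.

```python
import numpy as np


def acf_pitch(frame, fs, fmin=60.0, fmax=400.0, sub_ratio=0.9):
    """Autocorrelation pitch estimate with a crude sub-harmonic guard.

    A periodic signal has correlogram peaks at its period *and* at integer
    multiples of it; picking a multiple yields a half-F0 (octave) error.
    """
    frame = frame - frame.mean()
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    acf = acf / (acf[0] + 1e-12)               # normalize so r(0) = 1
    lo, hi = int(fs / fmax), int(fs / fmin)    # admissible lag range
    lag = lo + int(np.argmax(acf[lo:hi]))      # best period candidate
    half = lag // 2
    if half >= lo and acf[half] > sub_ratio * acf[lag]:
        lag = half                              # prefer the higher octave
    return fs / lag
```

On a harmonic complex with a 200 Hz fundamental, the estimate should land at 200 Hz rather than the 100 Hz sub-harmonic.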
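The exact histogram-fitting module of contribution 3 is not detailed in this abstract. As one simple, generic instantiation of histogram-based adaptive thresholding, Otsu's method can pick a data-dependent cut point for binarizing soft MLP outputs into a binary mask:

```python
import numpy as np


def otsu_threshold(p, bins=64):
    """Histogram-based adaptive threshold (Otsu's method).

    p: soft mask values in [0, 1] (e.g. MLP sigmoid outputs).
    Returns the bin center that maximizes between-class variance.
    """
    hist, edges = np.histogram(p, bins=bins, range=(0.0, 1.0))
    hist = hist.astype(float) / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)                       # weight of class 0 per cut
    w1 = 1.0 - w0                              # weight of class 1 per cut
    m = np.cumsum(hist * centers)              # cumulative first moment
    mu0 = m / np.maximum(w0, 1e-12)            # class-0 mean per cut
    mu1 = (m[-1] - m) / np.maximum(w1, 1e-12)  # class-1 mean per cut
    var_between = w0 * w1 * (mu0 - mu1) ** 2
    return centers[int(np.argmax(var_between))]


def binarize(p, bins=64):
    """Turn soft mask values into a binary mask with an adaptive cut."""
    t = otsu_threshold(p, bins)
    return (p > t).astype(np.uint8), t
```

Because the threshold is re-estimated from each test signal's own histogram, it adapts to shifts in the output distribution, which is the general motivation behind compensating train/test SNR mismatch.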
Keywords/Search Tags: Blind source separation, Single-channel speech separation, Computational auditory scene analysis (CASA), Binary mask, Feature extraction