Underdetermined Blind Source Separation Based On Sparsity Of Speeches

Posted on: 2015-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: X M Kui
Full Text: PDF
GTID: 2268330431450119
Subject: Signal and Information Processing
Abstract/Summary:
In real-world scenes, communication with other people or with machines is inevitably disturbed by competing speakers or ambient noise, which makes it difficult to obtain the intended information. To recover the target signal from such contaminated observations, researchers have introduced the technique of blind source separation. Here "blind" means that both the mixing process and the original sources are unknown. As one kind of blind source separation, blind speech separation (BSS) plays an important role in real-life applications, including front-end processing for robust automatic speech recognition, scene analysis, video conferencing, hearing aids and surveillance devices.

With respect to the mixing process, BSS can be divided into two kinds: the linear instantaneous model, which does not consider the delay between the sources and the microphones, and the convolutive model. With respect to the numbers of sources and microphones, BSS falls into three kinds: overdetermined BSS, when the number of microphones is greater than the number of sources; determined BSS, when the two numbers are equal; and underdetermined BSS, when the number of sources exceeds the number of microphones. Overdetermined BSS has more observations available than underdetermined BSS, so it can recover the sources more accurately. In practice, however, device or environmental limits may cause the number of sources to exceed the number of microphones, so research on underdetermined BSS is important. This thesis focuses on underdetermined BSS under two models: the linear instantaneous model and the convolutive model.

For linear instantaneous BSS, we propose a compressed sensing (CS) based separation method for linear speech mixtures. The idea is to combine dictionary self-learning with CS, and the proposed method is implemented in two stages. The first stage uses a robust method to obtain an accurate estimate of the mixing matrix. The second stage is the dictionary self-learning process, which uses the separated signals to train the dictionaries and then uses the trained dictionaries to obtain newly separated signals. The process alternates between refining the dictionaries and refining the estimated signals until it converges. Because the dictionaries are adaptively regenerated, the final dictionaries approximate the optimal sparse basis of the original sources, and the separation performance improves at the same time. The proposed method does not need any prior information about the sources; it is therefore an unsupervised method and can be widely applied.

For the underdetermined convolutive model, we design a separation strategy that combines frequency bin-wise processing with post-processing based on single-channel dereverberation. It contains three stages. First, the mixing matrix in each frequency bin is estimated with a clustering method, based on the assumption that only one source is active at each time-frequency point, and the permutation of the estimated mixing matrices across frequency bins is then resolved. Second, the sources are reconstructed under a maximum a posteriori (MAP) framework, assuming that the sources follow a generalized Laplacian distribution. Finally, to further improve the quality of the estimated sources, a post-processing procedure is added to eliminate the reverberant components and the interference from other sources, which also improves the separation performance.
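
The clustering idea behind mixing-matrix estimation can be illustrated in a much simpler setting than the one treated in the thesis. The sketch below is not the thesis's algorithm: it assumes an idealized linear instantaneous mixture of three synthetic sparse sources on two microphones, with exactly one source active at each sample, and it estimates the mixing directions with a plain 1-D k-means on the observation angles. The function name `estimate_mixing_directions`, the source angles, the noise level and the energy threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def estimate_mixing_directions(x, n_sources, n_iter=50):
    """Estimate unit-norm mixing columns of a 2-channel mixture
    by 1-D k-means on the angles of the observation vectors."""
    # Keep only samples with significant energy; weak samples are noise-dominated.
    norms = np.linalg.norm(x, axis=0)
    active = norms > 0.1 * norms.max()
    # Fold angles into [0, pi): a column a_k and its negative span the same line.
    theta = np.mod(np.arctan2(x[1, active], x[0, active]), np.pi)
    # Deterministic initialisation at evenly spaced quantiles of the angles.
    q = (np.arange(n_sources) + 0.5) / n_sources
    centers = np.quantile(theta, q)
    for _ in range(n_iter):
        # Assign each angle to its nearest center, then update the centers.
        labels = np.argmin(np.abs(theta[:, None] - centers[None, :]), axis=1)
        for k in range(n_sources):
            if np.any(labels == k):
                centers[k] = theta[labels == k].mean()
    centers = np.sort(centers)
    return np.vstack([np.cos(centers), np.sin(centers)])

# Synthetic underdetermined mixture: 3 sources, 2 microphones (assumed values).
rng = np.random.default_rng(0)
true_angles = np.deg2rad([20.0, 75.0, 130.0])
A = np.vstack([np.cos(true_angles), np.sin(true_angles)])  # 2 x 3 mixing matrix

n = 3000
s = np.zeros((3, n))
which = rng.integers(0, 3, size=n)            # index of the single active source
s[which, np.arange(n)] = rng.laplace(size=n)  # sparse, Laplacian amplitudes
x = A @ s + 0.01 * rng.standard_normal((2, n))  # mixtures with weak sensor noise

A_hat = estimate_mixing_directions(x, n_sources=3)
est_angles = np.rad2deg(np.arctan2(A_hat[1], A_hat[0]))
print("estimated column angles (degrees):", est_angles)
```

Folding the angles into [0, pi) removes the sign ambiguity of each column, but it would need extra care if a mixing direction lay close to 0 or 180 degrees. Real speech only approximately satisfies the single-active-source assumption, and the convolutive case additionally requires per-frequency-bin estimation and permutation alignment, which this sketch does not attempt.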
Keywords/Search Tags: Underdetermined linear instantaneous BSS, Underdetermined convolutive BSS, Sparse basis, Dictionary self-learning