Single-Channel Speech Separation Using Sequential Dictionary Learning

Posted on: 2016-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Xu
Full Text: PDF
GTID: 2308330467494915
Subject: Signal and Information Processing
Abstract/Summary:
Speech separation, which recovers the underlying sources from a mixture, has been receiving increasing attention. An intelligent terminal often has fewer microphones than there are speech sources, and in extreme cases only one microphone is available. Single-channel speech separation (SCSS) is therefore becoming increasingly important. Recently, dictionary learning has been used more and more to solve the SCSS problem. Existing dictionary learning algorithms assume that speech from different speakers has unique components, so that each speaker's speech can be sparsely represented by a different dictionary. Because speech is only short-term stationary, it must be decomposed into a series of short segments; at that scale, however, the correlation between different speakers' signals increases, that is, segments from different speakers may contain similar components. This thesis proposes a novel sequential discriminative dictionary learning (SDDL) algorithm for separating the mixed signal in an SCSS system, together with a new speech post-processing framework for improving the quality of the separated speech. The main contents and innovations are as follows:

1. We take both the unique and the similar components of different speakers' signals into account and design a multi-layer sequential structured dictionary that contains discriminative and buffer sub-dictionaries in each layer. In the training stage, an objective function is derived which guarantees that the unique components of each training set are explained by the corresponding discriminative sub-dictionary, while the similar components are explained by the buffer sub-dictionary rather than by the cross sub-dictionary. The components assigned to the buffer sub-dictionary are used as the training sets for the next layer.
In the separation stage, the unique components of the different speakers' signals, which correspond more closely to the speaker labels, are separated first, and the similar components are separated in the next layer. Experimental results verify that the proposed SDDL-based SCSS algorithm effectively reduces source confusion compared with existing algorithms.

2. The separated speech is still mixed with other sources and exhibits a certain degree of distortion. A new speech post-processing framework is therefore proposed. It contains an adaptive separation module, which suppresses the influence of the mismatch between the training and test sets; a time-frequency filter module, which further suppresses the confusion in the separated speech; and a harmonic regeneration module, which suppresses the distortion of the separated speech. Experimental results show that the proposed post-processing framework improves the quality of the separated speech.
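
The single-layer separation step described in point 1 can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not give the objective function or the sparse coder, so a plain greedy orthogonal matching pursuit is swapped in here, and the names `omp`, `separate_layer`, `D1`, `D2`, and `Db` are hypothetical. The dictionaries are toy orthonormal blocks rather than learned ones. The sketch only shows how a mixture frame coded over the concatenation of two discriminative sub-dictionaries and one buffer sub-dictionary splits into two speaker estimates plus a shared part.

```python
import numpy as np

def omp(D, y, k):
    """Greedy orthogonal matching pursuit: approximate y with at most k atoms of D."""
    residual = y.copy()
    support = []
    sol = np.zeros(0)
    coef = np.zeros(D.shape[1])
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # Re-fit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef

def separate_layer(y, D1, D2, Db, k):
    """Code y over [D1, D2, Db] and split the reconstruction by sub-dictionary."""
    x = omp(np.hstack([D1, D2, Db]), y, k)
    n1, n2 = D1.shape[1], D2.shape[1]
    s1 = D1 @ x[:n1]             # part explained by speaker 1's discriminative atoms
    s2 = D2 @ x[n1:n1 + n2]      # part explained by speaker 2's discriminative atoms
    shared = Db @ x[n1 + n2:]    # part explained by the buffer atoms
    return s1, s2, shared

# Toy data (assumption): three orthonormal blocks stand in for learned sub-dictionaries.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((12, 12)))
D1, D2, Db = Q[:, :4], Q[:, 4:8], Q[:, 8:]

# A synthetic "mixture frame" with one unique atom per speaker and one shared atom.
y = 2.0 * D1[:, 0] + 1.5 * D2[:, 1] + 0.5 * Db[:, 2]
s1, s2, shared = separate_layer(y, D1, D2, Db, k=3)
```

In this sketch `shared` plays the role of the components that fall into the buffer sub-dictionary; in the scheme the abstract describes, such components would be handed to the next layer, which has its own sub-dictionaries, instead of being assigned to a speaker directly.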
Keywords/Search Tags: Single-channel speech separation, sequential discriminative dictionary learning (SDDL), speech post-processing, time-frequency masking, harmonic regeneration