Underdetermined Speech Separation Based On Sparse Representation And Deep Learning

Posted on: 2018-09-15  Degree: Master  Type: Thesis
Country: China  Candidate: P Zhang  Full Text: PDF
GTID: 2348330536962021  Subject: Information and Communication Engineering
Abstract/Summary:
Audio signals are often corrupted by environmental noise or by other sources that are not of interest, and a mixture of multiple sources makes high-level audio applications such as speech recognition difficult. Recovering the sources from the mixture is therefore an important problem in audio processing. Humans can separate sources from a mixture easily, but the task is very hard for a computer system, especially in the underdetermined case, where the number of mixture channels is smaller than the number of sources. This thesis focuses on the underdetermined audio source separation problem and makes the following contributions.

(1) For the underdetermined convolutive separation problem, this thesis analyzes various sparsity-inducing functions and proposes a separation algorithm based on the lq (0 < q < 1) norm. The algorithm exploits the strong sparsity-inducing ability of this norm to constrain the sparsity of the signals in the time-frequency domain. In addition, a low-rank prior is adopted for better recovery accuracy. An optimization algorithm based on the proximity operator is derived. Experiments on the BSS Oracle corpus demonstrate that the proposed algorithm effectively improves separation quality.

(2) For the monaural audio source separation problem, this thesis proposes a separation algorithm based on a time-domain convolutional neural network (Time-CNN), whose input and output are both in the time domain. There are two key ideas behind the time-domain convolutional network: one is that the convolutional layers learn features automatically, instead of relying on extracted features such as spectra; the other is that the phase is recovered automatically, since both the input and the output are in the time domain. To improve recovery accuracy, a mixing loss function is proposed. In addition, a time-frequency mask is applied to the output for better perceptual quality. Extensive experiments on the TSP corpus show that the proposed algorithm significantly improves monaural audio source separation performance.
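
The abstract does not give the form of the proximity operator used for the lq penalty. As a rough illustration only, the sketch below evaluates the scalar proximity operator of lam * |x|^q for 0 < q < 1 using a generalized thresholding rule followed by a fixed-point iteration. The function name prox_lq, the parameter names lam, q and n_iter, and the restriction to real scalars are assumptions made for this example; they are not taken from the thesis, which works with time-frequency coefficients.

```python
import math


def prox_lq(y, lam, q, n_iter=50):
    """Approximate argmin_x 0.5 * (x - y)**2 + lam * abs(x)**q for 0 < q < 1.

    Hypothetical helper for illustration; real scalar input only.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if lam <= 0.0:
        raise ValueError("lam must be positive")

    # Threshold below which the minimizer is exactly zero.
    x0 = (2.0 * lam * (1.0 - q)) ** (1.0 / (2.0 - q))
    tau = x0 + lam * q * x0 ** (q - 1.0)

    mag = abs(y)
    if mag <= tau:
        return 0.0

    # Fixed-point iteration on the stationarity condition
    # x - |y| + lam * q * x**(q - 1) = 0, started from x = |y|.
    x = mag
    for _ in range(n_iter):
        x = mag - lam * q * x ** (q - 1.0)

    # The minimizer keeps the sign of the input.
    return math.copysign(x, y)


if __name__ == "__main__":
    for value in (0.5, 1.0, 2.0, 10.0, -10.0):
        print(value, prox_lq(value, lam=1.0, q=0.5))
```

Small inputs are set exactly to zero, while large inputs are shrunk only slightly, which is the behaviour that makes lq penalties with q < 1 more strongly sparsity-inducing than the l1 soft threshold.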
Keywords/Search Tags: Underdetermined Source Separation, Monaural Source Separation, Convolutional Neural Networks, Deep Learning