Single Channel Speech Separation Based On Nonnegative Matrix Factorization

Posted on: 2017-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Mai
Full Text: PDF
GTID: 2348330536467489
Subject: Software engineering
Abstract/Summary:
Speech separation aims to extract a single speaker's speech from a mixture. In general, mixtures fall into two types: single-channel and multi-channel. A single-channel (monaural) mixture is collected by one recording device, whereas a multi-channel mixture is collected by several. Separating a monaural mixture is called single-channel speech separation, and separating a multi-channel mixture is called multi-channel speech separation. Because monaural speech carries little prior knowledge about the speakers, single-channel speech separation is the more difficult of the two tasks. Speech separators already support a variety of services, and more and more people recognize and favor them; in these services, speech separation removes background noise from mixed speech and improves service quality. Single-channel speech separation therefore plays an important role in speech services.

In single-channel speech separation, non-negative matrix factorization (NMF) approximately decomposes the magnitude or power spectrum of a speech signal into the product of two non-negative matrices: a basis matrix and a coefficient matrix. The basis matrix reflects the spectral characteristics of a speaker. Because NMF learns these features from the training data of a single speaker, it can tell which components of a mixed signal belong to that speaker, and it has been applied successfully to single-channel speech separation. Since NMF shows great potential for this task, this thesis studies monaural speech separation based on NMF. The main work is as follows:

1) Both convolutive NMF (CNMF) and transductive NMF (TNMF) extract a single speaker's speech signal more effectively than standard NMF. However, TNMF cannot capture the continuity of speech signals, and CNMF ignores the positive effect of the mixed signal on learning the basis matrices. To overcome the deficiencies of both TNMF and CNMF, this thesis proposes transductive convolutive NMF (TCNMF). TCNMF learns several basis matrices for the speakers from both the training speech and the mixed speech, so it can discover the latent dependencies in speech. In addition, the basis matrices learned by TCNMF contain much more meaningful information about the speakers. Experimental results for single-channel speech separation on the Grid corpus show that TCNMF achieves a significant improvement over NMF, TNMF, and CNMF.

2) Compared with NMF, robust NMF (RNMF) deals with noisy data more effectively. However, RNMF cannot guarantee that the learned coefficient matrix is sparse. To enhance the sparsity of the coefficient matrix, this thesis proposes robust non-negative local coordinate factorization (RNLCF). The local coordinate term of RNLCF forces the basis vectors to lie as close as possible to the original data points, which in turn yields a sparse coefficient matrix. RNLCF effectively avoids the negative effect of noisy data and learns, for each source, a basis matrix that contains abundant information. Experimental results on the 2nd CHiME corpus show that RNLCF achieves better single-channel speech separation performance than NMF, RNMF, and NLCF.
Keywords/Search Tags: single-channel speech separation, non-negative matrix factorization, local coordinate code
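
The baseline NMF separation scheme described in the abstract (learn a basis matrix per speaker from training data, then explain the mixture with those bases held fixed) can be illustrated with a minimal sketch. It is not the thesis's implementation and covers neither TCNMF nor RNLCF. The Euclidean-cost multiplicative updates, the soft-mask reconstruction, the names nmf, separate, EPS and rank, and the random toy "spectrograms" are all assumptions made for illustration.

```python
import numpy as np

EPS = 1e-9  # assumed small constant; guards against division by zero


def nmf(V, rank, n_iter=200, W=None, seed=0):
    """Approximate a non-negative matrix V (freq x time) as W @ H.

    If W is supplied it is held fixed and only H is updated, which is what
    the separation stage needs. The updates are the standard Euclidean-cost
    multiplicative rules, chosen here for illustration only.
    """
    rng = np.random.default_rng(seed)
    learn_W = W is None
    if learn_W:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if learn_W:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def separate(V_mix, W_a, W_b, n_iter=200):
    """Split a mixture spectrogram using two pre-trained speaker bases."""
    W = np.hstack([W_a, W_b])  # concatenated dictionary, kept fixed
    _, H = nmf(V_mix, W.shape[1], n_iter=n_iter, W=W)
    k = W_a.shape[1]
    est_a = W_a @ H[:k]  # part of the model explained by speaker A
    est_b = W_b @ H[k:]  # part of the model explained by speaker B
    total = est_a + est_b + EPS
    # Soft masks distribute each mixture bin between the two speakers.
    return V_mix * est_a / total, V_mix * est_b / total


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy stand-ins for magnitude spectrograms: 64 bins x 100 frames.
    V_a = rng.random((64, 4)) @ rng.random((4, 100))
    V_b = rng.random((64, 4)) @ rng.random((4, 100))
    W_a, _ = nmf(V_a, rank=4)  # training stage, speaker A
    W_b, _ = nmf(V_b, rank=4)  # training stage, speaker B
    S_a, S_b = separate(V_a + V_b, W_a, W_b)
    print(S_a.shape, S_b.shape)
```

In a real pipeline the inputs would be magnitude or power spectrograms from a short-time Fourier transform, and the separated magnitudes would be combined with the mixture phase before inversion; both steps are omitted here.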