Study On Speech Separation Based On Non-negative Matrix Factorization And Deep Clustering

Posted on:2021-05-11

Degree:Master

Type:Thesis

Country:China

Candidate:W Y Ge


GTID:2428330614458191

Subject:Information and Communication Engineering

Abstract/Summary:


Speech separation, the task of extracting target speech from an observed mixture in various complex acoustic environments, plays an important role in modern daily communication. Supervised methods, including non-negative matrix factorization (NMF) and deep neural networks (DNNs), are commonly used in modern speech separation. They use pre-trained models to learn the complex relationship between mixtures and target speech, and therefore achieve better separation performance than unsupervised methods. Although the performance of supervised speech separation has improved significantly, drawbacks remain, because the learned models do not capture the complex relationship between speech and the many kinds of non-stationary noise encountered in daily life. To address this problem, this thesis studies two improved speech separation methods, described as follows:

(1) To constrain over-estimation of the background noise, the noisy speech is first decomposed into the time-frequency (TF) domain by computational auditory scene analysis (CASA). NMF is used to estimate the noise and compute an estimated ratio mask. Next, the correlation between the noisy signal and the estimated noise is computed and used as the noise presence probability. Finally, the estimated ratio mask is refined by convex optimization to reduce the over-estimated components in the noise estimate. Experimental results show that speech enhanced by the proposed algorithm achieves higher speech quality and intelligibility.

(2) To address the performance degradation caused by additional background noise in multi-talker speech mixtures, NMF and deep clustering (DC) are combined to separate monaural noisy speech mixtures. NMF decomposes the spectrogram of the noisy mixture into speech and noise activation coefficients. The speech coefficients are then projected into a high-dimensional embedding space by DC, and the K-means algorithm clusters the embeddings to compute the separation mask. Finally, NMF uses the separated coefficients to reconstruct the spectrograms of the source speech signals and the noise, yielding the estimated speech of each speaker. Experimental results in various background noise environments show that the proposed algorithm effectively suppresses both the loss of target speech and interference from non-target signals, and achieves higher overall speech quality.
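The NMF step of method (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes the speech bases `W_speech` and noise bases `W_noise` were pre-trained on clean data, and it works on a plain non-negative magnitude spectrogram rather than the CASA time-frequency decomposition used in the thesis. Only the activations are updated, using the standard multiplicative rule of Lee and Seung for the Euclidean cost. `frame_correlation` is a hypothetical stand-in for the noise-presence-probability step (per-frame Pearson correlation across frequency, clipped to [0, 1]). The convex-optimization refinement of the mask is not reproduced. All function names are illustrative.

```python
import numpy as np


def nmf_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate non-negative activations H with V ~= W @ H while W stays fixed.

    Uses multiplicative updates for the Euclidean (Frobenius) cost.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # positive random start
    WtV = W.T @ V  # numerator is constant because W is fixed
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)  # keeps H non-negative
    return H


def nmf_ratio_mask(V, W_speech, W_noise, n_iter=200, eps=1e-12):
    """Return a speech ratio mask in [0, 1) and the NMF noise estimate."""
    W = np.hstack([W_speech, W_noise])
    H = nmf_activations(V, W, n_iter=n_iter, eps=eps)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]  # speech part of the reconstruction
    N_hat = W_noise @ H[k:]   # noise part of the reconstruction
    mask = S_hat / (S_hat + N_hat + eps)
    return mask, N_hat


def frame_correlation(V, N_hat, eps=1e-12):
    """Hypothetical noise presence probability: per-frame Pearson
    correlation between the noisy spectrum and the noise estimate,
    computed across frequency and clipped to [0, 1]."""
    Vc = V - V.mean(axis=0, keepdims=True)
    Nc = N_hat - N_hat.mean(axis=0, keepdims=True)
    num = (Vc * Nc).sum(axis=0)
    den = np.sqrt((Vc ** 2).sum(axis=0) * (Nc ** 2).sum(axis=0)) + eps
    return np.clip(num / den, 0.0, 1.0)
```

Multiplying the mask by the mixture magnitude (and reusing the mixture phase) gives an enhanced magnitude spectrogram. In the thesis, the mask is additionally refined using the noise presence probability before this step.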
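The clustering and reconstruction step of method (2) can be sketched as follows. This is a minimal illustration under stated assumptions. The trained deep clustering network is not reproduced. The function assumes the network has already mapped every (NMF component, frame) unit of the speech activation matrix to a D-dimensional embedding, stored row by row in the same order as `H_speech.reshape(-1)`. K-means is implemented here in plain NumPy with farthest-first initialization, a choice made for determinism rather than one taken from the thesis. It assigns each unit to a speaker. The resulting binary masks split the activations, and each speaker's spectrogram is rebuilt as `W_speech @ (H_speech * mask)`. All function names are illustrative.

```python
import numpy as np


def kmeans(X, n_clusters, n_iter=100, seed=0):
    """Plain K-means on the rows of X; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    # Farthest-first initialization: a random first center, then repeatedly
    # the point farthest from all centers chosen so far.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, n_clusters):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def separate_from_embeddings(embeddings, H_speech, W_speech, n_speakers, seed=0):
    """Cluster per-unit embeddings into speakers and rebuild each speaker's
    spectrogram from the masked speech activations."""
    K, T = H_speech.shape
    labels = kmeans(embeddings, n_speakers, seed=seed).reshape(K, T)
    estimates = []
    for c in range(n_speakers):
        mask = (labels == c).astype(H_speech.dtype)  # binary mask on activations
        estimates.append(W_speech @ (H_speech * mask))
    return estimates
```

Because the binary masks partition the activation units, the per-speaker estimates always sum to the full speech reconstruction `W_speech @ H_speech`.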

Keywords/Search Tags:

speech separation, non-stationary noise, non-negative matrix factorization, deep clustering


Related items

1	Research On Parallel Algorithm Of Deep Transductive Non-negative Matrix Factorization For Speech Separation
2	Research On Two Methods Of Single Channel Speech Separation
3	Study On Deep Non-negative Matrix Factorization Algorithm
4	A Research Of Voice And Complicated Background Noise Based On CNMF
5	Single Channel Speech Separation Based On Nonnegative Matrix Factorization
6	The Research Of Key Techniques Of Speech Separation And Speech Recognition
7	Study Of The Deep Matrix Factorization And Its Application In Image Clustering
8	Research On Underdetermined Convolutive Speech Signal Separation Methods
9	Research On The Method Of Underdetermined Blind Speech Separation Based On NMF And Sparseness
10	Underdetermined Source Separation And Its Application To Speech Processing