Study On Speech Separation Based On Non-negative Matrix Factorization And Deep Clustering

Posted on: 2021-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Ge
Full Text: PDF
GTID: 2428330614458191
Subject: Information and Communication Engineering
Abstract/Summary:
Speech separation, the task of separating target speech from an observed mixture in complex acoustic environments, plays an important part in modern daily communication. Supervised methods, including non-negative matrix factorization (NMF) and deep neural networks (DNN), are commonly used in modern speech separation. They use a pre-trained model to learn the complex relationship between mixtures and target speech, and thus achieve better separation performance than unsupervised methods. Although the performance of supervised speech separation has improved significantly, some drawbacks remain, because the learned models do not adequately capture the complex relationships involving the many kinds of non-stationary noise encountered in daily life. To address this problem, this thesis studies two improved speech separation methods, described as follows:

(1) To constrain the over-estimated background noise, the noisy speech is first decomposed into the time-frequency (TF) domain by computational auditory scene analysis (CASA). NMF is used to estimate the noise and to calculate the estimated ratio mask. Then, the correlation between the noisy signal and the estimated noise signal is calculated and used as the noise presence probability. Finally, the estimated ratio mask is optimized by convex optimization to reduce the over-estimated components in the estimated noise. Experimental results show that speech enhanced by the proposed algorithm attains higher speech quality and intelligibility.

(2) To address the performance degradation caused by additional background noise in a multi-talker speech mixture, NMF and deep clustering (DC) are used to separate a monaural noisy speech mixture. NMF is used to decompose the spectrogram of the noisy speech mixture into speech and noise activation coefficients. Then, the speech coefficients are projected into a high-dimensional embedding space by DC. The K-means method is used to cluster the embeddings and calculate the separation mask. Finally, NMF uses the separated coefficients to reconstruct the spectrograms of the source speech signals and the noise, yielding the estimated speech of the different speakers. Experimental results in various background noise environments show that the proposed algorithm effectively suppresses both the loss of target speech and the disturbance from non-target signals, and obtains higher overall speech quality.
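The NMF step shared by both methods (estimating speech and noise components from a mixture spectrogram and forming a ratio mask) can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes that speech and noise dictionaries have already been trained, that the Euclidean cost with multiplicative updates is used, and that the mask is the ratio of the speech estimate to the sum of both estimates. The function names, the random dictionaries, and the matrix sizes are hypothetical, and the CASA front end, the convex optimization of the mask, and the deep clustering stage are not shown.

```python
import numpy as np


def nmf_activations(V, W, n_iter=200, eps=1e-10, seed=0):
    """Estimate non-negative activations H such that V ~= W @ H, with W fixed.

    Uses multiplicative updates for the Euclidean cost (an assumption;
    the abstract does not state which cost function the thesis uses).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # The update multiplies by a non-negative ratio, so H stays non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H


def ratio_mask(V, W_speech, W_noise, **kwargs):
    """Return (mask, speech_estimate, noise_estimate) for a magnitude spectrogram V.

    W_speech and W_noise are assumed to be pre-trained dictionaries whose
    columns are spectral basis vectors.
    """
    W = np.hstack([W_speech, W_noise])
    H = nmf_activations(V, W, **kwargs)
    k = W_speech.shape[1]
    S = W_speech @ H[:k]   # speech part of the reconstruction
    N = W_noise @ H[k:]    # noise part of the reconstruction
    mask = S / (S + N + 1e-10)
    return mask, S, N


# Toy usage with random stand-ins for trained dictionaries (hypothetical sizes).
rng = np.random.default_rng(42)
W_s = rng.random((64, 8))    # 64 frequency bins, 8 speech bases
W_n = rng.random((64, 4))    # 4 noise bases
V_mix = W_s @ rng.random((8, 20)) + W_n @ rng.random((4, 20))
mask, S_hat, N_hat = ratio_mask(V_mix, W_s, W_n)
enhanced = mask * V_mix      # masked magnitude spectrogram
```

Because the speech and noise estimates are both non-negative, the mask lies between 0 and 1 for every time-frequency bin. A mask of this kind is the quantity that method (1) goes on to refine, while method (2) passes the speech activations on to the embedding and clustering stage rather than masking directly.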
Keywords/Search Tags: speech separation, non-stationary noise, non-negative matrix factorization, deep clustering