
Research On Speech Separation Algorithm Based On Deep Learning

Posted on: 2021-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: X R Zhao
GTID: 2428330632962896
Subject: Electronic and communication engineering
Abstract:
In recent years, with the rapid development of artificial intelligence and mobile communication, speech separation, a fundamental task in signal processing, has attracted increasing attention from researchers. In a "cocktail party" acoustic environment, the human auditory system can apparently pick out a target speaker's voice from a mixture containing other speakers and background noise with ease; for computers, however, this is very difficult. With the development of deep learning, a steady stream of deep-learning-based single-channel speech separation methods has emerged, greatly improving separation quality.

This thesis studies single-channel multi-speaker speech separation, focusing on the deep clustering (DPCL) algorithm. DPCL was the pioneering deep-learning-based algorithm for single-channel multi-speaker separation, and much subsequent research has built on it as a cornerstone. This thesis optimizes DPCL from two directions: its network structure and its clustering algorithm.

First, this thesis uses the gated recurrent unit (GRU) to build the deep neural network used in DPCL. GRU has been shown to perform comparably to, or even better than, long short-term memory (LSTM) networks in polyphonic music modeling and speech signal modeling while substantially reducing computational cost, yet it has rarely been applied to speech separation. We try several bidirectional GRU (BGRU) configurations as the main network and reach a preliminary conclusion: within DPCL, GRU does not perform as well as LSTM.

Next, to address DPCL's complex network structure, long training time, and reliance on a single clustering algorithm, this thesis builds DPCL networks from BLSTM and BGRU layers and combines them with several clustering algorithms, aiming to reduce the network complexity of DPCL while improving its separation performance.

Finally, a DPCL optimization algorithm based on a Gaussian mixture model (GMM) and a new network structure is proposed. Unlike approaches that improve separation performance by making DPCL more complex, the proposed network structure is nearly one third shorter than the original one. The new network is composed of BLSTM and BGRU layers. The optimized algorithm greatly shortens neural-network training time and improves the signal-to-distortion ratio (SDR) to 9.5 dB for two same-gender speakers, 11.8 dB for two different-gender speakers, and 10.65 dB overall.
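As an illustration of the two DPCL stages discussed in the abstract, here is a minimal NumPy sketch under stated assumptions. Training minimizes the standard deep clustering objective ||V V^T - Y Y^T||_F^2, where V holds one unit-norm embedding per time-frequency bin and Y is the one-hot assignment of each bin to its dominant speaker; at inference the embeddings are clustered and each cluster becomes a mask. The original DPCL recipe clusters with K-means; the GMM step here only mirrors the idea of the thesis's GMM-based variant. The helper names (`dpcl_loss`, `fit_gmm_diag`, `embeddings_to_masks`), the diagonal covariance, farthest-point initialization, iteration count and toy sizes are hypothetical choices, not the thesis's settings, and the BLSTM/BGRU network is replaced by synthetic embeddings.

```python
import numpy as np


def dpcl_loss(V, Y):
    """Deep clustering objective ||V V^T - Y Y^T||_F^2.

    V: (N, D) unit-norm embeddings, one row per time-frequency bin.
    Y: (N, C) one-hot labels marking the dominant speaker of each bin.
    Expanded as ||V^T V||^2 - 2||V^T Y||^2 + ||Y^T Y||^2 so the N x N
    affinity matrices are never built.
    """
    return (np.sum((V.T @ V) ** 2)
            - 2.0 * np.sum((V.T @ Y) ** 2)
            + np.sum((Y.T @ Y) ** 2))


def fit_gmm_diag(X, k, n_iter=50, seed=0, eps=1e-6):
    """Hypothetical helper: EM for a diagonal-covariance GMM.

    Returns the (N, k) posteriors from the final E-step.
    Diagonal covariance and farthest-point init are assumptions.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Farthest-point initialization keeps the initial means apart.
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - X[idx][None]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(np.argmax(d2)))
    means = X[idx].copy()
    variances = np.tile(X.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log pi_k + log N(x | mu_k, diag(var_k)) for each point/component.
        diff = X[:, None, :] - means[None]                       # (N, k, D)
        log_joint = (np.log(weights)
                     - 0.5 * np.log(2 * np.pi * variances).sum(axis=1)
                     - 0.5 * (diff ** 2 / variances[None]).sum(axis=2))
        log_joint -= log_joint.max(axis=1, keepdims=True)        # numerical stability
        resp = np.exp(log_joint)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means and diagonal variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + eps
    return resp


def embeddings_to_masks(V, T, F, n_speakers=2):
    """Hypothetical helper: cluster T*F embeddings, return binary masks (C, T, F)."""
    labels = fit_gmm_diag(V, n_speakers).argmax(axis=1)
    return np.stack([(labels == c).reshape(T, F)
                     for c in range(n_speakers)]).astype(float)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    T, F, D, C = 20, 16, 8, 2                        # toy sizes for illustration
    labels = rng.integers(0, C, size=T * F)          # dominant speaker of each bin
    Y = np.eye(C)[labels]
    # Synthetic stand-in for the network's embeddings (no BLSTM/BGRU here).
    V = np.eye(D)[labels] + 0.1 * rng.normal(size=(T * F, D))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    print("DPCL loss:", dpcl_loss(V, Y))
    masks = embeddings_to_masks(V, T, F, C)
    print("mask shape:", masks.shape)                # (2, 20, 16)
```

The low-rank form of the loss costs O(N D^2) rather than O(N^2), which matters because N = T x F is large for real spectrograms. The GMM posteriors could also be used directly as soft masks instead of taking the argmax; the abstract does not say which form the thesis uses.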
Keywords: deep learning, multi-speaker separation, GRU, GMM