Study On The Underdetermined Speech Separation Based On Deep Neural Network

Posted on: 2018-04-09; Degree: Master; Type: Thesis
Country: China; Candidate: Z G Miao; Full Text: PDF
GTID: 2348330536961160; Subject: Electronic and communication engineering
Abstract/Summary:
Speech is one of the most commonly used media for communication. In real environments it is inevitably affected by noise, which is why speech separation came into being. Speech separation is the process of separating the source signals from a mixed signal, and it plays an important role in speech processing systems such as speech recognition, speaker recognition and audio retrieval. This thesis is devoted to underdetermined speech separation, especially single-channel speech separation. The research work mainly consists of the following aspects:

(1) A single-channel speech separation method combining a phase-sensitive time-frequency mask with a deep neural network is proposed. The input of the network is the amplitude of the mixed signal, and the target output is the proposed phase-sensitive time-frequency mask, which introduces the phase information of the speech signal. During training, signal approximation is used as the objective function. In the test phase, the source signal is reconstructed with the phase of the mixed signal. Compared with other methods, the proposed method, which uses phase information, achieves better results.

(2) Because a complex time-frequency mask can restore the amplitude spectrum and the phase spectrum of the speech simultaneously, this thesis combines the complex time-frequency mask with a deep neural network for single-channel speech separation. The amplitude spectrum of the mixed signal is still used as the input of the network, and the complex time-frequency mask is used as the target output. Because the complex time-frequency mask is difficult to compute, predicting it directly with a deep neural network is not accurate. An objective function with a phase constraint is therefore proposed, which helps the phase estimation. In the test phase, the estimated phase is used to reconstruct the clean source signal. This method estimates the phase of the source signal accurately and therefore gives better separation results.

A series of experiments is carried out on the TSP speech database. The results show that the proposed method achieves better speech separation quality than other existing methods.
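The two mask types described in (1) and (2) can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the abstract gives no formulas, so the definitions below are the ones commonly used in the literature (a phase-sensitive mask |S|/|Y| · cos(θ_S − θ_Y) and a complex ratio mask S/Y), and the function names, the `EPS` constant and the form of the loss are assumptions made for illustration. The network itself and the phase-constraint objective are not shown.

```python
import numpy as np

EPS = 1e-8  # assumed small constant that avoids division by zero in silent bins


def phase_sensitive_mask(source_stft, mixture_stft):
    """Real-valued mask |S|/|Y| * cos(theta_S - theta_Y) per time-frequency bin."""
    ratio = np.abs(source_stft) / (np.abs(mixture_stft) + EPS)
    phase_diff = np.angle(source_stft) - np.angle(mixture_stft)
    return ratio * np.cos(phase_diff)


def complex_ratio_mask(source_stft, mixture_stft):
    """Complex-valued mask M chosen so that M * Y is approximately S."""
    return source_stft * np.conj(mixture_stft) / (np.abs(mixture_stft) ** 2 + EPS)


def reconstruct_with_mixture_phase(real_mask, mixture_stft):
    """Scale the mixture amplitude by a real mask and reuse the mixture phase."""
    return real_mask * np.abs(mixture_stft) * np.exp(1j * np.angle(mixture_stft))


def signal_approximation_loss(estimated_mask, source_stft, mixture_stft):
    """Mean squared error between the masked mixture amplitude and the
    phase-sensitive target amplitude (one common form of signal approximation)."""
    phase_diff = np.angle(source_stft) - np.angle(mixture_stft)
    target = np.abs(source_stft) * np.cos(phase_diff)
    return np.mean((estimated_mask * np.abs(mixture_stft) - target) ** 2)
```

In this sketch the spectrograms would come from a short-time Fourier transform of the waveforms. A real mask changes only the amplitude, so the mixture phase has to be reused at reconstruction, whereas multiplying the mixture by the complex mask yields an estimate of both amplitude and phase.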
Keywords/Search Tags: Speech Separation, Deep Neural Network, Complex Time-Frequency Mask, Objective Function