Time Domain Speech Separation Algorithm Based On Deep Neural Network

Posted on: 2022-10-28 | Degree: Master | Type: Thesis
Country: China | Candidate: H Ding
GTID: 2518306539980679 | Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of science and technology, the demand for human-computer interaction is steadily increasing. Speech is the most convenient and fastest means of communication, but in real life clean speech is often difficult to obtain, which makes Single Channel Speech Separation (SCSS) particularly important. Traditional shallow models, however, cannot solve the speech separation problem well. With the large-scale adoption of Deep Neural Networks (DNNs), researchers have used their multi-layer nonlinear processing structure to mine information from speech data and have achieved good results in speech signal processing. This thesis applies deep neural network techniques to single-channel multi-speaker speech separation, focusing on separation in the time domain. The main research contents are as follows:

1. For single-channel speaker separation, a time-domain DNN separation algorithm based on a Gammatone filter bank is presented. Built on the overall architecture of Conv-TasNet, the algorithm replaces the one-dimensional convolutional encoder with a multi-phase Gammatone filter bank, which processes the input speech mixture directly in the time domain. The resulting time-domain features are fed into a Temporal Convolutional Network (TCN), which is trained to estimate the masks needed for separation. Each mask is multiplied element-wise with the filter-bank output, and a decoder then reconstructs the separated clean speech signals. The TCN separation network is trained with the Mean Square Error (MSE) loss, which is commonly used in frequency-domain speech separation; in this way, the MSE loss is carried over to time-domain speech separation.

2. A target-oriented time-domain single-channel speech separation algorithm based on a TCN with nonlinear gated units is proposed. The algorithm keeps the time-domain separation framework but aims to better handle the characteristics of speech: because speakers differ in pronunciation traits such as speaking rate, the relevant information varies in time scale, so receiving the speech signal at multiple time scales may improve separation, whereas the previous network structure receives the speech signal at a single fixed scale. A nonlinear gating unit is therefore added to the TCN separation network, and the TCN is optimized accordingly so that the network has multi-scale invariance. The optimized network, called the Gated TCN, replaces the original TCN separation network to generate the separation masks. The multi-phase Gammatone filter bank proposed in Chapter 3 processes the original speech mixture directly, and the Gated TCN separation network is trained with two different loss functions, mean square error (MSE) and scale-invariant signal-to-noise ratio (SI-SNR). Finally, the decoder reconstructs the separated clean speech signals.
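As a sketch of the fixed encoder in point 1, the following Python code builds a multi-phase Gammatone filter bank and applies it to a mixture by framed inner products, the same operation a strided 1-D convolution performs in Conv-TasNet's encoder. It uses the standard 4th-order gammatone impulse response g(t) = t^3 exp(-2 pi b ERB(f_c) t) cos(2 pi f_c t + phi) with b = 1.019 and the Glasberg and Moore ERB formula. The abstract does not give the thesis's parameters, so the sampling rate, filter length, stride, number of centre frequencies, the evenly spaced phase shifts in [0, pi) and the final ReLU are all assumptions, and `multiphase_bank` and `encode` are hypothetical helper names.

```python
import math

def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore approximation)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def erb_space(f_lo, f_hi, n):
    """n centre frequencies spaced uniformly on the ERB-number scale (n >= 2)."""
    e_lo = 21.4 * math.log10(1.0 + 0.00437 * f_lo)
    e_hi = 21.4 * math.log10(1.0 + 0.00437 * f_hi)
    return [(10 ** ((e_lo + (e_hi - e_lo) * i / (n - 1)) / 21.4) - 1.0) / 0.00437
            for i in range(n)]

def gammatone_kernel(fc_hz, phase, length, fs, order=4, b=1.019):
    """Sampled gammatone impulse response, normalised to unit L2 norm."""
    kernel = []
    for k in range(length):
        t = k / fs
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * erb(fc_hz) * t)
        kernel.append(envelope * math.cos(2.0 * math.pi * fc_hz * t + phase))
    norm = math.sqrt(sum(v * v for v in kernel)) or 1.0
    return [v / norm for v in kernel]

def multiphase_bank(fs, n_freqs, n_phases, length, f_lo=100.0, f_hi=None):
    """Hypothetical multi-phase bank: n_phases evenly spaced phases in [0, pi)
    for every centre frequency (an assumption, not the thesis's exact layout)."""
    f_hi = f_hi if f_hi is not None else 0.45 * fs
    return [gammatone_kernel(fc, math.pi * p / n_phases, length, fs)
            for fc in erb_space(f_lo, f_hi, n_freqs)
            for p in range(n_phases)]

def encode(signal, bank, stride):
    """Fixed (non-learned) encoder: inner product of each frame with each kernel,
    followed by a ReLU. Output layout: features[channel][frame]."""
    length = len(bank[0])
    starts = range(0, len(signal) - length + 1, stride)
    return [[max(0.0, sum(k * signal[s + i] for i, k in enumerate(kern))) for s in starts]
            for kern in bank]

fs = 8000
bank = multiphase_bank(fs, n_freqs=8, n_phases=2, length=32)
mixture = [math.sin(2 * math.pi * 500 * n / fs) + 0.5 * math.sin(2 * math.pi * 1200 * n / fs)
           for n in range(800)]
features = encode(mixture, bank, stride=16)
print(len(features), len(features[0]))  # 16 49
```

Unlike Conv-TasNet's learned encoder, these kernels are fixed by the auditory model, so only the separation network and decoder would need training.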
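Point 1 then multiplies each estimated mask element-wise with the filter-bank output and decodes the result back to a waveform. A minimal sketch follows, assuming a Conv-TasNet-style decoder, i.e. a transposed 1-D convolution that overlap-adds weighted basis kernels using the encoder's stride. The masks and basis kernels are made-up toy values rather than learned ones, and `apply_mask` and `overlap_add_decode` are hypothetical names.

```python
def apply_mask(features, mask):
    """Element-wise product of encoder features and one speaker's mask, both [C][T]."""
    return [[f * m for f, m in zip(f_row, m_row)] for f_row, m_row in zip(features, mask)]

def overlap_add_decode(features, basis, stride):
    """Transposed-convolution decoder: each frame's coefficients weight the basis
    kernels, and consecutive frames are overlap-added `stride` samples apart."""
    n_frames = len(features[0])
    length = len(basis[0])
    out = [0.0] * ((n_frames - 1) * stride + length)
    for c, kernel in enumerate(basis):
        for t in range(n_frames):
            a = features[c][t]
            for k, v in enumerate(kernel):
                out[t * stride + k] += a * v
    return out

# Toy example: 2 channels x 2 frames, kernels of length 4, stride 2.
features = [[1.0, 2.0], [0.5, 0.0]]
basis = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 0.0, -1.0]]
mask_a = [[1.0, 1.0], [0.0, 0.0]]   # hypothetical mask keeping channel 0
mask_b = [[0.0, 0.0], [1.0, 1.0]]   # hypothetical mask keeping channel 1
est_a = overlap_add_decode(apply_mask(features, mask_a), basis, stride=2)
est_b = overlap_add_decode(apply_mask(features, mask_b), basis, stride=2)
print(est_a)  # [1.0, 1.0, 3.0, 3.0, 2.0, 2.0]
```

Because the decoder is linear, the waveforms decoded from two masked feature maps add up to the decoding of their sum; the masks themselves would come from the TCN, which this sketch does not model.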
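The two training objectives named in point 2 can be compared directly on waveforms. The sketch below computes the time-domain MSE and the SI-SNR (signals made zero-mean, estimate projected onto the reference); training would minimise the MSE or the negative SI-SNR. The `eps` stabiliser and the toy signals are assumptions, and the abstract does not state details such as how estimated outputs are paired with speakers.

```python
import math

def mse_loss(est, ref):
    """Mean square error between estimated and reference waveforms of equal length."""
    return sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref)

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB; the training loss is -si_snr."""
    mean_e = sum(est) / len(est)
    mean_r = sum(ref) / len(ref)
    est = [e - mean_e for e in est]
    ref = [r - mean_r for r in ref]
    # s_target = (<est, ref> / ||ref||^2) * ref, e_noise = est - s_target
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    s_target = [scale * r for r in ref]
    e_noise = [e - s for e, s in zip(est, s_target)]
    return 10.0 * math.log10((sum(s * s for s in s_target) + eps) /
                             (sum(n * n for n in e_noise) + eps))

ref = [math.sin(0.05 * n) for n in range(400)]
est = [r + 0.1 * math.cos(0.37 * n) for n, r in enumerate(ref)]
louder = [3.0 * e for e in est]
print(math.isclose(si_snr(est, ref), si_snr(louder, ref), abs_tol=1e-6))  # True
print(mse_loss(louder, ref) > mse_loss(est, ref))  # True
```

The two checks show the design difference: rescaling an estimate leaves SI-SNR unchanged but changes the MSE, so an MSE-trained network must also match the absolute level of each source.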
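Point 2 adds nonlinear gating units to the TCN, but the abstract does not specify their form. The sketch below assumes a WaveNet-style gated activation, tanh(filter convolution) multiplied by sigmoid(gate convolution), inside a dilated 1-D convolution block with a residual path. It is a simplified stand-in, not the thesis's Gated TCN: Conv-TasNet's actual TCN blocks also use 1x1 convolutions, depthwise convolutions, PReLU and normalisation, all omitted here. Stacking blocks with doubling dilations lets a TCN cover several time scales at once, and the gate controls how much of each block's output passes through; `gated_block` and `receptive_field` are hypothetical names.

```python
import math

def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)

def dilated_conv1d(x, weights, bias, dilation):
    """'Same'-length, non-causal dilated 1-D convolution with zero padding.
    x: [C_in][T], weights: [C_out][C_in][K] with K odd, bias: [C_out]."""
    n_in, n_t = len(x), len(x[0])
    k_size = len(weights[0][0])
    pad = dilation * (k_size - 1) // 2
    out = []
    for w_out, b in zip(weights, bias):
        row = []
        for t in range(n_t):
            acc = b
            for ci in range(n_in):
                for k in range(k_size):
                    idx = t + k * dilation - pad
                    if 0 <= idx < n_t:
                        acc += w_out[ci][k] * x[ci][idx]
            row.append(acc)
        out.append(row)
    return out

def gated_block(x, w_f, b_f, w_g, b_g, dilation):
    """Assumed gated block: x + tanh(conv_f(x)) * sigmoid(conv_g(x)).
    The residual sum requires C_out == C_in."""
    filt = dilated_conv1d(x, w_f, b_f, dilation)
    gate = dilated_conv1d(x, w_g, b_g, dilation)
    gated = [[math.tanh(a) * sigmoid(g) for a, g in zip(fr, gr)] for fr, gr in zip(filt, gate)]
    return [[xv + yv for xv, yv in zip(xr, yr)] for xr, yr in zip(x, gated)]

def receptive_field(kernel_size, dilations):
    """Receptive field, in frames, of a stack of dilated convolutions."""
    return 1 + (kernel_size - 1) * sum(dilations)

impulse = [[0.0] * 4 + [1.0] + [0.0] * 4]   # one channel, 9 frames
pair = [[[1.0, 0.0, 1.0]]]                  # toy filter taps at t - d and t + d
no_gate_w = [[[0.0, 0.0, 0.0]]]             # gate driven by its bias only
open_out = gated_block(impulse, pair, [0.0], no_gate_w, [30.0], dilation=2)
closed_out = gated_block(impulse, pair, [0.0], no_gate_w, [-30.0], dilation=2)
print([i for i, v in enumerate(open_out[0]) if abs(v) > 1e-6])  # [2, 4, 6]
print(max(abs(a - b) for a, b in zip(closed_out[0], impulse[0])) < 1e-9)  # True
print(receptive_field(3, [1, 2, 4, 8]))  # 31
```

With the gate open, the impulse at frame 4 reaches frames 2 and 6 (dilation 2); with the gate closed the block reduces to its residual path. Four blocks with kernel size 3 and dilations 1, 2, 4 and 8 together cover 31 frames.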
Keywords/Search Tags: Gammatone filter bank, Time domain speech separation, Deep neural network, Mean square error (MSE), Gated TCN neural network