
Binaural Speech Separation Research Based On Deep Learning Of Time Series

Posted on: 2022-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Lu
GTID: 2518306740497024
Subject: Electronics and Communications Engineering
Abstract/Summary:
As the front end of a speech signal processing system, speech separation directly affects the performance of all subsequent processing. Recent research shows that traditional speech separation algorithms degrade markedly in low-SNR and highly reverberant environments. Drawing on the characteristics of human auditory perception and the spatial characteristics of speech signals, this thesis studies two deep-learning-based binaural speech separation algorithms: one based on the Gated Recurrent Unit (GRU) and the Spectral Magnitude Mask (SMM), and one based on the Temporal Convolutional Network (TCN) and the Spectral Magnitude Mask.

(1) Binaural speech separation based on the Gated Recurrent Unit and the Spectral Magnitude Mask. A Gammatone filter bank simulates how the human cochlea processes speech, and preprocessing divides the original speech signal into time-frequency units. Binaural spatial features are extracted from each time-frequency unit: the Cross-Correlation Function (CCF), the Interaural Time Difference (ITD) and the Interaural Level Difference (ILD) are combined to form the spatial feature vector. Supervised speech separation algorithms generally use a Deep Neural Network (DNN) as the classifier. The DNN method separates using only the features of the current frame and ignores the temporal ordering of the speech feature parameters. To address this, the thesis builds a Recurrent Neural Network (RNN) from Gated Recurrent Units and feeds it the spatial features of the current frame concatenated with those of the five preceding and five following frames. The Spectral Magnitude Mask, a continuous-valued training target, is chosen because it better describes the proportion of target speech in each time-frequency unit. The separation results are evaluated with SAR (Sources to Artifacts Ratio), SIR (Source to Interferences Ratio), SDR (Source to Distortion Ratio) and PESQ (Perceptual Evaluation of Speech Quality). Simulation results show that the proposed algorithm is more robust than the DNN-based separation algorithm in low-SNR and reverberant environments.

(2) Binaural speech separation based on the Temporal Convolutional Network. RNN training is prone to vanishing gradients and overfitting, and because an RNN consumes multi-frame features one timestep at a time, training is slow. The TCN instead processes sequences with one-dimensional convolutions, outside the RNN framework. Its dilated convolutions can relate the current frame to information from far in the past, it achieves better results than RNNs in many scenarios, and because the features of multiple frames are computed in parallel, it trains faster. The interaural cross-correlation function, interaural time difference and interaural level difference of the previous five frames and the current frame are extracted as the input features. The Temporal Convolutional Network built here has four layers, each a residual structure composed of two one-dimensional convolution layers, which fully retains the information of the previous and current frames. The dilation factor of the convolutions increases with layer depth, and a multi-layer feedforward network connects the last layer to the output. The training target is the SMM. Simulation results show that the TCN makes better use of the sequential characteristics of speech signals and improves the evaluation metrics relative to the GRU-based algorithm. The TCN also trains faster and overfits less.
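The frame concatenation and the SMM training target described in the abstract can be illustrated with a short NumPy sketch. This is not the thesis implementation: the function names, the edge-padding strategy at utterance boundaries, the small `eps` term and the optional clipping are all assumptions made for illustration. The mask uses the common definition of the SMM as the ratio of target magnitude to mixture magnitude in each time-frequency unit.

```python
import numpy as np


def stitch_context(features, left=5, right=5):
    """Concatenate every frame with its `left` preceding and `right` following frames.

    features: array of shape (T, D), one D-dimensional feature vector per frame.
    Returns an array of shape (T, (left + right + 1) * D).
    Assumption: the first and last frames are repeated to pad the utterance edges.
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    padded = np.pad(features, ((left, right), (0, 0)), mode="edge")
    window = left + right + 1
    return np.stack([padded[t:t + window].reshape(-1) for t in range(num_frames)])


def spectral_magnitude_mask(target_mag, mixture_mag, eps=1e-8, max_value=None):
    """SMM(t, f) = |S(t, f)| / |Y(t, f)| for every time-frequency unit.

    eps avoids division by zero in silent units (assumption).
    max_value optionally clips the mask, since the ratio has no upper bound.
    """
    target_mag = np.asarray(target_mag, dtype=float)
    mixture_mag = np.asarray(mixture_mag, dtype=float)
    mask = target_mag / (mixture_mag + eps)
    if max_value is not None:
        mask = np.minimum(mask, max_value)
    return mask


# Hypothetical sizes: 100 frames with 34 spatial features per frame.
rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 34))
print(stitch_context(feats).shape)  # (100, 374): 11 frames x 34 features
```

Calling `stitch_context(feats, left=5, right=0)` gives the causal variant that matches the TCN input described in part (2), where only the previous five frames and the current frame are used.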
Keywords/Search Tags: Binaural Speech Separation, Gated Recurrent Unit, Temporal Convolutional Network, Spectral Magnitude Mask
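To make the Temporal Convolutional Network structure in the abstract concrete, the sketch below shows a causal dilated one-dimensional convolution and the receptive field of a stack of residual blocks with two convolutions each. The abstract does not state the kernel size or the dilation values, so the kernel size of 2 and the dilations 1, 2, 4, 8 are assumptions, as are the function names. Causality is also assumed, since the abstract describes only previous and current frames as input.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """Causal dilated convolution: y[t] = sum_j weights[j] * x[t - j * dilation].

    Samples before the start of the sequence are treated as zero.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations, convs_per_block=2):
    """Number of input frames that can influence a single output frame.

    Each convolution with dilation d widens the field by (kernel_size - 1) * d,
    and every residual block applies `convs_per_block` such convolutions.
    """
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)


# Assumed configuration: four blocks, kernel size 2, dilation doubling per block.
dilations = [1, 2, 4, 8]
print(receptive_field(2, dilations))  # 31
```

Because the dilation grows with depth, the receptive field grows much faster than in a stack of undilated convolutions with the same number of layers, which is how a TCN relates the current frame to frames far in the past without recurrence.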