Binaural Speech Separation Research Based On Deep Learning

Posted on:2020-08-12

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhuang

GTID:2428330620456152

Subject:Information and communication engineering

Abstract/Summary:

As the front end of a speech signal processing system, speech separation has an important impact on the performance of subsequent processing stages. At present, the performance of commonly used speech separation algorithms is limited in low-SNR and reverberant environments. Based on computational auditory scene analysis (CASA), this thesis proposes two binaural speech separation algorithms: a DNN binaural speech separation algorithm based on an improved ideal ratio mask, and a binaural speech separation algorithm based on LSTM (Long Short-Term Memory).

(1) DNN binaural speech separation algorithm based on an improved ideal ratio mask. After preprocessing, the original speech is passed through a Gammatone auditory filterbank, which models the human ear, to obtain time-frequency (T-F) units. From each T-F unit, the binaural spatial feature parameters Cross-Correlation Function (CCF), Interaural Time Difference (ITD), and Interaural Level Difference (ILD) are extracted and used as the input of a Deep Neural Network (DNN). Traditional speech separation algorithms usually use the Ideal Binary Mask (IBM). This thesis improves the Ideal Ratio Mask (IRM), which was originally used in speech enhancement, and applies it to multi-speaker separation. Sound sources are modeled by azimuth: 19 azimuth angles are defined, and the ambient noise is treated as a 20th azimuth. The improved IRM value of each source and of the noise in each T-F unit is used as the training target for the corresponding azimuth. Source to Artifacts Ratio (SAR), Source to Distortion Ratio (SDR), Source to Interference Ratio (SIR), and Perceptual Evaluation of Speech Quality (PESQ) are used as evaluation metrics. Simulation results show that this algorithm outperforms both the traditional Degenerate Unmixing Estimation Technique (DUET) algorithm and the IBM-based DNN binaural speech separation algorithm, and that its separation metrics are significantly improved in low-SNR and reverberant environments.

(2) Binaural speech separation algorithm based on LSTM. Because speech feature parameters form a time sequence, a Recurrent Neural Network (RNN) is better suited than a DNN to modeling them. In this thesis, the CCF, ITD, and ILD of the T-F units in the current frame and in the preceding and following frames are used as input features to a Bi-directional Long Short-Term Memory (BiLSTM) network built from two LSTM layers. A Softmax layer is connected at the end to represent the probability of a sound source at each of the 20 azimuths. The output at the final time step is taken as the Estimated Ratio Mask (ERM), a floating-point mask for the current T-F unit. The network is trained with a mean square error (MSE) loss function. In the test stage, multi-frame signals are fed to the BiLSTM to obtain the ERM, which is then used to perform speech separation. Experimental results show that LSTM-based binaural speech separation effectively exploits the feature information of the preceding and following frames. Compared with the DNN-based network, it significantly improves the subjective evaluation metric: the separated speech sounds fuller and the separation effect is better.
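The binaural cues used in part (1) can be illustrated on a single T-F unit. The following is a minimal Python sketch under assumptions the abstract does not state: Gammatone filtering and framing are already done, so `left` and `right` hold the subband samples of one channel in one frame; ITD is taken as the lag of the CCF peak within ±1 ms; ILD is the left-to-right energy ratio in dB; and `ratio_masks` uses the conventional multi-source ratio mask (E_i / ΣE)^β, not the thesis's improved IRM, whose exact form is not given here. All function names are hypothetical.

```python
import math

def ccf(left, right, max_lag):
    """Normalized cross-correlation of one T-F unit for lags -max_lag..max_lag.
    A positive lag means the right channel is delayed relative to the left."""
    n = len(left)
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    if norm == 0.0:
        return [0.0] * (2 * max_lag + 1)
    out = []
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                acc += left[t] * right[u]
        out.append(acc / norm)
    return out

def itd_seconds(ccf_values, max_lag, fs):
    """ITD estimate (assumption): lag of the CCF peak, converted to seconds."""
    best = max(range(len(ccf_values)), key=ccf_values.__getitem__)
    return (best - max_lag) / fs

def ild_db(left, right, eps=1e-12):
    """ILD as the left/right energy ratio of the unit, in dB."""
    el = sum(x * x for x in left)
    er = sum(y * y for y in right)
    return 10.0 * math.log10((el + eps) / (er + eps))

def ratio_masks(energies, beta=0.5):
    """Conventional ratio mask per source (hypothetical stand-in for the
    improved IRM): (E_i / sum of all E_j) ** beta for each source i."""
    total = sum(energies)
    if total == 0.0:
        return [0.0] * len(energies)
    return [(e / total) ** beta for e in energies]

# Example: a 500 Hz tone whose right channel is delayed 2 samples and halved.
fs = 16000
left = [math.sin(2 * math.pi * 500 * t / fs) for t in range(160)]  # 10 ms frame
right = [0.0, 0.0] + [0.5 * x for x in left[:-2]]
max_lag = 16  # ±1 ms at 16 kHz
features = ccf(left, right, max_lag)
itd = itd_seconds(features, max_lag, fs)
ild = ild_db(left, right)
```

In this example the CCF peaks at a lag of 2 samples, giving an ITD of 125 µs, and the ILD comes out at roughly 6 dB, which is the level difference of a half-amplitude copy.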
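The output stage of the LSTM network in part (2) can be sketched in the same spirit. The BiLSTM itself is omitted because it needs a deep-learning framework; this hypothetical sketch only shows a softmax over the 20 azimuth classes (19 speech azimuths plus noise) read as the ERM of one T-F unit, the MSE loss against a mask target, and scaling of the mixture unit by the ERM of the chosen azimuth. Reading the softmax outputs directly as mask values is an assumption about how the abstract's description fits together.

```python
import math

NUM_AZIMUTHS = 20  # 19 speech azimuths + 1 class for ambient noise

def softmax(logits):
    """Numerically stable softmax: non-negative outputs that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mse_loss(estimated, target):
    """Mean square error between the estimated mask (ERM) and the target mask."""
    if len(estimated) != len(target):
        raise ValueError("mask lengths differ")
    return sum((e - t) ** 2 for e, t in zip(estimated, target)) / len(estimated)

def apply_mask(mixture_unit, erm, azimuth):
    """Scale the mixture samples of one T-F unit by the ERM value of the
    chosen azimuth to estimate that source's contribution to the unit."""
    gain = erm[azimuth]
    return [gain * x for x in mixture_unit]

# Example: an untrained network (all-zero logits) spreads the mask evenly.
erm = softmax([0.0] * NUM_AZIMUTHS)
target = [0.0] * NUM_AZIMUTHS
target[3] = 1.0  # hypothetical unit whose energy comes entirely from azimuth 3
loss = mse_loss(erm, target)  # ≈ 0.0475
```

One consequence of this design is worth noting: softmax outputs always sum to 1 across the 20 classes, which matches a ratio-mask target exactly only when β = 1 (plain energy fractions). A square-root mask (β = 0.5) sums to more than 1 whenever energy is shared between classes, so a per-class sigmoid output would be the more natural fit for it.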

Keywords/Search Tags:

Deep Neural Network, Speech Separation, Computational Auditory Scene Analysis

Related items

1	Binaural Speech Separation Research Based On Deep Learning
2	Speech Separation Based On Microphone Array And Deep Learning
3	Monophonic Speech Separation Based On Computational Auditory Scene Analysis
4	The Blind Separation Of Monaural Speech Based On Computational Auditory Scene Analysis
5	Binaural Speech Separation Research Based On Deep Learning
6	Segregation Of Reverberant Speech Based On Computational Auditory Scene Analysis And Deep Neural Network
7	The Research Of Speech Separation Based On Computational Auditory Scene Analysis
8	Method And Implementation Of Monophonic Double Speech Separation Based On Auditory Scene Analysis
9	Speech Separation Based On Computational Auditory Scene Analysis
10	Research And Verification Of Monaural Speech Segregation Based On Computational Auditory Scene Analysis And Deep Neural Network