Binaural Speech Separation Research Based On Deep Learning

Posted on: 2020-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhuang
Full Text: PDF
GTID: 2428330620456152
Subject: Information and communication engineering
Abstract/Summary:
As the front end of a speech signal processing system, speech separation has an important impact on the performance of subsequent processing stages. At present, the performance of commonly used speech separation algorithms is limited in low-SNR and reverberant environments. Based on computational auditory scene analysis, this thesis proposes two binaural speech separation algorithms: a DNN binaural speech separation algorithm based on an improved ideal ratio mask, and an LSTM (Long Short-Term Memory) binaural speech separation algorithm.

(1) DNN binaural speech separation algorithm based on an improved ideal ratio mask. After preprocessing, the original speech is passed through a Gammatone auditory filterbank, which models the human ear, to obtain time-frequency units. Binaural spatial features are extracted from each time-frequency unit, namely the Cross Correlation Function (CCF), Interaural Time Difference (ITD) and Interaural Level Difference (ILD), and serve as the input of a Deep Neural Network (DNN). Traditional speech separation algorithms usually use the Ideal Binary Mask (IBM). This thesis instead improves the Ideal Ratio Mask (IRM), which was originally used in speech enhancement, and applies it to multi-speaker separation. Source position is modeled by azimuth: 19 azimuth angles are defined, and ambient noise is treated as a 20th azimuth class. The improved IRM value of each source and of the noise in a time-frequency unit is used as the training target for the corresponding azimuth. Sources to Artifacts Ratio (SAR), Source to Distortion Ratio (SDR), Source to Interferences Ratio (SIR) and Perceptual Evaluation of Speech Quality (PESQ) are used as evaluation metrics. Simulation results show that this algorithm outperforms the traditional Degenerate Unmixing Estimation Technique (DUET) algorithm and the IBM-based DNN binaural speech separation algorithm, and that its separation metrics improve significantly in low-SNR and reverberant environments.

(2) Binaural speech separation algorithm based on LSTM. Because the feature parameters of a speech signal form a time sequence, a Recurrent Neural Network is better suited than a DNN to modeling them. In this thesis, a Bi-directional Long Short-Term Memory (BiLSTM) network takes as input features the CCF, ITD and ILD of the time-frequency units in the current frame and in its preceding and following frames, and is built from two layers of LSTM units. A final softmax layer represents the probability of a sound source at each of the 20 azimuths. The output at the final time step is taken as the Estimated Ratio Mask (ERM), a real-valued mask, of the current time-frequency unit. The network is trained with a mean squared error loss. In the test stage, multi-frame signals are fed to the BiLSTM to obtain the ERM, which is then used for speech separation. Experimental results show that LSTM-based binaural speech separation makes effective use of the feature information in the preceding and following frames. Compared with the DNN-based network, it significantly improves the subjective evaluation metrics, and the separated speech has fuller quality and is better separated.
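
The binaural cues and ratio-mask targets named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the sign conventions, the energy-based ILD, and the plain energy-ratio mask are all assumptions made for illustration, and the thesis's "improved" IRM is not defined in the abstract, so a standard multi-class ratio mask stands in for it. The inputs are assumed to be the left-ear and right-ear samples of one time-frequency unit, that is, one frame of one Gammatone channel.

```python
import math


def binaural_cues(left, right, sample_rate, max_lag):
    """Return (ccf, itd_seconds, ild_db) for one time-frequency unit.

    Assumed conventions (not taken from the thesis):
    - a positive lag means the right-ear signal is delayed relative to the left;
    - ILD is the left/right energy ratio in decibels.
    """
    n = len(left)
    energy_l = sum(x * x for x in left)
    energy_r = sum(x * x for x in right)
    norm = math.sqrt(energy_l * energy_r) or 1.0  # avoid dividing by zero

    # Normalized cross-correlation function over lags -max_lag..+max_lag.
    ccf = []
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                acc += left[t] * right[u]
        ccf.append(acc / norm)

    # ITD is the lag of the CCF peak, converted from samples to seconds.
    best = max(range(len(ccf)), key=ccf.__getitem__)
    itd = (best - max_lag) / sample_rate

    # ILD in dB; defined as 0 when either channel is silent.
    if energy_l > 0 and energy_r > 0:
        ild = 10.0 * math.log10(energy_l / energy_r)
    else:
        ild = 0.0
    return ccf, itd, ild


def ratio_masks(energies):
    """Plain energy-ratio mask per class for one time-frequency unit.

    `energies` holds one non-negative value per class, for example 19 azimuth
    classes plus one noise class. The masks sum to 1, which matches a softmax
    output layer. This is a stand-in, not the thesis's improved IRM.
    """
    total = sum(energies)
    if total == 0:
        return [0.0] * len(energies)
    return [e / total for e in energies]
```

Calling `binaural_cues(left, right, 16000, 16)` on a frame sampled at 16 kHz searches delays of up to 16 samples (1 ms) in either direction. The CCF vector, the ITD and the ILD can then be concatenated into the feature vector for that time-frequency unit.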
Keywords/Search Tags: Deep Neural Network, Speech Separation, Computational Auditory Scene Analysis