
Robust Speech Separation Based On Binaural Localization

Posted on: 2017-08-21    Degree: Master    Type: Thesis
Country: China    Candidate: J M Shu    Full Text: PDF
GTID: 2348330491963035    Subject: Electronic and communication engineering
Abstract/Summary:
Speech separation is an important front-end technology in speech signal processing systems, and its performance has a large effect on the capability of the whole system. Most current research focuses on speech separation with a fixed target position and does not perform well in noisy and reverberant environments. Starting from the properties of the human auditory system, this thesis focuses on robust speech separation with a free target position based on binaural features. The work in this thesis consists of two parts: binaural localization based on a deep neural network, and iteration of localization and separation.

(1) Binaural localization based on a deep neural network. The human auditory system receives a signal and processes it layer by layer, which resembles the computation in a deep neural network, a model that has become very popular in machine learning in recent years. This thesis treats binaural localization as a multiclass classification task. We train a deep neural network whose top layer is a softmax regression to predict the probability of each direction, and the direction with the maximal probability is taken as the source's location. For localization, the binaural features are the cross-correlation function and the interaural intensity difference, and localization accuracy is the evaluation criterion. Under high SNR and short reverberation time, the localization algorithm is almost 100% correct; under low SNR and long reverberation time, localization accuracy remains above 70%.

(2) Iteration of localization and separation. This thesis separates speech by calculating an ideal binary mask based on the interaural time difference and the interaural intensity difference. To enhance the robustness of the algorithm, we put forward a method that iterates between binaural localization and speech separation. Simply speaking, we first estimate the locations of multiple sources from the mixed speech and separate the mixture according to those locations; we then re-estimate the locations from the separated speech and re-separate the mixture according to the refined locations. After several iterations, the speech separated in the last iteration is taken as the final output. To evaluate the algorithm, we calculate the perceptual evaluation of speech quality (PESQ) of the separated speech. Under high SNR and short reverberation time, the PESQ score is about 2.5; under low SNR and long reverberation time, the PESQ score is about 1.6.

The artificial binaural speech used in this thesis is synthesized from monaural speech and the MIT HRTF database. Real binaural speech was recorded with a KEMAR artificial head in our laboratory room.
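The following is a minimal sketch of the iterative localization-and-separation loop summarized above, not the thesis implementation. The feature definitions (interaural phase as an ITD proxy, level ratio as IID), the template-matching mask rule, and the `dnn` and `templates` arguments are illustrative assumptions; the actual system uses cross-correlation and interaural intensity features with a trained softmax network.

```python
# Illustrative sketch only: function names, feature choices, and the template
# matching rule are assumptions, not the thesis's exact formulation.
import numpy as np

def binaural_features(left_spec, right_spec):
    """Per time-frequency proxies for ITD (interaural phase) and IID (level difference)."""
    ipd = np.angle(left_spec * np.conj(right_spec))
    iid = 20.0 * np.log10((np.abs(left_spec) + 1e-8) / (np.abs(right_spec) + 1e-8))
    return ipd, iid

def localize(left_spec, right_spec, dnn):
    """Softmax localizer: the direction with maximal predicted probability wins."""
    ipd, iid = binaural_features(left_spec, right_spec)
    feat = np.concatenate([ipd.mean(axis=1), iid.mean(axis=1)])   # crude per-band averages
    logits = dnn(feat)                                            # hypothetical trained network
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.argmax(probs))

def separate(left_spec, right_spec, dirs, templates):
    """Binary masking: assign each time-frequency unit to the closest source template."""
    ipd, iid = binaural_features(left_spec, right_spec)
    dist = np.stack([(ipd - templates[d][0]) ** 2 + 0.1 * (iid - templates[d][1]) ** 2
                     for d in dirs])                              # one distance map per source
    winner = np.argmin(dist, axis=0)
    masks = [(winner == k).astype(float) for k in range(len(dirs))]
    return [(m * left_spec, m * right_spec) for m in masks]       # masked binaural pairs

def iterative_separation(left_spec, right_spec, initial_dirs, dnn, templates, n_iter=3):
    """Alternate separation and re-localization; return the last separation as output."""
    dirs = list(initial_dirs)
    for _ in range(n_iter):
        separated = separate(left_spec, right_spec, dirs, templates)
        dirs = [localize(l, r, dnn) for (l, r) in separated]      # refine direction estimates
    return separate(left_spec, right_spec, dirs, templates)
```

In this sketch, `templates` would map each candidate direction to reference interaural-phase and level-difference values per frequency band, which could for instance be derived from an HRTF database such as the MIT set mentioned above.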
Keywords/Search Tags:Binaural Localization, Deep Neural Network, Speech Separation, Iteration