
Computational Auditory Model And Deep Neural Network Based Binaural Speech Segregation

Posted on: 2018-06-12    Degree: Master    Type: Thesis
Country: China    Candidate: N N Fan    Full Text: PDF
GTID: 2348330512985639    Subject: Information and Communication Engineering
Abstract/Summary:
Speech is one of the most important ways for people to communicate in daily life. Acoustic signals captured in the real world are often corrupted by environmental noise, reflections, and interfering speakers, which degrades both speech quality and intelligibility. Techniques that segregate clean speech from noisy mixtures are therefore essential in many speech-enabled applications.

Conventional speech segregation algorithms, such as spectral subtraction and Wiener filtering, have been developed over several decades. However, most of these methods rely on assumptions, such as the stationarity of the noise and the spatial independence of target speech and interference, that cannot be satisfied in realistic acoustic conditions, so the separated speech often suffers from musical noise artifacts. Inspired by the human auditory processing system, more and more researchers have turned to auditory scene analysis (ASA), which seeks scene-related speech features such as cochlear features and pitch, and studies of its computational counterpart (CASA) have advanced rapidly because of its value in simulating human ear and brain functions on a computer. Nevertheless, speech segregated by CASA and by classification-based DNN methods suffers from discontinuities across time frames and frequency bands, which leads to poor speech quality.

To address these problems, this thesis focuses on three aspects: designing a DNN-based regression approach that improves the continuity of the separated speech; designing effective binaural features that extract scene-related information from binaural signals; and designing a speech separation method based on spectral-temporal receptive field features.

First, we propose a DNN-based regression approach to binaural speech segregation, in which clean speech features are predicted directly from the noisy input through the non-linear mapping learned by the DNN. Because the regression approach generates features and can fully exploit neighbouring frames and full-band frequency information, speech continuity is better preserved. Our experiments show that the proposed approach significantly outperforms ideal binary mask (IBM) based speech segregation in terms of objective measures of both speech quality and speech intelligibility in noisy and reverberant environments.

Second, we design a binaural feature, the interaural level difference (ILD), based on log-power spectral (LPS) features. After exploring monaural and binaural features motivated by computational auditory theory, we design a full-band ILD and a global ILD, where the latter is an average of the former over frequency bands; visualizations of the two ILDs illustrate their effectiveness. We also design a sub-band ILD as a trade-off between the high dimensionality of the full-band ILD and the insufficient information of the global ILD. Following the non-linear frequency selectivity of the human cochlea, the LPS frequency bins are mapped to gammatone frequency bands. The experimental results show that the designed ILD features exploit the binaural information efficiently, and that the sub-band ILD features are particularly effective for segregation. A sketch of how such features could be computed is given below.
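The abstract does not spell out the exact feature formulas, so the following is only a minimal sketch of how full-band, global, and sub-band ILD features could be computed from the left and right log-power spectra. It assumes an STFT front end and ERB-spaced (gammatone-style) band edges; the function names, FFT size, and number of bands are illustrative choices, not the settings used in the thesis.

    import numpy as np
    from scipy.signal import stft

    def lps(x, fs, n_fft=512, hop=256):
        # Log-power spectrum, shaped (frames, frequency bins).
        _, _, X = stft(x, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
        return np.log(np.abs(X.T) ** 2 + 1e-12)

    def erb_band_edges(fs, n_fft, n_bands=32, f_min=80.0):
        # Band edges equally spaced on the ERB-rate scale (gammatone-style spacing),
        # converted to STFT bin indices.
        def erb(f):
            return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
        def erb_inv(e):
            return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
        edges_hz = erb_inv(np.linspace(erb(f_min), erb(fs / 2.0), n_bands + 1))
        return np.clip(np.round(edges_hz / (fs / 2.0) * (n_fft // 2)).astype(int), 1, n_fft // 2)

    def ild_features(left, right, fs, n_fft=512, n_bands=32):
        # left, right: time-domain signals from the two ears/microphones.
        lps_l, lps_r = lps(left, fs, n_fft), lps(right, fs, n_fft)
        full_band = lps_l - lps_r                            # full-band ILD: one value per T-F bin
        global_ild = full_band.mean(axis=1, keepdims=True)   # global ILD: average over all frequency bins
        edges = erb_band_edges(fs, n_fft, n_bands)
        sub_band = np.stack(                                 # sub-band ILD: average within each gammatone band
            [full_band[:, lo:hi].mean(axis=1) if hi > lo else full_band[:, lo]
             for lo, hi in zip(edges[:-1], edges[1:])], axis=1)
        return full_band, global_ild, sub_band

The sub-band ILD here averages the full-band ILD within each gammatone-spaced band, which corresponds to the trade-off between dimensionality and information content described above.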
Finally, we carry out an extended study of features related to the human auditory processing system and design a binaural speech separation method based on spectral-temporal receptive field (STRF) features. In earlier research on the human auditory system, the STRF was proposed as a mathematical model of the stimulus response in the human auditory cortex; it processes the input spectrogram into a series of time-frequency dynamic features. We design a set of STRF features covering different scales and rates. To cope with the high dimensionality of these features, we apply several dimension reduction methods, such as frequency averaging and PCA. We also design an STRF-based ILD feature, called SILD, for binaural cue extraction, and reduce its dimensionality with a learnable weighted-sum method. The experimental results show that the STRF- and SILD-based speech separation methods outperform our baseline system in terms of speech quality. We also explore and analyse the factors that affect the effectiveness of the STRF filters.
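The STRF model itself is not specified in the abstract; one common realization treats each STRF as a 2-D Gabor-like spectro-temporal modulation filter with a given rate (temporal modulation in Hz) and scale (spectral modulation in cycles per octave) and convolves it with an auditory spectrogram. The sketch below follows that assumption; the filter parameters, function names, and the frequency-averaging reduction are illustrative only.

    import numpy as np
    from scipy.signal import fftconvolve

    def gabor_strf(rate_hz, scale_cpo, frame_rate=100.0, chans_per_oct=12,
                   t_span=0.25, f_span=2.0):
        # 2-D Gabor kernel: a Gaussian envelope modulated by a spectro-temporal sinusoid.
        # rate_hz  : temporal modulation frequency (Hz)
        # scale_cpo: spectral modulation frequency (cycles per octave)
        t = np.arange(-t_span / 2, t_span / 2, 1.0 / frame_rate)       # seconds
        f = np.arange(-f_span / 2, f_span / 2, 1.0 / chans_per_oct)    # octaves
        T, F = np.meshgrid(t, f, indexing="ij")
        envelope = np.exp(-(T / (t_span / 4.0)) ** 2 - (F / (f_span / 4.0)) ** 2)
        carrier = np.cos(2.0 * np.pi * (rate_hz * T + scale_cpo * F))
        return envelope * carrier

    def strf_features(spectrogram, rates=(2, 4, 8, 16), scales=(0.5, 1, 2, 4)):
        # spectrogram: (frames, frequency channels), e.g. a log gammatone spectrogram.
        # One filtered copy of the spectrogram per (rate, scale) pair.
        outputs = [fftconvolve(spectrogram, gabor_strf(r, s), mode="same")
                   for r in rates for s in scales]
        feats = np.stack(outputs, axis=-1)        # (frames, channels, rates*scales)
        # Crude dimension reduction by averaging over frequency channels.
        reduced = feats.mean(axis=1)              # (frames, rates*scales)
        return feats, reduced

The thesis additionally uses PCA and a learnable weighted sum for dimension reduction; the channel average above is only the simplest stand-in for those options.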
Keywords/Search Tags:Binaural Speech Segregation, Regression Based Deep Neural Network, Monaural Features, Binaural Features, Spectral-Temporal Receptive Field Features