
Segregation Of Reverberant Speech Based On Computational Auditory Scene Analysis And Deep Neural Network

Posted on: 2017-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: M Cao
Full Text: PDF
GTID: 2308330503456989
Subject: Information and Communication Engineering
Abstract/Summary:
In natural auditory environments, speech signals are degraded by both concurrent noise sources and reverberation. Human listeners are remarkably good at focusing their attention on a particular speech signal even under such adverse conditions, and simulating this perceptual ability is an active topic in speech signal processing. A solution to the problem of speech separation in real environments is essential for many applications, such as automatic speech recognition, audio information retrieval, and hearing prostheses.

Reflections and diffractions of sound from the walls and obstacles of an acoustic enclosure are collectively called reverberation. A distant microphone picks up, in addition to the direct sound, the early and late reflections that arrive after it. Reverberation corresponds to a convolution of the source signal with the room impulse response (RIR), which distorts the spectrum of speech in both the time and frequency domains. Inspired by human auditory scene analysis, computational auditory scene analysis (CASA) approaches the segregation problem on the basis of perceptual principles.

This dissertation studies the segregation of speech in reverberant environments. Because reverberation corrupts the harmonic structure of speech and traditional speech segregation algorithms perform poorly under it, this dissertation proposes two reverberant speech segregation models.

(1) Computational auditory scene analysis simulates the capability of auditory perception and uses the ideal binary mask to extract the target speech signal. In reverberant environments, however, the accuracy of pitch estimation drops and system performance degrades.
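The convolution view of reverberation above can be sketched numerically. The exponentially decaying noise RIR below is a common toy model used for illustration only, not a measured room response from the dissertation; `rt60` denotes the assumed reverberation time.

```python
import numpy as np

def synthetic_rir(fs=16000, rt60=0.5, length_s=0.5, seed=0):
    """Toy RIR: white noise shaped by an exponential decay.

    The decay constant is chosen so the envelope falls by ~60 dB
    (a factor of 10**-3 in amplitude) after rt60 seconds.
    """
    rng = np.random.default_rng(seed)
    n = int(fs * length_s)
    decay = np.exp(-6.9 * np.arange(n) / (fs * rt60))
    h = rng.standard_normal(n) * decay
    return h / np.max(np.abs(h))

fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)      # stand-in for a clean speech signal
rir = synthetic_rir(fs)
# Reverberant speech = source signal convolved with the room impulse response.
reverberant = np.convolve(clean, rir)    # 'full' mode: len = N + M - 1
```

Because the reflections smear energy forward in time, `reverberant` is longer than `clean` by the RIR length minus one sample.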
The proposed algorithm generates pitch contours for reverberant speech by hidden Markov model (HMM) tracking and applies a likelihood ratio test to select the correct model for labeling each time-frequency (T-F) unit, thereby improving labeling accuracy.

(2) Deep neural networks (DNNs) have exhibited strong learning capacity in speech recognition and artificial intelligence. This dissertation trains a DNN to learn a spectral mapping from corrupted speech to clean speech, performing dereverberation and denoising jointly. The input feature is a succession of spectral frames that captures rich temporal dynamics, so the DNN can encode the spectral transformation from corrupted to clean speech and restore the magnitude spectrogram of clean speech; the time-domain signal is then resynthesized via the inverse FFT. In addition, this dissertation presents a binaural reverberant speech segregation system based on DNN classification: binaural and monaural features are concatenated into a long feature vector for the classification task, and the DNN is pre-trained with restricted Boltzmann machines (RBMs).

Experimental results show that the proposed models significantly improve the intelligibility and quality of the segregated speech, as well as system stability under reverberant conditions.
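The T-F unit labeling described above targets the ideal binary mask: a unit is labeled 1 when the target speech dominates the interference locally, 0 otherwise. A minimal sketch of that definition follows; the 0 dB local criterion and the tiny magnitude arrays are illustrative assumptions, not the dissertation's actual settings.

```python
import numpy as np

def ideal_binary_mask(target_mag, interference_mag, lc_db=0.0):
    """Label each T-F unit 1 if local SNR exceeds lc_db, else 0.

    target_mag, interference_mag: magnitude spectrograms of the same shape
    (frequency channels x time frames). lc_db is the local SNR criterion.
    """
    snr_db = 20.0 * np.log10(target_mag / np.maximum(interference_mag, 1e-12))
    return (snr_db > lc_db).astype(float)

# Two T-F units: target dominates the first, interference the second.
target = np.array([[1.0, 0.1]])
interference = np.array([[0.1, 1.0]])
mask = ideal_binary_mask(target, interference)  # -> [[1.0, 0.0]]
```

In a classification-based system such as the one described above, a DNN is trained to predict these 0/1 labels per T-F unit from the concatenated binaural and monaural features.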
Keywords/Search Tags: reverberation, computational auditory scene analysis, speech segregation, hidden Markov model, deep neural networks