
Research On Deep Neural Network Based Speech Dereverberation

Posted on: 2019-12-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Wu
Full Text: PDF
GTID: 1368330542973067
Subject: Signal and Information Processing
Abstract/Summary:
In real-world environments, reverberation often seriously degrades speech quality and intelligibility. Such degradation can reduce the performance of automatic speech recognition (ASR), hearing aids, and source localization. Although substantial progress has been made in reverberant speech signal processing, the problem of reverberation is still far from solved. Over the past several decades, many unsupervised dereverberation methods have been proposed; some of them estimate an inverse filter of the room impulse response (RIR) to deconvolve the reverberant signal. However, tracking an acoustic channel is difficult because 1) the RIR is unknown; 2) the RIR can vary over time and is hard to estimate; 3) the RIR is usually very long; and 4) the RIR is a non-minimum-phase system.

Recently, owing to their strong regression capabilities, deep neural networks (DNNs) have been widely used in speech enhancement (SE), source separation, and bandwidth expansion. This success motivates the use of deep models for speech dereverberation. The deep structure of a DNN can be designed to act as a dereverberation filter. In addition, a DNN can learn the relationship between reverberant speech and anechoic speech from large amounts of data. In this dissertation, unlike traditional signal processing approaches, we address speech dereverberation and related processing problems using DNNs.

Firstly, we propose a single-channel DNN-based speech dereverberation system. Log-power spectra (LPS) are used as the features. We adopt a linear activation function at the output layer and globally normalize the target features to zero mean and unit variance, so that a DNN-based regression model can learn the complicated mapping from reverberant to anechoic speech. A large multi-condition training set, encompassing different key factors in reverberant speech (e.g., speakers, RIRs), is established to learn a high-quality DNN dereverberation model. A small training set is also built to evaluate the performance of the proposed DNN dereverberation system when sufficient samples are not available.

Secondly, many dereverberation approaches are insensitive to the acoustic environment. We investigate two key design parameters, namely the frame shift size in speech framing and the acoustic context window size at the DNN input, and show that reverberation-time-dependent (RT60-dependent) parameters are needed in the DNN training stage to optimize system performance in diverse reverberant environments. A reverberation-time-aware DNN (RTA-DNN) is proposed that incorporates RT60-dependent frame shift and acoustic context parameters.

Thirdly, we extend the single-channel DNN-based dereverberation system to the multi-channel case. We propose a single DNN, namely DNNSpatial, that simultaneously performs beamforming and dereverberation by selectively concatenating input features of reverberant speech obtained from multiple microphones in an array and mapping them to the expected output features of anechoic reference speech. A reverberation-time-aware DNNSpatial (RTA-DNNSpatial) is then designed to improve system performance and environmental robustness by adopting RT60-dependent temporal-spatial information.

Finally, recent research shows that a large gain in speech quality can translate to a negligible improvement in transcription accuracy, and vice versa. Nevertheless, we believe that "only good signal processing can lead to top ASR performance" in challenging acoustic environments. We propose an integrated end-to-end ASR paradigm based on joint learning of front-end speech signal processing and back-end acoustic modeling, which leads to a unified DNN framework for distant speech processing that can achieve both high-quality enhanced speech and high-accuracy ASR simultaneously. The dissertation concludes with a summary and plans for future work.
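
The feature pipeline described for the single-channel system (log-power spectra, an acoustic context window at the DNN input, and global zero-mean, unit-variance normalization of the targets) can be illustrated with a minimal NumPy sketch. It is not the dissertation's implementation: the function names, the Hann window, the natural logarithm, the frame length of 512 samples, the frame shift of 256 samples, and the context of 5 frames per side are all assumptions chosen for illustration, and random noise stands in for a reverberant recording.

```python
import numpy as np


def log_power_spectra(signal, frame_len=512, frame_shift=256, eps=1e-12):
    """Frame the signal, apply a Hann window, and return log-power spectra.

    frame_len and frame_shift are illustrative values, not taken from the source.
    """
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    window = np.hanning(frame_len)
    frames = np.stack([
        signal[i * frame_shift: i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)
    # Shape: (n_frames, frame_len // 2 + 1)
    return np.log(np.abs(spectrum) ** 2 + eps)


def splice_context(features, context=5):
    """Concatenate each frame with `context` neighbours on each side.

    Edge frames are repeated so every frame gets a full window.
    """
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    n_frames = features.shape[0]
    return np.concatenate(
        [padded[i: i + n_frames] for i in range(2 * context + 1)], axis=1
    )


def global_normalize(features, mean=None, std=None):
    """Normalize each feature dimension to zero mean and unit variance.

    Statistics are computed over all frames unless supplied by the caller.
    """
    if mean is None:
        mean = features.mean(axis=0)
    if std is None:
        std = features.std(axis=0) + 1e-8
    return (features - mean) / std, mean, std


# Stand-in for one second of reverberant speech at 16 kHz (assumed rate).
rng = np.random.default_rng(0)
reverberant = rng.standard_normal(16000)

lps = log_power_spectra(reverberant)
dnn_input = splice_context(lps, context=5)   # spliced input frames
targets, mean, std = global_normalize(lps)   # normalized regression targets
```

In a reverberation-time-aware setup such as the RTA-DNN described above, `frame_shift` and `context` would presumably be selected according to the RT60 of the environment rather than fixed; the abstract does not give the specific values, so none are suggested here.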
Keywords/Search Tags: speech dereverberation, deep neural networks (DNNs), reverberation-time-aware (RTA), microphone array, ASR