
Research On Deep Learning Based Speech Dereverberation Method

Posted on: 2022-04-17    Degree: Master    Type: Thesis
Country: China    Candidate: X J Wang    Full Text: PDF
GTID: 2518306542981159    Subject: Software engineering
Abstract/Summary:
In real-world environments, clean speech signals are often degraded by reverberation, which significantly reduces speech quality and intelligibility. The effect is especially severe for hearing-impaired listeners, whose auditory perception can be greatly affected. In recent years, the wide application of deep neural networks (DNNs) to speech dereverberation has significantly improved the objective metrics of enhanced speech. However, subjective listening-test scores have not improved to the same degree, and the diversity of reverberation in real environments makes it difficult for traditional DNNs to enhance reverberant speech in a targeted way. The main reason is that traditional DNNs use the same feature extraction parameters regardless of the degree of reverberation, and so ignore the fact that inter-frame interference and the correlation between consecutive frames differ with the degree of reverberation. In addition, the enhancement of reverberant speech in noisy environments has received little study. To address these two issues, the main work of this thesis is as follows.

(1) To reduce the impact of reverberation on auditory perception through signal preprocessing, it is first necessary to understand the physical properties of reverberation and how it is generated. This thesis therefore first investigates reverberant signals in enclosed spaces, summarizes the classical methods and recent results in the field of dereverberation, and reviews the basic theory needed for this study. The supervised-learning-based dereverberation framework and its training steps are then discussed, along with several widely used training targets.

(2) Most current speech dereverberation methods do not take into account the mutual interference between frames under different degrees of reverberation. To address this problem, a feature extraction method based on reverberation time perception is proposed. In the DNN training stage, speech is first classified by RT60 and then used for training, so that the dereverberation performance of the system is optimized for different reverberant environments. In the dereverberation stage, the reverberation time is first estimated, suitable frame shift and context window coefficients are then selected for feature extraction, and the resulting reverberant speech features are fed to the trained DNN for dereverberation. On this basis, a multi-target neural network is built that combines masking-based and amplitude-spectrum-estimation-based methods. Experimental results show that, compared with the traditional method that does not consider reverberation time, the proposed method significantly improves speech intelligibility and speech quality and is strongly robust to reverberant speech from unseen environments.

(3) When noise and reverberation are present at the same time, they are regarded as two different types of interference that should be handled separately, so a two-stage deep-learning-based speech enhancement strategy is proposed. Specifically, a denoising stage and a dereverberation stage are performed sequentially using two deep learning models. In the denoising stage, a multi-target neural network is used as the network model, with the ideal ratio mask (IRM) as the training target combined with amplitude spectrum estimation. In the dereverberation stage, a bidirectional long short-term memory (BLSTM) network is combined with amplitude spectrum estimation to obtain the amplitude spectrum of the enhanced speech, and a new objective function that incorporates the phase of the clean speech is used to assist model training, which in turn improves the accuracy of phase estimation. The waveform is then reconstructed with an iterative phase reconstruction method, and finally the two-stage model is jointly trained to optimize the proposed objective function. Experimental comparisons and spectrogram analysis show that, compared with traditional single-stage speech enhancement methods under the same conditions, the proposed algorithm substantially improves speech intelligibility and speech quality and is strongly robust to unseen noise and reverberation.
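
The reverberation-time-aware feature extraction described in part (2) of the abstract could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the abstract does not give the RT60 bands, frame shifts, or context window sizes, so the values in `RT60_TABLE`, the 512-sample frame length, and all function names here are hypothetical, and the RT60 estimate is assumed to be supplied by some separate estimator.

```python
import numpy as np

# Hypothetical lookup table (not from the thesis):
# (upper RT60 bound in seconds, frame shift in samples, context half-width in frames).
RT60_TABLE = [
    (0.3, 256, 2),
    (0.6, 192, 4),
    (float("inf"), 128, 6),
]

def select_params(rt60):
    """Return (frame_shift, context) for the first band whose upper bound covers rt60."""
    for upper, shift, context in RT60_TABLE:
        if rt60 <= upper:
            return shift, context
    raise ValueError("rt60 must be a real number")

def log_magnitude_frames(signal, frame_len, frame_shift):
    """Split a 1-D signal into Hann-windowed frames and return log-magnitude spectra."""
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)

def add_context(feats, context):
    """Concatenate each frame with `context` neighbours on each side (edge frames repeat)."""
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    n = feats.shape[0]
    return np.concatenate([padded[i:i + n] for i in range(2 * context + 1)], axis=1)

def extract_features(signal, rt60, frame_len=512):
    """Pick frame shift and context window from the RT60 estimate, then build DNN inputs."""
    shift, context = select_params(rt60)
    feats = log_magnitude_frames(signal, frame_len, shift)
    return add_context(feats, context)
```

In this sketch a longer estimated reverberation time selects a smaller frame shift and a wider context window, so each input vector covers more neighbouring frames. Whether the thesis maps RT60 to parameters in this direction is not stated in the abstract.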
Keywords/Search Tags: multi-target deep neural networks, reverberation time perception, speech enhancement, amplitude spectrum estimation, ideal ratio masking
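
For readers unfamiliar with the ideal ratio masking target named in the abstract and keywords, the commonly used definition is IRM(t, f) = (S² / (S² + N²))^β, where S and N are the magnitudes of the clean speech and the interference in each time-frequency bin and β is usually 0.5. The Python sketch below implements that standard definition. The function names and the `eps` stabilizer are assumptions for illustration, and the abstract does not say whether the thesis uses exactly this variant.

```python
import numpy as np

def ideal_ratio_mask(clean_mag, interference_mag, beta=0.5, eps=1e-12):
    """Standard IRM per time-frequency bin; eps (an assumption) avoids division by zero."""
    clean_pow = clean_mag ** 2
    interf_pow = interference_mag ** 2
    return (clean_pow / (clean_pow + interf_pow + eps)) ** beta

def apply_mask(mixture_mag, mask):
    """Estimate the enhanced magnitude spectrum by scaling the mixture magnitude."""
    return mixture_mag * mask
```

The mask lies in [0, 1], which makes it a convenient bounded training target alongside direct amplitude spectrum estimation.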