Research On Deep Learning Based Speech Dereverberation Method

Posted on:2022-04-17

Degree:Master

Type:Thesis

Country:China

Candidate:X J Wang

Full Text:PDF

GTID:2518306542981159

Subject:Software engineering

Abstract/Summary:


In real environments, clean speech is often degraded by reverberation, which significantly reduces speech quality and intelligibility; the effect is especially severe for hearing-impaired listeners. In recent years, deep neural networks (DNNs) have been widely applied to speech dereverberation and have significantly improved the objective metrics of enhanced speech. However, subjective listening-test scores have not improved to the same degree, and the diversity of reverberation in real environments makes it difficult for conventional DNNs to enhance reverberant speech in a targeted way. The main reason is that conventional DNNs use the same feature extraction parameters regardless of the degree of reverberation, even though the inter-frame interference and the correlation between consecutive frames differ under different degrees of reverberation. In addition, the enhancement of reverberant speech in noisy environments has received little study. To address these two issues, the main work of this thesis is as follows.

(1) To reduce the impact of reverberation on auditory perception through signal preprocessing, it is first necessary to understand the physical properties of reverberation and how it is generated. This thesis therefore first studies reverberant signals in enclosed spaces, summarizes the classical methods and recent results in the field of dereverberation, and reviews the basic theory needed for this study, including the supervised learning-based dereverberation framework, its training steps, and several widely used training targets.

(2) Most current speech dereverberation methods do not take into account the mutual interference between frames under different degrees of reverberation. To address this, a feature extraction method based on reverberation time perception is proposed. In the DNN training stage, speech is first classified by RT60 and trained separately, so that the dereverberation performance of the system is optimized for different reverberant environments. In the dereverberation stage, the reverberation time is first estimated, a suitable frame shift and speech context window size are then selected for feature extraction, and the resulting reverberant speech features are fed to the trained DNN. On this basis, a multi-target neural network is built that combines masking-based and amplitude spectrum estimation-based methods. Experimental results show that, compared with the conventional method that does not consider reverberation time, the proposed method significantly improves speech intelligibility and speech quality and is robust to reverberant speech from unseen environments.

(3) When noise and reverberation are present at the same time, they are regarded as two different types of interference that should be handled separately, and a two-stage deep learning based speech enhancement strategy is therefore proposed. Specifically, denoising and dereverberation are performed sequentially by two deep learning models. The denoising stage uses a multi-target neural network with ideal ratio masking (IRM) as the training target, combined with amplitude spectrum estimation. The dereverberation stage uses a BLSTM network combined with amplitude spectrum estimation to obtain the amplitude spectrum of the enhanced speech. A new objective function that incorporates the phase of the clean speech is used to assist model training, which in turn improves the accuracy of phase estimation. The waveform is then reconstructed with an iterative phase reconstruction method, and finally the two-stage model is jointly trained to optimize the proposed objective function. Experimental comparisons and spectrogram analysis show that, under the same conditions, the proposed algorithm substantially improves speech intelligibility and speech quality compared with conventional single-stage speech enhancement methods, and is robust to unseen noise and reverberation.
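
As an illustration of the ideal ratio masking training target mentioned in the abstract, the following is a minimal Python sketch and not the thesis's implementation. It assumes the commonly used definition of the mask as the square root of the ratio of target energy to target-plus-interference energy in each time-frequency bin. The function names, the exponent parameter `beta`, the stabilizing constant `eps`, and the random stand-in spectra are all illustrative assumptions.

```python
import numpy as np


def ideal_ratio_mask(target_mag, interference_mag, beta=0.5, eps=1e-8):
    """Compute an ideal ratio mask per time-frequency bin.

    target_mag and interference_mag are magnitude spectrograms of the same
    shape (frequency bins x frames). beta=0.5 gives the usual square-root
    form; eps avoids division by zero (both are assumptions of this sketch).
    """
    target_pow = target_mag ** 2
    interference_pow = interference_mag ** 2
    return (target_pow / (target_pow + interference_pow + eps)) ** beta


def apply_mask(degraded_mag, mask):
    """Scale the degraded magnitude spectrogram by the mask, bin by bin."""
    return mask * degraded_mag


# Random stand-in spectra (257 frequency bins, 100 frames), not real speech.
rng = np.random.default_rng(0)
clean = np.abs(rng.normal(size=(257, 100)))
interference = np.abs(rng.normal(size=(257, 100)))

mask = ideal_ratio_mask(clean, interference)
# Rough stand-in for the degraded magnitude; real mixtures add in the
# complex domain, so this sum is only an approximation for the demo.
enhanced = apply_mask(clean + interference, mask)
```

During supervised training a network would be asked to predict such a mask from features of the degraded speech; here the mask is computed directly from known components only to show the target itself. Values lie between 0 and 1, approaching 1 where the target dominates a bin and 0 where the interference dominates.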

Keywords/Search Tags:

multi-target deep neural networks, reverberation time perception, speech enhancement, amplitude spectrum estimation, ideal ratio masking


Related items

1	Research On Speech Enhancement Algorithm Based On Phase Spectrum Reconstruction Joint Amplitude Spectrum Estimation
2	Research On Single-channel Speech Enhancement Method Based On Deep Neural Networks And Time-frequency Masking
3	Speech Enhancement Based On Noise Spectrum Estimation Using Constrained Variance And Auditory Masking
4	Research On Speech Enhancement Algorithm Based On Prior Signal-to-noise Ratio Estimation
5	Research On Speech Intelligibility Enhancement Based On Amplitude Spectrum Constraint
6	Research And Improvement For Several Speech Enhancement Algorithms And De-noising
7	Study On Single Channel Speech Enhancement Algorithm Based On Deep Neural Network
8	Research On Deep Learning Based Speech Enhancement
9	Single-Channel Speech Enhancement Algorithm Based On Audio Feature Perception
10	Speech Enhancement Based On Deep Learning