Research On Deep Learning Speech Enhancement Algorithms That Effectively Improve Speech Intelligibility

Posted on: 2021-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: H B Fang
Full Text: PDF
GTID: 2518306110995229
Subject: Computer technology
Abstract/Summary:
Deep neural network (DNN) based mapping and classification architectures have substantially improved speech enhancement, but there is still room for further improvement. We therefore first improve the cost function used to optimize the network during training in DNN-based speech enhancement, and propose a deep learning speech enhancement algorithm based on a perceptually related cost function that reduces the mismatch between the training cost function and human auditory perception. Next, by analyzing the architectures of conventional speech enhancement algorithms and DNN-based methods and combining the advantages of both, we propose a DNN-based suppression gain estimation method that further improves intelligibility.

First, supervised learning methods for speech enhancement based on different cost functions are studied. The mean squared error (MSE) between the network output and the training target differs from evaluation criteria based on human auditory perception, so optimizing the network with an MSE cost function does not guarantee improved speech intelligibility. The frequency-weighted segmental SNR (fwSNRseg), by contrast, is an objective measure of speech intelligibility that reflects human auditory perception. By incorporating this evaluation criterion into network parameter training, we propose a deep learning speech enhancement algorithm based on a perceptually related cost function. Systematic objective evaluations show that, compared with a DNN trained with the MSE cost function, the proposed method further improves the short-time objective intelligibility (STOI) score of test speech without degrading speech quality, across a wide range of noise types and signal-to-noise ratios.

Next, we examine several DNN-based speech enhancement algorithms. Although a mapping-based end-to-end regression DNN model can effectively remove the noise component, it introduces comparatively severe speech distortion. Estimation of the suppression gain plays an important role in the conventional single-channel speech enhancement architecture; by combining DNN methods with this conventional single-channel framework, we propose a speech enhancement algorithm that estimates the suppression gain with a single DNN. The DNN input is extended with causal context to allow real-time signal processing. The suppression gain of each frequency bin is multiplied by the corresponding noisy magnitude spectrum to obtain the enhanced magnitude spectrum. Building on this, we propose a structured DNN-based suppression gain estimation method, in which multiple DNNs separately estimate the clean speech amplitude spectrum, the speech presence probability, and the suppression gain, and explicit noise variance estimation is introduced. All of these DNN methods are trained with the perceptually related cost function described above. Finally, a comparison of the evaluation results shows that incorporating DNNs into a statistical noise suppression system and replacing certain estimators of that system yields better STOI results than using a simple regression DNN to estimate the clean speech directly.
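The abstract gives no formulas, so the following is only a minimal NumPy sketch of three ideas it mentions: an fwSNRseg-style cost, stacking causal context onto the DNN input, and applying a per-bin suppression gain. It assumes the commonly cited fwSNRseg form, with weights equal to the clean magnitude raised to a power `gamma` and per-bin SNR limited to [-10, 35] dB, and it works directly on STFT bins rather than critical bands. The function names, the value `gamma=0.2`, the clipping range, and the [0, 1] gain limit are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np


def fwsnrseg_loss(clean_mag, est_mag, gamma=0.2, eps=1e-8):
    """Negative frequency-weighted segmental SNR in dB (lower is better).

    clean_mag, est_mag: non-negative arrays of shape (frames, bins).
    Assumption: weights are clean_mag ** gamma and the per-bin SNR is
    clipped to [-10, 35] dB; the thesis may define its cost differently.
    """
    weights = clean_mag ** gamma
    noise_power = (clean_mag - est_mag) ** 2
    snr_db = 10.0 * np.log10((clean_mag ** 2 + eps) / (noise_power + eps))
    snr_db = np.clip(snr_db, -10.0, 35.0)
    # Weighted average over bins for each frame, then mean over frames.
    per_frame = (weights * snr_db).sum(axis=1) / (weights.sum(axis=1) + eps)
    return -per_frame.mean()


def stack_causal_context(features, n_past):
    """Append the n_past previous frames to each frame (no look-ahead).

    features: array of shape (frames, bins).
    Returns an array of shape (frames, bins * (n_past + 1)); the last
    `bins` columns hold the current frame. The first frame is repeated
    as padding (an illustrative choice).
    """
    frames = features.shape[0]
    pad = np.repeat(features[:1], n_past, axis=0)
    padded = np.concatenate([pad, features], axis=0)
    shifted = [padded[i:i + frames] for i in range(n_past + 1)]
    return np.concatenate(shifted, axis=1)


def apply_suppression_gain(noisy_mag, gain):
    """Multiply each bin of the noisy magnitude spectrum by its gain.

    Assumption: the gain is limited to [0, 1] before it is applied.
    """
    return np.clip(gain, 0.0, 1.0) * noisy_mag
```

In a training framework the same loss would be written with differentiable tensor operations. Hard clipping has zero gradient outside its range, so a trainable version might need to soften or drop the limits.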
Keywords/Search Tags: Deep Neural Networks, Human Auditory Perception, Cost Function, Suppression Gain, Speech Enhancement