Speech Enhancement Based On Deep Neural Network And Recurrent Neural Network

Posted on: 2021-02-15  Degree: Master  Type: Thesis
Country: China  Candidate: Y F Zhang
GTID: 2428330602997322  Subject: Computer application technology
Abstract/Summary:
Speech enhancement is the task of recovering a clean speech signal from a noisy one. The goal is to improve the quality and intelligibility of speech corrupted by noise. It has many applications, including mobile speech communication, hearing aid design, robust automatic speech recognition, and automatic speaker recognition.

Many speech enhancement methods have been proposed over the past several decades. Spectral subtraction subtracts an estimate of the short-term noise spectrum from the noisy spectrum. Wiener filtering is a method that uses an all-pole model. A common problem of these two methods is that they introduce "musical noise" into the results. This problem was reduced when Ephraim and Malah proposed the minimum mean-square error (MMSE) estimator. After that, many MMSE-based methods were proposed, such as the MMSE log-spectral amplitude estimator and the optimally-modified log-spectral amplitude (OM-LSA) speech estimator. Most of these methods assume that an estimate of the noise spectrum is available. However, the noise can hardly be estimated correctly at low SNRs, which results in severe distortion in the enhanced speech.

To overcome the shortcomings of traditional methods, speech enhancement methods based on deep learning have developed rapidly in recent years. These methods mainly include deep neural networks (DNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs), of which the most noteworthy is long short-term memory (LSTM). Recently, generative adversarial networks (GANs) have also been used for speech enhancement. In addition, there are many combinations of DNNs with traditional methods, such as DNNs combined with Wiener filtering and DNNs combined with non-negative matrix factorization. Trained on large amounts of data, these methods can achieve better performance than traditional methods. This shows that DNNs have been successfully adopted as a regression model in speech enhancement.

Nonetheless, the performance in a battlefield environment is not always satisfactory, because the noise energy often dominates in certain speech segments and causes speech distortion. For speech enhancement in a complicated battlefield environment, where multiple noises such as gunshots and explosions can corrupt speech simultaneously, we propose a method that improves existing DNN-based speech enhancement by using an RNN. The RNN model judges whether each frame is in a low-SNR state, and the outputs of two DNN-based speech enhancement models are then fused accordingly. The proposed method is compared with existing DNN-based speech enhancement techniques using perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) scores under various noisy speech conditions. The experimental results demonstrate significant improvements over the state-of-the-art techniques and reflect the usability of the method in a real battlefield environment.
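
The frame-wise fusion of two enhancement models mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so everything here is an assumption: the function name `fuse_frames`, the idea that one model is suited to low-SNR frames and the other to higher-SNR frames, the use of a per-frame probability from the RNN, and the convex combination itself are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

Frame = List[float]  # one frame of spectral features, e.g. log-power spectrum bins


def fuse_frames(
    low_snr_out: Sequence[Sequence[float]],
    high_snr_out: Sequence[Sequence[float]],
    p_low_snr: Sequence[float],
) -> List[Frame]:
    """Blend two enhanced feature sequences frame by frame.

    low_snr_out / high_snr_out: outputs of two (hypothetical) enhancement
    models for the same utterance, one list of feature bins per frame.
    p_low_snr[t]: assumed per-frame probability, e.g. from an RNN classifier,
    that frame t is in a low-SNR state.

    The convex combination below is an assumed fusion rule, not one taken
    from the thesis.
    """
    if not (len(low_snr_out) == len(high_snr_out) == len(p_low_snr)):
        raise ValueError("all inputs must have the same number of frames")

    fused: List[Frame] = []
    for lo, hi, p in zip(low_snr_out, high_snr_out, p_low_snr):
        if len(lo) != len(hi):
            raise ValueError("frames must have the same number of bins")
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        # Weight the low-SNR model by p and the other model by (1 - p).
        fused.append([p * a + (1.0 - p) * b for a, b in zip(lo, hi)])
    return fused
```

With `p_low_snr[t]` equal to 1 the frame comes entirely from the low-SNR model, and with 0 it comes entirely from the other model; a hard decision (thresholding the probability) would be an equally plausible reading of the abstract.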
Keywords/Search Tags: Deep Neural Networks (DNNs), Recurrent Neural Networks (RNNs), speech enhancement, battlefield environment, perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI)