Speech Enhancement Based On Deep Neural Network And Recurrent Neural Network

Posted on:2021-02-15

Degree:Master

Type:Thesis

Country:China

Candidate:Y F Zhang

GTID:2428330602997322

Subject:Computer application technology

Abstract/Summary:

Speech enhancement is the task of recovering a clean speech signal from a noisy one. The goal is to improve the quality and intelligibility of speech corrupted by noise. Applications include mobile speech communication, hearing aid design, robust automatic speech recognition, and automatic speaker recognition.

Many speech enhancement methods have been proposed over the past several decades. Spectral subtraction subtracts an estimate of the short-term noise spectrum from the noisy spectrum to achieve enhancement. Wiener filtering is a method that uses an all-pole model. A common problem of both methods is that they introduce “musical noise” into the results. This problem was reduced when Ephraim and Malah proposed the minimum mean-square error (MMSE) estimator. Many MMSE-based methods followed, such as the MMSE log-spectral amplitude estimator and the optimally modified log-spectral amplitude (OM-LSA) speech estimator. Most of these methods assume that an estimate of the noise spectrum is available. However, the noise model can hardly be estimated correctly at low SNRs, which results in severe distortion of the enhanced speech.

To overcome the shortcomings of traditional methods, speech enhancement methods based on deep learning have developed rapidly in recent years. They mainly include deep neural networks (DNNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs), of which the most noteworthy is long short-term memory (LSTM). Recently, generative adversarial networks (GANs) have also been used for speech enhancement. There are also many combinations of DNNs with traditional methods, such as DNNs combined with the Wiener filter and DNNs combined with non-negative matrix factorization. Trained on large amounts of data, these methods can achieve better performance than traditional methods, which shows that DNNs have been successfully adopted as regression models in speech enhancement.

Nonetheless, performance in the battlefield environment is not always satisfactory, because noise energy often dominates certain speech segments and causes speech distortion. For speech enhancement in a complicated battlefield environment, where multiple noises such as gunshots and explosions can corrupt speech simultaneously, we propose a method that improves existing DNN-based speech enhancement by using recurrent neural networks (RNNs). The RNN model judges whether each frame is in a low-SNR state and then fuses two DNN-based speech enhancement models. The proposed method is compared with existing DNN-based speech enhancement techniques using perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) scores under various noisy speech conditions. The experimental results demonstrate significant improvements over the state-of-the-art techniques and reflect the usability of the method in a real battlefield environment.
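
The abstract does not say how the RNN's per-frame low-SNR judgment is used to fuse the two DNN-based models. As a rough illustration only, one plausible reading is a frame-wise weighted combination of the two models' outputs. In the sketch below, the function name `fuse_frames`, its parameters, the interpretation of the RNN output as a probability, and the optional hard threshold are all assumptions made for the example and are not taken from the thesis.

```python
from typing import List, Optional, Sequence

Frame = List[float]  # one frame of enhanced spectral features (hypothetical layout)


def fuse_frames(
    low_snr_prob: Sequence[float],
    out_low_snr_model: Sequence[Frame],
    out_general_model: Sequence[Frame],
    hard_threshold: Optional[float] = None,
) -> List[Frame]:
    """Fuse two enhancement outputs frame by frame (illustrative assumption).

    low_snr_prob[t] stands in for the RNN's judgment that frame t is in a
    low-SNR state. With hard_threshold=None the value is used directly as a
    soft weight; otherwise it is converted to a 0/1 decision first.
    """
    if not (len(low_snr_prob) == len(out_low_snr_model) == len(out_general_model)):
        raise ValueError("all inputs must contain the same number of frames")

    fused: List[Frame] = []
    for p, low_frame, gen_frame in zip(low_snr_prob, out_low_snr_model, out_general_model):
        if len(low_frame) != len(gen_frame):
            raise ValueError("frames from both models must have the same size")
        # Weight given to the model assumed to be specialised for low SNR.
        if hard_threshold is None:
            w = p
        else:
            w = 1.0 if p >= hard_threshold else 0.0
        fused.append([w * a + (1.0 - w) * b for a, b in zip(low_frame, gen_frame)])
    return fused


if __name__ == "__main__":
    probs = [0.0, 1.0, 0.5]                 # made-up per-frame RNN outputs
    low = [[1.0, 1.0]] * 3                  # made-up output of the low-SNR model
    general = [[3.0, 5.0]] * 3              # made-up output of the general model
    print(fuse_frames(probs, low, general))
```

With a soft weight, frames judged clearly low-SNR follow one model and clearly high-SNR frames follow the other, while uncertain frames blend both. Passing `hard_threshold` instead switches between the two models per frame.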

Keywords/Search Tags:

Deep Neural Networks (DNNs), Recurrent Neural Networks (RNNs), speech enhancement, battlefield environment, perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI)

Related items

1	Research On Supervised Speech Enhancement Based On Deep Neural Networks
2	Study On Objective Quality Comprehensive Evaluation Of Face Mask Speech
3	Research On Near-end Listening Enhancement Algorithm Based On Lombard Speech Conversion
4	Research On Deep Learning Speech Enhancement Algorithms That Effectively Improve Speech Intelligibility
5	Study On The Applications Of Neural Networks In Objective Assessment Of Speech Quality
6	Research On Supervised Speech Separation Based On Deep Learning
7	Research On Deep Neural Network Based Speech Dereverberation
8	Study On Objective Speech Quality Assessment For Speech Communication
9	Research On Speech Bandwidth Extension Methods Using Neural Networks
10	The Study Of Features Estimation For Speech Intelligibility Enhancement