Research On Speech Enhancement Model Based On Deep Neural Network

Posted on: 2022-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Wang
Full Text: PDF
GTID: 2518306557961809
Subject: Electronics and Communications Engineering
Abstract/Summary:
Speech enhancement is an important part of speech preprocessing. Over the past few decades, researchers have proposed many unsupervised speech enhancement methods. However, these traditional single-channel unsupervised methods usually rely on unrealistic assumptions about the signal, which leads to "musical noise", speech distortion, and even speech loss, and greatly limits the achievable performance. In recent years, Deep Neural Networks (DNNs) have received increasing attention in speech enhancement. Because of its deep nonlinear structure, a DNN can be regarded as a noise reduction filter. Trained on large amounts of data, a DNN learns to estimate the target speech, so it can better represent the complex relationship between noisy speech and clean speech.

The main research direction of this thesis is therefore DNN-based speech enhancement. The thesis proposes a method that, unlike traditional unsupervised methods, needs almost no assumptions about the signal and can accurately learn the complex relationship between noisy and clean speech, yielding a large performance improvement. To further improve generalization in complex environments, the work focuses on training data processing and model fusion, with particular attention to speech distortion in low signal-to-noise ratio (SNR) environments and to generalization across many types of noise. The details are as follows.

First, the basic theory of speech enhancement and deep neural networks is studied. The thesis describes in detail the model frameworks and implementation of typical methods, including speech signal preprocessing, traditional unsupervised speech enhancement, and supervised speech enhancement. It then examines existing DNN-based speech enhancement methods, their model frameworks, training methods, and optimization strategies, and summarizes the advantages and disadvantages of each, laying a theoretical foundation for the subsequent research.

Second, a DNN-based speech enhancement model is proposed. The model uses the log power spectrum (LPS) as its feature and trains the DNN in two steps: unsupervised pre-training based on restricted Boltzmann machines (RBMs), followed by supervised fine-tuning with the error backpropagation algorithm. In the enhancement stage, the LPS features of the noisy speech are extracted and fed to the trained DNN for decoding, which produces an estimate of the LPS of the clean speech. Comparative experiments show that the proposed model outperforms traditional unsupervised methods in terms of segmental SNR (SegSNR), log-spectral distance (LSD), PESQ, and STOI. It effectively suppresses "musical noise", reduces speech loss, and significantly improves perceived quality and intelligibility.

Finally, an improved DNN speech enhancement model combined with a Voice Activity Detection (VAD) algorithm is proposed. Building on the existing model, VAD is used to process the original training data, yielding two speech enhancement models, one better at suppressing noise and one better at preserving speech. A VAD model is then used to fuse the two, producing a new VAD-DNN speech enhancement model that combines their respective strengths. Experiments show that, compared with the DNN speech enhancement model, the VAD-DNN model significantly improves LSD, PESQ, and STOI in low-SNR environments.

In summary, the main contribution of this thesis is a DNN-based speech enhancement model. Unlike traditional single-channel speech enhancement methods, the proposed model makes almost no assumptions, effectively addresses "musical noise" and speech loss, significantly improves perceived quality and intelligibility, and outperforms traditional methods. To improve performance further, the model is then refined: the joint VAD-DNN speech enhancement model improves on it in both training data processing and model fusion, and it performs well in low-SNR environments.
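
The abstract names the log power spectrum as the model's feature and a VAD-driven fusion of two enhancement models, but gives no implementation details. The following is only a minimal NumPy sketch of those two ideas, not the thesis's actual code. The frame length, hop size, Hamming window, the function names, and the soft frame-wise blending rule are all assumptions made for illustration; the distance function uses one common definition of log-spectral distance, computed here on natural-log spectra.

```python
import numpy as np


def log_power_spectrum(signal, frame_len=512, hop=256, eps=1e-10):
    """Frame a 1-D signal and return its log power spectrum.

    Output shape: (num_frames, frame_len // 2 + 1).
    frame_len, hop and the Hamming window are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (signal.size - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)
    # eps keeps the logarithm finite for silent bins.
    return np.log(np.abs(spectrum) ** 2 + eps)


def fuse_by_vad(lps_speech_model, lps_noise_model, speech_prob):
    """Hypothetical frame-wise fusion of two enhanced LPS estimates.

    speech_prob holds one VAD score in [0, 1] per frame. Frames judged
    to be speech lean on the speech-preserving model, the rest on the
    noise-suppressing model. The blending rule is an assumption.
    """
    w = np.clip(np.asarray(speech_prob, dtype=np.float64), 0.0, 1.0)[:, None]
    return w * lps_speech_model + (1.0 - w) * lps_noise_model


def log_spectral_distance(lps_ref, lps_est):
    """Mean over frames of the per-frame RMS difference of two LPS arrays."""
    per_frame = np.sqrt(np.mean((lps_ref - lps_est) ** 2, axis=1))
    return float(np.mean(per_frame))
```

In such a setup, the two inputs to `fuse_by_vad` would be the LPS estimates produced by the two separately trained DNNs, and `speech_prob` would come from whatever VAD is available; a hard 0/1 decision is simply the special case of the soft weight used here.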
Keywords/Search Tags: Speech enhancement, Deep Neural Network, Voice Activity Detection, Generalization Capacity