
Research On Speech Enhancement Based On Deep Neural Network

Posted on: 2020-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: N Li
GTID: 2428330620956144
Subject: Information and Communication Engineering
Abstract/Summary:
Speech enhancement is widely used in speech signal processing systems and artificial intelligence systems. In practical environments, traditional speech enhancement algorithms suffer from problems such as limited enhancement performance and poor generalization. Building on the perceptual characteristics of the human auditory system and on deep learning network structures, this thesis studies single-channel speech enhancement algorithms based on deep learning networks. The proposed algorithms cover two aspects: a speech enhancement algorithm based on the multi-resolution cochleagram feature (MRCG) and a deep neural network (DNN), and a speech enhancement algorithm based on the spectrogram and a conditional Generative Adversarial Net (cGAN).

(1) Speech enhancement algorithm based on the multi-resolution cochleagram feature and a deep neural network. Unlike traditional methods built on the STFT (Short-Time Fourier Transform), this algorithm is based on a Gammatone filterbank, and the MRCG of each time-frequency unit is extracted as the spectral feature. Each frame is spliced with the two frames before and after it, and the result is used as the input to the DNN. The training target is the IRM (Ideal Ratio Mask). The DNN updates its weights with the Root Mean Square Propagation (RMSProp) algorithm, which addresses the training instability of traditional networks. Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) are used as evaluation metrics. Simulation results show that this algorithm outperforms the traditional algorithm.

(2) Speech enhancement algorithm based on the spectrogram and a conditional Generative Adversarial Net. cGANs are currently used mostly for image enhancement and recognition. This thesis proposes a cGAN-based algorithm that maps the noisy spectrogram to an enhanced spectrogram. The cGAN feeds the original noisy spectrogram to the generator as the condition; the generator is trained with a U-Net encoder-decoder structure that adds skip connections between the downsampling and upsampling layers. STOI and PESQ are used as evaluation metrics. Simulation results show that the cGAN improves the quality of speech separation and achieves a higher STOI than the MRCG-based algorithm. For babble noise in particular, the cGAN is more effective than the MRCG-based algorithm. In addition, the cGAN generalizes well across different kinds of noise.
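To make the MRCG feature of algorithm (1) concrete, here is a minimal NumPy sketch. It assumes the commonly cited MRCG recipe, which the abstract does not spell out: a 64-channel Gammatone filterbank with centre frequencies equally spaced on the ERB-rate scale (50 Hz to 4 kHz here), log-compressed cochleagrams with 20 ms and 200 ms frames at a 10 ms hop, and two copies of the 20 ms cochleagram smoothed with 11×11 and 23×23 box averages. The helper names (`erb_space`, `gammatone_ir`, `frame_energy`, `box_smooth`, `mrcg`), the 50 ms impulse-response length and the left-aligned, zero-padded framing are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth (Glasberg & Moore), in Hz."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def erb_space(low, high, n):
    """n centre frequencies equally spaced on the ERB-rate scale."""
    def to_rate(f):
        return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    def to_hz(e):
        return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    return to_hz(np.linspace(to_rate(low), to_rate(high), n))

def gammatone_ir(fc, fs, duration=0.05, order=4, b=1.019):
    """4th-order Gammatone impulse response, peak-normalised (assumption)."""
    t = np.arange(int(duration * fs)) / fs
    env = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb(fc) * t)
    ir = env * np.cos(2.0 * np.pi * fc * t)
    return ir / np.max(np.abs(ir))

def frame_energy(y, n_frames, frame_len, hop):
    """Energy of left-aligned frames; y is zero-padded so every frame is full."""
    y = np.pad(y, (0, frame_len))
    return np.array([np.sum(y[i * hop:i * hop + frame_len] ** 2)
                     for i in range(n_frames)])

def box_smooth(m, size):
    """Mean over a size x size neighbourhood, with edge padding (assumption)."""
    r = size // 2
    p = np.pad(m, r, mode="edge")
    out = np.zeros_like(m)
    for i in range(size):
        for j in range(size):
            out += p[i:i + m.shape[0], j:j + m.shape[1]]
    return out / size ** 2

def mrcg(x, fs, n_channels=64, low=50.0, high=4000.0, hop_s=0.010):
    """Return an MRCG-style feature matrix of shape (frames, 4 * n_channels)."""
    hop = int(hop_s * fs)
    n_frames = len(x) // hop
    # Gammatone filterbank output: one row per channel
    bank = np.array([np.convolve(x, gammatone_ir(fc, fs))[:len(x)]
                     for fc in erb_space(low, high, n_channels)])

    def cochleagram(frame_s):
        n = int(frame_s * fs)
        e = np.array([frame_energy(y, n_frames, n, hop) for y in bank]).T
        return np.log10(e + 1e-10)  # log compression

    cg1 = cochleagram(0.020)   # high-resolution cochleagram
    cg2 = cochleagram(0.200)   # low-resolution (long-frame) cochleagram
    cg3 = box_smooth(cg1, 11)  # local context
    cg4 = box_smooth(cg1, 23)  # wider context
    return np.concatenate([cg1, cg2, cg3, cg4], axis=1)
```

For a 0.25 s signal at 16 kHz this yields 25 frames of 256 values; in the thesis's setup these per-frame vectors would then be spliced with neighbouring frames before entering the DNN.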
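The DNN input and training target of algorithm (1) can be sketched as follows. The IRM is taken as IRM(t, f) = (S²/(S² + N²))^β with β = 0.5, a common choice that the abstract does not confirm. Context splicing assumes two frames on each side, with the first and last frames repeated at the utterance boundaries. `ideal_ratio_mask` and `stack_context` are hypothetical helper names.

```python
import numpy as np

def ideal_ratio_mask(speech_energy, noise_energy, beta=0.5, eps=1e-12):
    """IRM(t, f) = (S^2 / (S^2 + N^2)) ** beta for every time-frequency unit.

    Inputs are arrays of shape (frames, channels) holding per-unit energies
    of the clean speech and of the noise (e.g. cochleagram units).
    """
    s = np.asarray(speech_energy, dtype=float)
    n = np.asarray(noise_energy, dtype=float)
    return (s / (s + n + eps)) ** beta

def stack_context(features, left=2, right=2):
    """Splice each frame with `left` past and `right` future frames.

    (T, D) -> (T, (left + 1 + right) * D); boundary frames are repeated.
    """
    feats = np.asarray(features, dtype=float)
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    T = feats.shape[0]
    # Window i holds frame t + i - left for every t, ordered past -> future
    windows = [padded[i:i + T] for i in range(left + right + 1)]
    return np.concatenate(windows, axis=1)
```

During training the DNN maps `stack_context(features)` to the IRM of the centre frame; at inference the predicted mask is typically multiplied unit by unit with the noisy representation before resynthesis.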
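RMSProp keeps a running average of squared gradients and divides each step by its square root, so every weight gets its own adaptive step size and large, erratic gradients are damped. The sketch below shows the update on a toy quadratic rather than on the thesis's DNN; `lr = 0.01`, `rho = 0.9` and `eps = 1e-8` are assumed hyperparameters, and the `RMSProp` class is a hypothetical minimal implementation.

```python
import numpy as np

class RMSProp:
    """Minimal RMSProp optimiser for a single parameter array."""

    def __init__(self, lr=0.01, rho=0.9, eps=1e-8):
        self.lr, self.rho, self.eps = lr, rho, eps
        self.sq_avg = None  # running mean of squared gradients

    def step(self, params, grads):
        if self.sq_avg is None:
            self.sq_avg = np.zeros_like(params)
        self.sq_avg = self.rho * self.sq_avg + (1.0 - self.rho) * grads ** 2
        # Each coordinate's step is scaled by the RMS of its recent gradients
        return params - self.lr * grads / (np.sqrt(self.sq_avg) + self.eps)

# Usage: minimise f(w) = ||w - target||^2, whose gradient is 2 * (w - target)
target = np.array([3.0, -1.0])
w = np.zeros(2)
opt = RMSProp(lr=0.01)
for _ in range(2000):
    w = opt.step(w, 2.0 * (w - target))
```

Because each step is roughly `lr` in size regardless of the raw gradient scale, `w` moves steadily toward `target` and then hovers within a few multiples of `lr` of it.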
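The skip connections in the cGAN generator of algorithm (2) can be illustrated at the level of data flow. This is not a trainable network: 2×2 average pooling and nearest-neighbour repetition stand in for the learned strided and transposed convolutions, and the depth and channel layout (channels, frequency, time) are assumptions. Only the way each encoder output is concatenated onto the decoder path at the matching resolution is shown; `down`, `up` and `unet_forward` are hypothetical names.

```python
import numpy as np

def down(x):
    """(C, H, W) -> (C, H/2, W/2); average pooling stands in for a strided conv."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def up(x):
    """(C, H, W) -> (C, 2H, 2W); repetition stands in for a transposed conv."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def unet_forward(x, depth=3):
    """Encoder-decoder pass with U-Net style skip connections.

    H and W of x must be divisible by 2 ** depth.
    """
    skips = []
    h = x
    for _ in range(depth):       # encoder: remember each resolution
        skips.append(h)
        h = down(h)
    for skip in reversed(skips):  # decoder: upsample, then concat the skip
        h = up(h)
        h = np.concatenate([h, skip], axis=0)  # skip connection on channels
    return h
```

For a one-channel 8×8 spectrogram patch and `depth=3`, the output has four channels at full resolution, the last of which is the untouched input carried over by the outermost skip connection; in a real generator, convolutions after each concatenation would fuse these channels.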
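The abstract does not state the cGAN training objective. A common choice for spectrogram-to-spectrogram mapping with a U-Net generator is the pix2pix objective (Isola et al.), which adds an L1 term to the conditional adversarial loss; the sketch below assumes that objective, with the discriminator scoring (noisy, clean) pairs as real and (noisy, enhanced) pairs as fake. The weight `l1_weight = 100` is the pix2pix default and only an assumption here, and the function names are hypothetical.

```python
import numpy as np

def bce(p, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities p and a 0/1 target."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))))

def discriminator_loss(d_real, d_fake):
    """d_real = D(noisy, clean), d_fake = D(noisy, G(noisy)), both in (0, 1)."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake, enhanced, clean, l1_weight=100.0):
    """Adversarial term (fool D) plus L1 distance to the clean spectrogram."""
    adv = bce(d_fake, 1.0)
    l1 = float(np.mean(np.abs(enhanced - clean)))
    return adv + l1_weight * l1
```

Conditioning on the noisy spectrogram means the discriminator judges whether an enhanced spectrogram is a plausible clean version of that particular input, while the L1 term keeps the generator's output close to the clean target.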
Keywords/Search Tags: Deep Neural Network, Speech Enhancement, Conditional Generative Adversarial Nets