Research On Speech Enhancement Method Based On DNN And MultiResU-Net

Posted on: 2022-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: C D Liu
GTID: 2518306317490014
Subject: Electronics and Communications Engineering
Abstract/Summary:
As a front-end algorithm for speech signal processing, speech enhancement is mainly used to remove background noise from the speech signal, improving speech clarity and the accuracy of speech recognition. Speech enhancement plays an important role in many fields, such as hearing aids, intelligent medical treatment, and smart homes. Because deep learning has a strong nonlinear fitting ability, it is gradually replacing traditional algorithms in speech enhancement. However, deep-learning-based speech enhancement still faces great challenges when the signal-to-noise ratio (SNR) is low, and there is still room to improve the enhancement effect. In this thesis, building on masking-based and mapping-based speech enhancement, a DNN and the MultiResU-Net network are improved and optimized and then used to enhance noisy speech signals. The main research work is as follows:

The multi-resolution cochleagram (MRCG) was proposed for low-SNR conditions. It contains both local and global features of speech, and it has been demonstrated to be an optimal speech feature. To obtain clearer speech features in low-SNR environments, short-time spectral amplitude minimum mean square error estimation is used to denoise the global features in MRCG, giving an improved multi-resolution cochleagram feature for analyzing speech in the time-frequency domain. The improved MRCG is used as the input feature, and a deep neural network with skip connections (Skip-DNN) is used as the training network. To make network training more effective, the source-to-distortion ratio (SDR) is modified into a logarithmic form; the improved SDR is used as the loss function and the ideal ratio mask (IRM) as the training target to build a speech enhancement model.

Under different SNR conditions, speech data from the LibriSpeech ASR corpus are used to compare the enhancement effect of three Skip-DNN inputs: mainstream feature combinations, multi-resolution auditory cepstrum coefficients, and the improved MRCG. The results show that the model based on the improved MRCG and Skip-DNN performs best. The effects of mean square error, SDR, and improved SDR on network training are also compared. The results show that, for the improved MRCG and Skip-DNN model, using the improved SDR as the loss function gives higher objective speech evaluation scores.

To further improve speech enhancement in low-SNR environments, the time-frequency representation obtained by applying the short-time Fourier transform (STFT) to the speech signal is used as both the network input and the training target, and the multi-resolution residual U network (MultiResU-Net) is improved. A sub-pixel convolution layer is used to improve the up-sampling process and restore detail in the network. The residual paths are rearranged with the up-sampled output features of the decoder by channel mixing to improve information fusion, giving the improved MultiResU-Net speech enhancement model.

Under different SNR conditions, the speech enhancement effect at different network depths and window sizes is studied using speech data from the LibriSpeech ASR corpus. The enhancement effect of the original MultiResU-Net, a fully convolutional neural network, a U-shaped network (U-Net), and the improved MultiResU-Net is compared. The results show that the improved MultiResU-Net performs best when the depth is 9 and the window size is 3×5. Under different SNR conditions, the improved MultiResU-Net model scores higher than the other models on the evaluation metrics, which indicates that the model proposed in this thesis enhances speech more effectively and is especially suitable for speech enhancement at low SNR.
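
Three building blocks named in the abstract can be illustrated with a short NumPy sketch: the ideal ratio mask used as the training target, a logarithmic (dB-scale) SDR loss, and the channel-to-space rearrangement that follows the convolution in a sub-pixel convolution layer. The abstract does not give the exact formulas, so the function names, the exponent `beta = 0.5`, the `eps` stabilizer, and the specific SDR form below are assumptions based on common textbook definitions, not the author's implementation.

```python
import numpy as np


def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """IRM per time-frequency unit: (S / (S + N)) ** beta.

    Inputs are power spectrograms of clean speech and noise (same shape).
    beta = 0.5 is a common choice and is an assumption here.
    """
    speech_power = np.asarray(speech_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    return (speech_power / (speech_power + noise_power + eps)) ** beta


def neg_log_sdr(reference, estimate, eps=1e-12):
    """Negative SDR in dB, so that a lower value means a better estimate.

    SDR = 10 * log10(||s||^2 / ||s - s_hat||^2). This is the standard
    definition, assumed here; the thesis's improved SDR may differ.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal_energy = np.sum(reference ** 2)
    distortion_energy = np.sum((reference - estimate) ** 2)
    return -10.0 * np.log10((signal_energy + eps) / (distortion_energy + eps))


def pixel_shuffle(features, r):
    """Rearrange (C * r * r, H, W) features into (C, H * r, W * r).

    This is only the rearrangement step of sub-pixel convolution; the
    preceding convolution that produces the C * r * r channels is omitted.
    """
    features = np.asarray(features)
    channels, height, width = features.shape
    if channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    out_channels = channels // (r * r)
    blocks = features.reshape(out_channels, r, r, height, width)
    # Interleave the two r-sized axes with the spatial axes.
    blocks = blocks.transpose(0, 3, 1, 4, 2)
    return blocks.reshape(out_channels, height * r, width * r)


if __name__ == "__main__":
    mask = ideal_ratio_mask([[3.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 3.0]])
    print(mask)
    print(neg_log_sdr([1.0, 0.0, -1.0, 0.0], [0.9, 0.0, -0.9, 0.0]))
    print(pixel_shuffle(np.arange(4).reshape(4, 1, 1), 2))
```

In a training setup, the mask would be the target that the network predicts from the input features, and the loss would be evaluated on the enhanced signal; both choices here are for illustration only.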
Keywords/Search Tags: speech enhancement, deep neural networks, low signal-to-noise ratio, multi-resolution cochleagram, multi-resolution residual U network