Research On Speech Enhancement Algorithm Based On Deep Neural Networks

Posted on: 2022-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: L Zheng
GTID: 2518306542480734
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the development of technology, smart devices have brought tremendous changes to daily work and life. Issuing voice commands to machines has become a faster and more convenient way to communicate. Voice, as a common means of exchanging information between people and between people and smart devices, is an irreplaceable and extremely convenient medium. However, in daily life, voice recognition, and military communications there is always background noise of some kind, which seriously interferes with communication and degrades the speech content that listeners receive. The main purpose of speech enhancement is to reduce and suppress the interference of background noise with the target speech, so that listeners receive the target speech with high quality and high intelligibility.

Traditional unsupervised speech enhancement algorithms have been developed for decades and are relatively mature, but their models make assumptions about the correlation between speech and noise, which limits their application. They handle stationary noise, which changes slowly over time, fairly well, but perform poorly on noise that changes relatively quickly. In recent years, deep learning has been applied in many fields and has achieved outstanding results in speech processing. Using the data analysis capabilities of deep neural networks (DNN) to process noisy speech has become an active research topic, and studies have found that DNN-based speech enhancement algorithms outperform traditional algorithms in denoising.

This thesis uses DNNs to process speech. It focuses on the large proportion of noise in noisy speech under low signal-to-noise ratio (SNR) conditions, optimizes the time-frequency mask function to better fit the structure of speech, and simultaneously estimates the phase of clean speech. The main work and contributions are as follows:

(1) Building on the DNN-based speech enhancement algorithm and on the complementarity of the information carried by the short-time Fourier amplitude spectrum (STFT-Amplitude Spectra, SAS) and the log power spectrum (Log Power Spectra, LPS) features, the two features are spliced at the input layer, and a neural network model is designed to extract features from them so that the network learns more detailed information from both at once. The output features are then post-fused. The network that combines the two features achieves a better denoising effect.

(2) To address the problem that the time-frequency mask function is independent of the SNR in mask-based speech enhancement algorithms, the thesis proposes using SNR information to optimize the time-frequency mask function and improving denoising performance through a two-stage network. In the first stage, a neural network performs a preliminary separation of the noisy speech to estimate the a priori SNR. In the second stage, a gain function is set according to the a priori SNR to optimize the time-frequency mask function, and a phase difference coefficient is added to the function to improve estimation accuracy. After optimization, the function performs well under a variety of complex environmental conditions, better preserving the harmonic structure of speech while reducing residual noise.

(3) In DNN-based speech enhancement, phase information is usually ignored in the training stage, and the phase of the noisy speech signal is substituted for the phase of the clean speech signal in the waveform recovery stage. However, the phase carries detailed information that helps recover the harmonic structure of speech. The thesis therefore proposes introducing a multi-task learning model on top of the DNN model. The deep neural network estimates the speech amplitude and phase simultaneously, and the estimated clean speech phase is used in waveform recovery. On the one hand, this reduces the complexity of the model; on the other hand, reconstructing speech with the estimated clean speech phase improves speech quality in various noisy environments and enhances the listening experience compared with methods that consider only amplitude estimation.
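
To make contributions (1) and (2) more concrete, the following is a minimal NumPy sketch, not the thesis's implementation. It shows how an amplitude spectrum and a log power spectrum could be spliced into one input vector per frame, and how a gain could be derived from an a priori SNR. The function names, the frame length of 512, the hop of 256, the Hann window, and the Wiener-type gain are all assumptions chosen for illustration; the abstract does not specify the actual gain function or the phase difference coefficient.

```python
import numpy as np

def stft_frames(signal, frame_len=512, hop=256):
    """Complex STFT, one row per frame (Hann window, no padding).

    frame_len and hop are illustrative values, not taken from the thesis.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1)

def spliced_features(signal, eps=1e-10):
    """Concatenate amplitude spectrum and log power spectrum per frame."""
    spec = stft_frames(signal)
    amplitude = np.abs(spec)            # STFT amplitude spectrum
    lps = np.log(amplitude ** 2 + eps)  # log power spectrum; eps avoids log(0)
    return np.concatenate([amplitude, lps], axis=1)

def wiener_gain(prior_snr):
    """Wiener-type gain xi / (1 + xi) from an a priori SNR xi (linear scale).

    This is a standard textbook gain used here as a stand-in; the thesis's
    own SNR-dependent mask function is not given in the abstract.
    """
    prior_snr = np.asarray(prior_snr, dtype=float)
    return prior_snr / (1.0 + prior_snr)

# Random noise stands in for one second of 16 kHz noisy speech.
rng = np.random.default_rng(0)
noisy = rng.standard_normal(16000)
features = spliced_features(noisy)
print(features.shape)  # (frames, 2 * frequency bins)
```

In a two-stage setup like the one described, the spliced features would feed the first-stage network, and a gain of this kind would be applied per time-frequency bin using the SNR that network estimates.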
Keywords/Search Tags: speech enhancement, DNN, feature stitching, SNR information, phase information