Research On Speech Enhancement Method Based On Time-frequency Analysis

Posted on: 2022-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Ma
GTID: 2518306728497564
Subject: Electronics and Communications Engineering
Abstract/Summary:
Background noise distorts speech signals and degrades both speech intelligibility and speech quality. Traditional speech enhancement methods usually process signals in only the time domain or the frequency domain, whereas time-frequency analysis considers the time-domain and frequency-domain characteristics of the signal together. Multi-resolution time-frequency (TF) images are a representation of the results of time-frequency analysis of speech signals. They combine the characteristics of the spectrogram and the time-domain waveform, and visually show how speech characteristics change over time and frequency. Based on time-frequency analysis, this thesis uses multi-resolution TF images to enhance speech signals. The main work is as follows:

1. Using cross-modal processing technology for speech enhancement, a speech enhancement method based on secondary guided image filtering is proposed. The spectrogram is a type of multi-resolution TF image in which speech appears as a regularly structured image foreground and noise as a relatively uniform image background. The thesis first generates a spectrogram with the short-time Fourier transform (STFT) and then, using the spectrogram as the medium, applies guided image filtering from image processing to map the spectrogram of the noisy speech to the spectrogram of the enhanced speech. To remove the noise in the spectrogram accurately, the thesis proposes a secondary guided image filtering model built on primary guided image filtering and uses the particle swarm algorithm to optimize its parameters. The experimental results show that the improved guided image filtering method has better edge smoothness and filtering effect. The enhancement effect is best under white noise, where the PESQ score is up to 0.58 higher than that of spectral subtraction.

2. Combining time-frequency analysis and deep learning, a deep learning speech enhancement method based on improved generative adversarial networks is proposed. First, after comparing the feature representation capabilities of three common time-frequency analysis methods, namely the STFT, the Continuous Wavelet Transform (CWT), and the Pseudo Wigner-Ville Distribution (PWVD), the CWT is selected for its algorithmic complexity and its capacity for detailed analysis, and is used to extract the multi-resolution TF image of the speech signal. Second, the loss function of Least Squares Generative Adversarial Networks (LSGANs) is improved with an L1 or L2 dual-distance term. Finally, the TF image obtained by the CWT is used as the input feature of the deep learning network under the optimized framework. The simulation results confirm that adding the dual-distance constraint on the generator to the least squares generative adversarial network keeps the output as consistent as possible with the real data, which effectively improves the denoising effect. Analysis of the PESQ results shows that the method removes not only white noise but also colored noise more effectively.
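As a rough illustration of the first contribution, the Python sketch below builds a log-magnitude STFT spectrogram and smooths it with the standard guided image filter. The abstract does not describe the secondary filtering model, so the two-pass arrangement shown here (the first-pass output guides a second pass) is only one plausible reading and not the thesis's method. The function names, the window radius `r`, the regularizer `eps`, and the synthetic test signal are all assumptions; the thesis tunes its parameters with particle swarm optimization rather than fixing them by hand.

```python
import numpy as np

def stft_magnitude(x, n_fft=256, hop=128):
    """Magnitude spectrogram (freq x time) from a Hann-windowed STFT."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, clipped at the image borders."""
    def box_sum(a):
        h, w = a.shape
        c = np.zeros((h + 1, w + 1))
        c[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)  # integral image
        i0 = np.clip(np.arange(h) - r, 0, h)
        i1 = np.clip(np.arange(h) + r + 1, 0, h)
        j0 = np.clip(np.arange(w) - r, 0, w)
        j1 = np.clip(np.arange(w) + r + 1, 0, w)
        return c[i1][:, j1] - c[i0][:, j1] - c[i1][:, j0] + c[i0][:, j0]
    return box_sum(img) / box_sum(np.ones_like(img))

def guided_filter(guide, src, r=2, eps=1e-2):
    """Standard guided filter: locally linear model q = a * guide + b."""
    mean_i, mean_p = box_mean(guide, r), box_mean(src, r)
    var_i = box_mean(guide * guide, r) - mean_i ** 2
    cov_ip = box_mean(guide * src, r) - mean_i * mean_p
    a = cov_ip / (var_i + eps)   # eps controls how strongly flat regions are smoothed
    b = mean_p - a * mean_i
    return box_mean(a, r) * guide + box_mean(b, r)

def two_pass_guided_filter(spec, r=2, eps=1e-2):
    """ASSUMED two-pass scheme: a self-guided pass, then a pass guided by its output."""
    first = guided_filter(spec, spec, r, eps)
    return guided_filter(first, spec, r, eps)

# Synthetic example (not thesis data): a 440 Hz tone in white noise, 8 kHz sampling.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
noisy = np.sin(2 * np.pi * 440 * t) + 0.5 * rng.standard_normal(t.size)
log_spec = np.log1p(stft_magnitude(noisy))
enhanced = two_pass_guided_filter(log_spec)
```

To return to a waveform, a common choice (again an assumption here) is to undo the log compression and combine the filtered magnitude with the phase of the noisy STFT before the inverse transform.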
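For the second contribution, the sketch below shows how a distance term can be attached to the usual LSGAN objectives. The least-squares adversarial terms follow the standard LSGAN formulation with real label 1 and fake label 0. How the thesis combines the L1 and L2 distances is not stated in the abstract, so the weighted sum and the weights `lam_l1` and `lam_l2` are hypothetical. Plain NumPy arrays stand in for the discriminator scores and the TF images so that the example stays self-contained.

```python
import numpy as np

def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares discriminator loss: push real scores to 1 and fake scores to 0."""
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)

def lsgan_generator_loss(d_fake, generated, target, lam_l1=1.0, lam_l2=1.0):
    """Least-squares generator loss plus an ASSUMED weighted L1 + L2 distance
    between the generated TF image and the clean target TF image."""
    adversarial = 0.5 * np.mean((d_fake - 1.0) ** 2)
    l1 = np.mean(np.abs(generated - target))   # mean absolute error
    l2 = np.mean((generated - target) ** 2)    # mean squared error
    return adversarial + lam_l1 * l1 + lam_l2 * l2

# Toy values: the generator output is off by exactly 1 everywhere and the
# discriminator scores every generated sample as 0.
target = np.zeros((4, 4))
generated = target + 1.0
g_loss = lsgan_generator_loss(np.zeros(3), generated, target, lam_l1=2.0, lam_l2=3.0)
```

Setting `lam_l2=0.0` or `lam_l1=0.0` reduces the constraint to a single distance, which covers the "L1 or L2" reading of the abstract.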
Keywords/Search Tags: Speech enhancement, Time-Frequency method, Spectrogram, Guided Image Filtering, Generative Adversarial Networks