
Research On Speech Enhancement Algorithm Based On Image Edge Preserving Filtering Technology

Posted on: 2021-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: L H Yan
GTID: 2438330611454125
Subject: Electronic and communication engineering
Abstract/Summary:
Speech enhancement (SE) removes noise from noisy speech and recovers the clean speech as far as possible. It reduces speech distortion, improves speech quality, and lessens listening fatigue, and it is widely used in mobile communications, computers, smart wearable devices, smart home systems, and other products and applications. Traditional single-channel SE algorithms generally suppress stationary noise well and improve speech quality, but the processed speech does not reach satisfactory intelligibility. Almost all traditional SE algorithms require an estimate of the noise power spectral density (PSD), and the accuracy of this estimate directly determines both the algorithm's noise reduction performance and the amount of speech distortion. Single-channel algorithms such as spectral subtraction, Wiener filtering, the subspace method, and a priori signal-to-noise ratio (SNR) estimation can estimate and update stationary noise satisfactorily. In more realistic scenarios such as restaurants and station waiting rooms, however, the spectral characteristics of the noise change drastically. Noise estimation then becomes unreliable, noise reduction degrades greatly, and the applicability of these algorithms is limited.

In view of these limitations of single-channel speech enhancement, this thesis studies speech enhancement algorithms based on image edge-preserving filtering. Bilateral filtering and guided filtering from image processing are adopted, and theoretical modeling is used to examine the similarities and differences between time-frequency (T-F) bins and image pixels. Edge-preserving denoising is applied to the T-F bins of the speech spectrogram so that the spectrogram's edge information is kept while the background noise is smoothed. In addition, in light of the current state and open problems of supervised algorithms, the study applies convolutional neural network (CNN) based spectrogram denoising to speech enhancement. The aim is to improve speech quality while avoiding the noise spectrum estimation that traditional SE algorithms cannot do without, and to improve the suppression of non-stationary noise. The specific work and contributions of this thesis are the following three points.

(1) An improved OMLSA (Optimally Modified Log-Spectral Amplitude) algorithm based on bilateral spectrogram filtering is presented. The spectrogram of the speech signal is denoised with a bilateral filter: the spectrogram of clean speech is regarded as a clean image in which each T-F bin is a pixel, and the normalized noisy spectrogram is regarded as the same image degraded or hazed by noise. The enhanced spectrogram is then used to estimate the a posteriori SNR of the OMLSA algorithm, which effectively suppresses the noise and blurred regions of the noisy signal. A relatively clean speech spectrum is obtained, and the speech signal is reconstructed in the time domain.

(2) A guided spectrogram filtering algorithm based on the auditory masking effect is proposed for speech enhancement. Analysis of the operation and applications of the guided image filter shows that, owing to its local linear model, it preserves edges better and runs more efficiently than bilateral filtering, and it avoids the gradient reversal artifacts that bilateral filtering can produce. An expression for the guided spectrogram filter is derived. The guided filter suppresses the background noise of the spectrogram and sharpens the spectrum to extract clean speech; the auditory masking effect of the human ear is then used to reduce residual noise adaptively according to the masking threshold of the enhanced speech spectrum. Several traditional single-channel SE algorithms are compared comprehensively in different noise environments, with the main focus on the performance of guided spectrogram filtering in both stationary and non-stationary noise, in order to improve speech quality, intelligibility, and naturalness.

(3) To address the residual noise that the bilateral and guided filtering algorithms leave in the middle- and low-frequency regions of the spectrogram, the study moves from unsupervised to supervised algorithms and proposes a speech enhancement method based on a spectrogram denoising CNN. The speech signal is represented as an image, and spectrograms are used as the training set. A high-performing image denoising CNN denoises the spectrogram, which avoids both the limited depth of recurrent neural networks and the excessive complexity common to traditional speech features. Because spectrograms can be clipped into patches, large amounts of training data are easy to obtain and the storage cost is much lower. A deeper network increases the capacity and flexibility to exploit spectrogram features and captures enough spatial information to improve noise reduction. The model is trained with a residual learning strategy combined with batch normalization, which greatly improves its performance. The proposed spectrogram denoising model shows better learning ability and noise reduction for both seen and unseen noise, so the system achieves superior and robust speech enhancement.
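For contribution (1), the core operation is a bilateral filter applied to the normalized magnitude spectrogram, with each time-frequency bin treated as a pixel. The sketch below is a plain-Python illustration under stated assumptions: the spectrogram is a list of frequency rows with values normalized to [0, 1], and the window radius and the two widths `sigma_s` (distance in the T-F plane) and `sigma_r` (magnitude difference) are hypothetical defaults, not values taken from the thesis.

```python
import math


def bilateral_filter_spectrogram(spec, radius=2, sigma_s=1.5, sigma_r=0.1):
    """Edge-preserving smoothing of a normalized magnitude spectrogram.

    spec is a list of rows (frequency bins), each a list of values over time
    frames, assumed normalized to [0, 1]. Parameter defaults are illustrative.
    """
    n_freq, n_time = len(spec), len(spec[0])
    out = [[0.0] * n_time for _ in range(n_freq)]
    for k in range(n_freq):
        for t in range(n_time):
            center = spec[k][t]
            num = den = 0.0
            for dk in range(-radius, radius + 1):
                for dt in range(-radius, radius + 1):
                    kk, tt = k + dk, t + dt
                    if not (0 <= kk < n_freq and 0 <= tt < n_time):
                        continue  # ignore neighbours outside the spectrogram
                    value = spec[kk][tt]
                    # Domain weight: closeness in the time-frequency plane.
                    w_s = math.exp(-(dk * dk + dt * dt) / (2.0 * sigma_s ** 2))
                    # Range weight: similarity of magnitudes, so strong edges
                    # (e.g. a harmonic ridge against the noise floor) are kept.
                    w_r = math.exp(-((value - center) ** 2) / (2.0 * sigma_r ** 2))
                    num += w_s * w_r * value
                    den += w_s * w_r
            out[k][t] = num / den  # den >= 1 because the center weight is 1
    return out
```

A small `sigma_r` keeps a harmonic ridge sharp against a flat noise floor, while a large `sigma_r` makes the filter behave like plain Gaussian smoothing.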
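Contribution (1) feeds the bilateral-filtered spectrogram into the a posteriori SNR of OMLSA. For reference, the standard OMLSA quantities (following Cohen and Berdugo) are written below for frequency bin k and frame l, with noisy STFT Y(k,l), noise PSD estimate λ_d(k,l), decision-directed weighting factor α, speech presence probability p(k,l), and gain floor G_min. The abstract does not state exactly how the filtered spectrogram enters these formulas; one assumed reading is that |Y(k,l)|² in γ(k,l) is replaced by the power of the filtered spectrogram.

```latex
\begin{aligned}
\gamma(k,l) &= \frac{|Y(k,l)|^2}{\lambda_d(k,l)}, \\
\xi(k,l) &= \alpha\, G_{H_1}^2(k,l-1)\,\gamma(k,l-1) + (1-\alpha)\max\{\gamma(k,l)-1,\ 0\}, \\
v(k,l) &= \frac{\gamma(k,l)\,\xi(k,l)}{1+\xi(k,l)}, \\
G_{H_1}(k,l) &= \frac{\xi(k,l)}{1+\xi(k,l)}\exp\!\left(\frac{1}{2}\int_{v(k,l)}^{\infty}\frac{e^{-t}}{t}\,dt\right), \\
G(k,l) &= \bigl[G_{H_1}(k,l)\bigr]^{p(k,l)}\, G_{\min}^{\,1-p(k,l)}, \\
\hat{X}(k,l) &= G(k,l)\,Y(k,l).
\end{aligned}
```

The enhanced spectrum \hat{X}(k,l) is then transformed back to the time domain to reconstruct the speech signal.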
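For contribution (2), the guided filter's local linear model assumes that inside each window the output is q_i = a_k I_i + b_k for a guidance image I, with a_k and b_k fitted to the input p by ridge-regularized least squares. The sketch below follows that standard formulation in plain Python. Using the noisy spectrogram as its own guide (I = p), the window radius `r`, and the regularizer `eps` are assumptions for illustration, and the auditory-masking post-processing described in the abstract is not included.

```python
def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, shrunk at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, count = 0.0, 0
            for ii in range(max(0, i - r), min(h, i + r + 1)):
                for jj in range(max(0, j - r), min(w, j + r + 1)):
                    total += img[ii][jj]
                    count += 1
            out[i][j] = total / count
    return out


def elementwise_product(x, y):
    return [[a * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def guided_filter(guide, src, r=2, eps=1e-2):
    """Guided filter: q = mean(a) * I + mean(b), with a, b fitted per window."""
    h, w = len(guide), len(guide[0])
    mean_i = box_mean(guide, r)
    mean_p = box_mean(src, r)
    corr_ii = box_mean(elementwise_product(guide, guide), r)
    corr_ip = box_mean(elementwise_product(guide, src), r)
    a = [[0.0] * w for _ in range(h)]
    b = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            var_i = corr_ii[i][j] - mean_i[i][j] ** 2
            cov_ip = corr_ip[i][j] - mean_i[i][j] * mean_p[i][j]
            # High local variance (an edge): a near 1, so the edge is kept.
            # Flat noisy region (variance << eps): a near 0, output ~ local mean.
            a[i][j] = cov_ip / (var_i + eps)
            b[i][j] = mean_p[i][j] - a[i][j] * mean_i[i][j]
    mean_a = box_mean(a, r)
    mean_b = box_mean(b, r)
    return [[mean_a[i][j] * guide[i][j] + mean_b[i][j] for j in range(w)]
            for i in range(h)]
```

Because the output is a linear function of the guide within each window, its gradients follow the guide's gradients, which is why the guided filter avoids the gradient reversal artifacts mentioned for bilateral filtering.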
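For contribution (3), two ideas in the abstract are concrete enough to sketch: clipping spectrograms into many small patches to build a large training set cheaply, and residual learning, in which the network predicts the noise component and the clean spectrogram is recovered by subtraction. This is the residual formulation used by DnCNN-style image denoisers; the abstract does not name the specific network. The patch size, stride, and function names below are hypothetical, and no network is trained here.

```python
def clip_patches(spec, patch=4, stride=2):
    """Cut a 2-D spectrogram (list of rows) into overlapping square patches."""
    h, w = len(spec), len(spec[0])
    patches = []
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            patches.append([row[j:j + patch] for row in spec[i:i + patch]])
    return patches


def residual_training_pairs(noisy, clean, patch=4, stride=2):
    """Pair each noisy patch with its residual target (noisy - clean).

    A residual-learning CNN R would be trained so that R(noisy_patch)
    approximates this target; the enhanced patch is noisy_patch - R(noisy_patch).
    """
    noisy_patches = clip_patches(noisy, patch, stride)
    clean_patches = clip_patches(clean, patch, stride)
    targets = [
        [[y - x for y, x in zip(ry, rx)] for ry, rx in zip(p_noisy, p_clean)]
        for p_noisy, p_clean in zip(noisy_patches, clean_patches)
    ]
    return noisy_patches, targets


def enhance_patch(noisy_patch, predicted_residual):
    """Estimate the clean patch by subtracting the predicted noise residual."""
    return [[y - r for y, r in zip(ry, rr)]
            for ry, rr in zip(noisy_patch, predicted_residual)]
```

With an exact residual the subtraction returns the clean patch; a trained network supplies only an approximation of that residual, and batch normalization inside the network is what the abstract credits for making this training effective.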
Keywords/Search Tags: Speech enhancement, Bilateral spectrogram filtering, Guided spectrogram filtering, OMLSA, Auditory masking effect, Spectrogram denoising, Convolutional neural network