Speech Enhancement Research Based On Sparse Representation And Deep Neural Network

Posted on: 2021-06-21
Degree: Master
Type: Thesis
Country: China
Candidate: W M Wang
Full Text: PDF
GTID: 2518306113451414
Subject: Information and Communication Engineering
Abstract/Summary:
In real-world environments, speech is inevitably corrupted by many kinds of noise. Speech enhancement aims to remove noise from noisy speech and improve speech quality and clarity, and it is widely used in modern communication systems, hearing aids, speech recognition, and other fields. In recent years, because Deep Neural Networks (DNNs) can model the nonlinear relationship between noisy speech features and learning targets, they have been widely applied to speech enhancement under low signal-to-noise ratio (SNR) or non-stationary noise conditions. This thesis therefore uses a DNN model to estimate the time-frequency mask, with the goals of reducing computational complexity and further improving speech intelligibility. The main work of this thesis is as follows:

(1) This thesis introduces the research significance of speech enhancement, reviews current research progress on both unsupervised and supervised methods, and focuses on the results of supervised speech enhancement methods. The model, features, and learning targets of DNN-based speech enhancement systems are described in detail.

(2) To address the high computational complexity and tendency to over-fit of DNNs, and to improve the ability of the time-frequency mask to remove noise while retaining speech components, a normal sparse DNN speech enhancement method optimized by an adaptive soft mask is proposed. First, a normal probability density function with variance parameters is added to the objective function as a sparse penalty term. The penalty is based on the deviation between the activation probability of each hidden-layer unit and a constant factor, and the variance parameters are set to jointly control data sparsity, so that concise and effective features are extracted and computation is sped up. Second, a tanh-type adaptive adjustment factor related to SNR is proposed to control the proportions of the ideal binary mask and the ideal ratio mask in the learning target, reducing speech distortion while filtering noise. Finally, several groups of experiments show that combining the normal sparse DNN with the adaptive soft mask effectively reduces running time while improving speech intelligibility and reducing distortion.

(3) To make the enhanced speech match the perceptual characteristics of the human ear and further improve intelligibility, this thesis proposes a speech enhancement method in which a normal sparse DNN is optimized using the Gammatone cochlear power-law cepstrum coefficient and a soft mask that incorporates phase difference information. First, the initial feature is compressed by a power function, which is more consistent with human auditory perception; the discrete cosine transform is used to decorrelate it; and first-order and second-order difference features are fused in, which effectively captures the instantaneous changes in speech and maintains perceptual continuity. An improved autoregressive moving average filter is then used to smooth the features and remove spikes, yielding the Gammatone cochlear power-law cepstrum coefficient. Second, because phase carries information related to speech intelligibility and phase differences provide useful spectral structure information, the phase difference between noisy and clean speech and the phase difference between noisy speech and noise are incorporated into the adaptive soft mask to improve the intelligibility of the enhanced speech. Finally, experimental results under different SNRs and various noise backgrounds show that the overall performance of the Gammatone cochlear power-law cepstrum coefficient is better than that of the other features compared. The results also show that the soft mask fused with phase difference information effectively improves the clarity of the speech signal and reduces listening fatigue.
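
The abstract describes a learning target that mixes the ideal binary mask (IBM) and the ideal ratio mask (IRM) through a tanh-type factor tied to SNR, but it does not give the formula. The following is only a minimal sketch of how such a blend could look, not the thesis's actual definition. The IBM and IRM use their common textbook forms. The function names, the `slope` parameter, the use of per-unit local SNR, and the choice to lean toward the IRM as SNR rises are all assumptions made for illustration.

```python
import math

EPS = 1e-12  # guards against division by zero and log(0)


def local_snr_db(speech_pow, noise_pow):
    """Local SNR in dB for each time-frequency unit."""
    return [10.0 * math.log10((s + EPS) / (n + EPS))
            for s, n in zip(speech_pow, noise_pow)]


def ideal_binary_mask(speech_pow, noise_pow, threshold_db=0.0):
    """IBM: 1 where the local SNR exceeds the threshold, otherwise 0."""
    return [1.0 if snr > threshold_db else 0.0
            for snr in local_snr_db(speech_pow, noise_pow)]


def ideal_ratio_mask(speech_pow, noise_pow, beta=0.5):
    """IRM: (speech power / (speech power + noise power)) ** beta."""
    return [(s / (s + n + EPS)) ** beta
            for s, n in zip(speech_pow, noise_pow)]


def adaptive_soft_mask(speech_pow, noise_pow, slope=0.1):
    """Hypothetical tanh-weighted blend of IBM and IRM.

    Assumption: alpha = 0.5 * (1 + tanh(slope * SNR_dB)) lies in (0, 1);
    the blend leans toward the IRM at high SNR and the IBM at low SNR.
    """
    snrs = local_snr_db(speech_pow, noise_pow)
    ibm = ideal_binary_mask(speech_pow, noise_pow)
    irm = ideal_ratio_mask(speech_pow, noise_pow)
    mask = []
    for snr_db, b, r in zip(snrs, ibm, irm):
        alpha = 0.5 * (1.0 + math.tanh(slope * snr_db))
        mask.append(alpha * r + (1.0 - alpha) * b)
    return mask


# Toy power values for three time-frequency units (illustrative only).
speech = [4.0, 1.0, 0.0]
noise = [1.0, 1.0, 1.0]
soft = adaptive_soft_mask(speech, noise)
```

Because the weight stays between 0 and 1, each blended value falls between the binary and ratio mask values for the same unit, so the result remains a valid mask in [0, 1].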
Keywords/Search Tags: Speech enhancement, Deep neural network, Adaptive soft mask, Normal sparse, Gammatone cochlear power-law cepstrum coefficient, Phase difference