
Research On Speech Enhancement Method Based On Deep Learning Neural Networks

Posted on: 2018-01-16  Degree: Master  Type: Thesis
Country: China  Candidate: H Liu  Full Text: PDF
GTID: 2348330518998904  Subject: Communication and Information System
Abstract/Summary:
Speech is the most basic means of human communication, but in practice it is almost always corrupted by noise of various kinds. Noise not only reduces speech quality but also impairs intelligibility, and it can cause the performance of speech processing systems to degrade rapidly. The purpose of speech enhancement is to suppress the noise and recover the original clean speech from the noisy speech as far as possible. Speech enhancement is now widely used in speech coding, speech recognition, military, medical, and many other fields.

Traditional monaural speech enhancement methods include spectral subtraction, Wiener filtering, and methods based on statistical models. In recent decades, newer approaches have also been developed, such as methods based on the wavelet transform and on auditory masking. This thesis studies several traditional monaural speech enhancement algorithms. The analysis shows that these algorithms usually perform poorly at low signal-to-noise ratio (SNR). To improve enhancement performance, the thesis also studies in depth back-propagation (BP) neural networks and two mainstream deep learning models, the stacked autoencoder (SAE) and the deep belief network (DBN). Neural networks loosely imitate the working principle of the human brain and are capable of non-linear mapping. On this basis, a speech enhancement method based on a DBN is proposed, with the BP algorithm used for fine-tuning. Finally, the subspace speech enhancement algorithm is improved.

The DBN-based method is trained by pre-training followed by fine-tuning so that the network learns the relationship between the noisy speech and the noise. Once the noise amplitude spectrum in the noisy speech has been estimated, the clean speech amplitude spectrum is estimated by subtracting the noise amplitude spectrum from the noisy speech amplitude spectrum. Finally, because the human ear is insensitive to phase information, the time-domain waveform of the enhanced speech is reconstructed by the overlap-add method using the phase spectrum of the noisy speech. To further improve speech quality, several DBNs adapted to different noise types are trained, and a noise classification module is added before the noise amplitude spectrum estimation module.

The traditional subspace speech enhancement algorithm was found to have two shortcomings. First, it relies on voice activity detection (VAD) to estimate the noise, but the noise estimate cannot be updated in time. Second, VAD performance drops rapidly at low SNR, which causes the performance of the subspace algorithm to decline rapidly as well. Two improvements are proposed to address these problems: the first updates the noise in the speech frames detected by the VAD using the noise estimated by the network; the second updates the noise in every frame using the noise estimated by the network.

Finally, the subspace method, the proposed method, and the two improved subspace methods were simulated in MATLAB. The comparison shows that the proposed method outperforms the subspace method at low SNR, and that the two improved subspace methods outperform the traditional subspace method at both high and low SNR; the lower the SNR, the better the improved subspace methods perform.
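
The reconstruction step described in the abstract (subtract an estimated noise amplitude spectrum, reuse the noisy phase, then overlap-add) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation, which was simulated in MATLAB with a trained DBN. The frame length, hop size, Hann window, and flooring of negative magnitudes at zero are all assumptions made for illustration, and `estimate_noise_mag` is a hypothetical placeholder for whatever model supplies the noise estimate.

```python
import numpy as np


def stft(x, frame_len=256, hop=128):
    """Split x into Hann-windowed frames and return their one-sided spectra."""
    # Periodic Hann window: copies shifted by frame_len / 2 sum to one,
    # so plain overlap-add of the frames restores the interior of the signal.
    window = np.hanning(frame_len + 1)[:-1]
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def overlap_add(spec, frame_len=256, hop=128):
    """Invert each frame spectrum and sum the overlapping frames."""
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_len] += frame
    return out


def enhance(noisy, estimate_noise_mag, frame_len=256, hop=128):
    """Amplitude-spectrum subtraction with the noisy phase kept unchanged."""
    spec = stft(noisy, frame_len, hop)
    noisy_mag = np.abs(spec)
    noisy_phase = np.angle(spec)
    # Hypothetical callable standing in for the trained network: it maps the
    # noisy amplitude spectra to estimated noise amplitude spectra.
    noise_mag = estimate_noise_mag(noisy_mag)
    # Assumption: negative differences are floored at zero.
    clean_mag = np.maximum(noisy_mag - noise_mag, 0.0)
    return overlap_add(clean_mag * np.exp(1j * noisy_phase), frame_len, hop)
```

With a noise estimate of zero, `enhance` should return the input unchanged away from the first and last half-frame, which is a quick way to check the analysis and synthesis steps before plugging in a real estimator.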
Keywords/Search Tags: Speech Enhancement, BP Neural Networks, Deep Learning, Stacked Auto Encoder, Deep Belief Networks