
Speech Enhancement Algorithm Based On Deep Learning

Posted on: 2019-05-02  Degree: Master  Type: Thesis
Country: China  Candidate: Y N Liu
GTID: 2428330593450052  Subject: Information and Communication Engineering
Abstract/Summary:
In real life, speech signals are often corrupted by noise from the surrounding environment, which seriously degrades the performance of speech processing systems. Speech enhancement is therefore needed to suppress the background noise in noisy speech and improve its quality. Traditional speech enhancement algorithms suppress stationary noise effectively but are often unsuitable for non-stationary noise. Moreover, they generally perform well only in specific noise environments and struggle with complex, changing ones. To address these limitations, this thesis proposes a complete speech enhancement algorithm based on deep learning. The main contributions are as follows.

(1) A modified speech feature, the Multi-Resolution Auditory Cepstral Coefficient (MRACC), is proposed on the basis of existing speech features. MRACC is developed from the Multi-Resolution Cochleagram (MRCG), which captures both the high-resolution and the low-resolution characteristics of speech. However, compressing speech energy with a logarithmic curve, as MRCG does, is not a particularly good model of the nonlinearity of human hearing, so this thesis compresses the energy with a power-function mapping instead. In addition, the dimensionality of MRCG is so high that its computational cost is large; a Discrete Cosine Transform (DCT) is therefore applied to reduce the feature dimension and, with it, the computational complexity. Experimental results show that MRACC is more robust and adaptable in complex, low-SNR environments.

(2) Exploiting the strong nonlinear modeling capacity of Deep Neural Networks (DNNs), a DNN speech enhancement model is built. It consists of 1 input layer, 4 hidden layers, and 1 output layer. The input layer receives the feature parameters of the noisy speech, the stacked hidden layers perform the mapping, and the output layer produces the estimated target. With too few hidden layers the DNN cannot learn the mapping between input and output well, but as the number of hidden layers grows, the network becomes more complex and its mapping ability declines. Experiments showed that DNN-based speech enhancement performs best with 4 hidden layers. The layer sizes are 432-1024-1024-1024-1024-64 in turn. The input nodes carry the MRACC features of the noisy speech, and the output nodes give the mask values of one frame across the channels of a 64-channel Gammatone filter bank.

(3) An adaptive masking threshold is proposed on the basis of existing time-frequency masking targets. The Ideal Binary Mask (IBM) is the main computational goal of computational auditory scene analysis (CASA); it has been shown to remove more noise and greatly improve speech intelligibility, but it seriously damages speech quality. The Ideal Ratio Mask (IRM) further improves both intelligibility and quality, but leaves more residual noise than the IBM. The two are therefore combined: weighting coefficients for the IBM and the IRM are estimated by tracking changes in the noise, and an adaptive masking threshold is computed from them. Experimental results show that, compared with the IBM, the adaptive masking threshold improves speech quality and intelligibility; compared with the IRM, it removes more noise and improves listening comfort.

(4) Building on the techniques above, a complete deep-learning-based speech enhancement algorithm is constructed. Compared with the baseline algorithms, the proposed algorithm is more robust to noise, suppresses more background noise, and improves both the quality and the intelligibility of speech.
Keywords/Search Tags: Speech enhancement, deep learning, multi-resolution auditory cepstral coefficient, adaptive masking threshold
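Contribution (1) can be illustrated with a rough sketch of an MRACC-style feature extractor. It assumes a 64-channel Gammatone energy cochleagram (channels x frames) has already been computed, approximates the multiple resolutions by box-smoothing that cochleagram over 11x11 and 23x23 windows (as in the usual MRCG recipe, but omitting MRCG's separate long-window cochleagram), and replaces log compression with a power law. The exponent 1/15, the 16 retained coefficients, and all function names are hypothetical choices, not values taken from the thesis.

```python
import numpy as np


def dct_ii_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an (n, n) matrix; row k holds the k-th cosine."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * (i + 0.5) * k / n)
    basis[0] /= np.sqrt(2.0)  # scale the DC row so the matrix is orthonormal
    return basis * np.sqrt(2.0 / n)


def box_smooth(cg: np.ndarray, half: int) -> np.ndarray:
    """Mean over a (2*half+1) x (2*half+1) channel-by-frame window, edge-padded."""
    cg = np.asarray(cg, dtype=float)
    if half == 0:
        return cg.copy()
    padded = np.pad(cg, half, mode="edge")
    size = 2 * half + 1
    out = np.zeros(cg.shape, dtype=float)
    for dc in range(size):
        for dt in range(size):
            out += padded[dc:dc + cg.shape[0], dt:dt + cg.shape[1]]
    return out / size**2


def mracc_like(cochleagram, n_ceps=16, alpha=1 / 15, halves=(0, 5, 11)):
    """Hypothetical MRACC-style feature from a (channels, frames) energy cochleagram.

    For each resolution: smooth the energies, compress them with a power law
    (instead of MRCG's log), apply a DCT across channels, and keep the first
    n_ceps coefficients. The per-resolution blocks are stacked row-wise.
    """
    cg = np.asarray(cochleagram, dtype=float)
    dct = dct_ii_matrix(cg.shape[0])
    blocks = []
    for half in halves:
        compressed = box_smooth(cg, half) ** alpha
        cepstra = dct @ compressed  # DCT along the channel axis
        blocks.append(cepstra[:n_ceps])
    return np.vstack(blocks)  # shape: (len(halves) * n_ceps, frames)


# Usage with a random stand-in cochleagram (64 channels, 100 frames).
rng = np.random.default_rng(0)
features = mracc_like(rng.random((64, 100)) + 1e-3)
print(features.shape)  # (48, 100)
```

Keeping only the low-order DCT coefficients is what shrinks the dimension: here each 64-channel resolution collapses to 16 values per frame, which is the kind of reduction the abstract attributes to the DCT step.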
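The network in contribution (2) can be sketched as a plain NumPy forward pass using the layer sizes stated in the abstract (432-1024-1024-1024-1024-64). The abstract does not specify activations or initialization, so ReLU hidden units, a sigmoid output (which keeps each of the 64 mask values between 0 and 1), and He-style Gaussian initialization are assumptions here. Training is omitted.

```python
import numpy as np

# Layer sizes reported in the abstract: 432 inputs, four 1024-unit hidden layers, 64 outputs.
LAYER_SIZES = [432, 1024, 1024, 1024, 1024, 64]


def init_params(sizes, seed=0):
    """He-style Gaussian weights and zero biases (an assumed initialization)."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(x, params):
    """Map a (batch, 432) array of input features to (batch, 64) mask estimates."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # assumed ReLU hidden activation
    w, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))  # assumed sigmoid output in (0, 1)


params = init_params(LAYER_SIZES)
n_params = sum(w.size + b.size for w, b in params)
print(n_params)  # 3657792
masks = forward(np.random.default_rng(1).random((8, 432)), params)
print(masks.shape)  # (8, 64)
```

The parameter count (about 3.66 million, dominated by the three 1024x1024 hidden-to-hidden matrices) gives a sense of why adding further hidden layers quickly makes the model heavier.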
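For contribution (3), the IBM and IRM below follow their commonly used definitions (a 0 dB local criterion for the IBM, exponent 0.5 for the IRM), computed per time-frequency unit from clean-speech and noise energies. The abstract does not give the thesis's rule for combining them, so `adaptive_mask` is a hypothetical per-frame blend: a weight derived from the frame SNR moves toward the IBM in noisy frames (more noise removal) and toward the IRM in cleaner frames (better quality). The `slope` parameter and the logistic weighting are illustrative assumptions.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def ibm(speech_e, noise_e, lc_db=0.0):
    """Ideal Binary Mask: 1 where the local SNR exceeds the local criterion lc_db."""
    snr_db = 10.0 * np.log10((speech_e + EPS) / (noise_e + EPS))
    return (snr_db > lc_db).astype(float)


def irm(speech_e, noise_e, beta=0.5):
    """Ideal Ratio Mask: (S / (S + N)) ** beta per time-frequency unit."""
    return (speech_e / (speech_e + noise_e + EPS)) ** beta


def adaptive_mask(speech_e, noise_e, slope=0.5):
    """Hypothetical blend of IBM and IRM driven by each frame's SNR.

    Arrays are (channels, frames). The per-frame weight w approaches 1 (IBM)
    in noisy frames and 0 (IRM) in clean frames; w = 0.5 at 0 dB.
    """
    frame_snr_db = 10.0 * np.log10(
        (speech_e.sum(axis=0) + EPS) / (noise_e.sum(axis=0) + EPS)
    )
    w = 1.0 / (1.0 + np.exp(slope * frame_snr_db))  # shape (frames,), broadcast over channels
    return w * ibm(speech_e, noise_e) + (1.0 - w) * irm(speech_e, noise_e)


speech = np.ones((4, 3))
noise = np.ones((4, 3))
print(adaptive_mask(speech, noise)[0, 0])  # about 0.354 (IBM = 0, IRM = sqrt(0.5), w = 0.5)
```

Because the weight is a smooth function of the frame SNR, the mask slides gradually between the two targets instead of switching abruptly, which is one plausible way to trade the IBM's noise removal against the IRM's speech quality.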