
Codebook-based Speech Enhancement Using Deep Neural Network

Posted on: 2019-07-08    Degree: Master    Type: Thesis
Country: China    Candidate: Y Yang    Full Text: PDF
GTID: 2428330593950318    Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, researchers have proposed codebook-based methods for speech enhancement. In the offline stage, the Auto-Regressive (AR) model parameters of speech and noise are trained into separate codebooks as prior information. In the online stage, the optimal codewords are selected and the AR gains of speech and noise are estimated; finally, a Wiener filter is constructed and the enhanced speech is obtained. Codebook-based algorithms are well suited to suppressing non-stationary noise, but they have two weaknesses. First, the maximum-likelihood rule cannot select the optimal codewords and estimate the AR gains accurately. Second, considerable residual noise remains between the harmonics, because the codebook-based methods model only the spectral envelope, not the spectral details. To address these two shortcomings, this thesis makes three main improvements.

(1) A Deep Neural Network (DNN) is used to select the optimal codewords of speech and noise, improving the accuracy of the selection. In the offline stage, Mel-Frequency Cepstral Coefficients (MFCC) are extracted as the training feature, and the training labels are sparse vectors containing the indexes of the optimal codewords; the DNNs for speech and noise are trained separately under the criterion of minimizing the cross entropy. In the online stage, the input of each DNN is the MFCC of the noisy speech and the output is the selection probability of each codeword. The optimal codewords of speech and noise are chosen according to the maximum probability, and the Wiener filter is constructed. A harmonic-emphasis technique is further introduced to remove the residual noise between harmonics.

(2) The complexity of the above algorithm is high, because the codebook and the DNN must be trained successively, and its performance is degraded by the quantization error of the codebook. Therefore, a DNN is used to estimate the AR model parameters of speech and noise directly. In the training stage, the training feature of the DNN is the Log Power Spectrum (LPS) of the noisy speech, and the training targets are the concatenated vectors of the AR model parameters of speech and noise; training this DNN yields a mapping from the LPS of the noisy speech to the AR model parameters of speech and noise. In the test stage, the AR model parameters estimated by the trained DNN are used to construct the Wiener filter and obtain the enhanced speech.

(3) The input of a conventional neural network contains only the feature of the current frame and carries no information about past states, so temporal continuity is not considered. To address this, the Recurrent Stack Convolutional Auto-Encoder (RS-CAE) is proposed. Its input feature maps include not only the log power spectrum (LPS) of the noisy speech but also two additional feature maps holding the spectra of the past few frames output by the network, which improves the performance of the network. Moreover, a codebook-based harmonic-recovery technique is used to reconstruct the lost harmonic structure. Objective test results confirm that the proposed methods achieve better performance than several existing approaches.
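For reference, the Wiener filter built from the selected codewords can be written in the following standard form; this is a sketch in assumed notation rather than the thesis's exact formulation, with g_s, g_n denoting the estimated AR gains and A_s, A_n the linear-prediction polynomials of the selected speech and noise codewords.

\[
W(\omega) = \frac{\dfrac{g_s}{\lvert A_s(e^{j\omega})\rvert^{2}}}
{\dfrac{g_s}{\lvert A_s(e^{j\omega})\rvert^{2}} + \dfrac{g_n}{\lvert A_n(e^{j\omega})\rvert^{2}}},
\qquad
A_s(e^{j\omega}) = 1 + \sum_{k=1}^{p} a_{s,k}\, e^{-j\omega k}.
\]

The enhanced spectrum is then \(\hat{S}(\omega) = W(\omega)\,Y(\omega)\), where \(Y(\omega)\) is the short-time spectrum of the noisy speech.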
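The codeword-selection network of improvement (1) can be sketched as a small feed-forward classifier trained with cross entropy. The layer sizes, MFCC dimension, and codebook size below are illustrative assumptions, and PyTorch is used only as an example framework, not as the thesis implementation.

import torch
import torch.nn as nn

class CodewordSelector(nn.Module):
    # Maps MFCC features of noisy speech to selection scores over one codebook
    # (one such network each for the speech codebook and the noise codebook).
    def __init__(self, n_mfcc=39, n_codewords=256, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_mfcc, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_codewords),  # logits over codeword indexes
        )

    def forward(self, mfcc):
        return self.net(mfcc)

model = CodewordSelector()
criterion = nn.CrossEntropyLoss()           # cross-entropy criterion named in the abstract
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

mfcc_batch = torch.randn(32, 39)            # dummy MFCC frames
label_batch = torch.randint(0, 256, (32,))  # dummy optimal-codeword indexes
loss = criterion(model(mfcc_batch), label_batch)
loss.backward()
optimizer.step()

# At test time, the selected codeword is the index with the highest probability:
best_codeword = model(mfcc_batch).argmax(dim=1)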
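Improvement (2) replaces codeword selection with direct regression from the noisy-speech LPS to the concatenated AR parameters. A minimal sketch follows; the FFT size, LPC order, and the mean-square-error training criterion are assumptions not stated in the abstract.

import torch
import torch.nn as nn

class ARParameterRegressor(nn.Module):
    # Maps the log power spectrum (LPS) of noisy speech to the concatenated
    # AR model parameters (LPC coefficients plus gain) of speech and noise.
    def __init__(self, n_freq=257, lpc_order=10, hidden=1024):
        super().__init__()
        out_dim = 2 * (lpc_order + 1)   # speech and noise: coefficients + gain each
        self.net = nn.Sequential(
            nn.Linear(n_freq, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, lps):
        return self.net(lps)

model = ARParameterRegressor()
criterion = nn.MSELoss()                 # assumed regression criterion
lps_batch = torch.randn(32, 257)         # dummy noisy-speech LPS frames
target_batch = torch.randn(32, 22)       # dummy concatenated AR parameters
loss = criterion(model(lps_batch), target_batch)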
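For improvement (3), the recurrent input construction of the RS-CAE can be illustrated by stacking the noisy LPS with two feature maps holding spectra the network produced for earlier frames. The shapes and channel layout below are assumptions for illustration only, not the thesis implementation.

import torch

def build_rscae_input(noisy_lps, past_output_1, past_output_2):
    # noisy_lps:      (freq, frames) log power spectrum of the current context window
    # past_output_1:  (freq, frames) spectra output by the network for recent past frames
    # past_output_2:  (freq, frames) spectra output by the network for older past frames
    # Returns a (channels=3, freq, frames) tensor fed to the convolutional encoder.
    return torch.stack([noisy_lps, past_output_1, past_output_2], dim=0)

rscae_input = build_rscae_input(torch.randn(257, 8), torch.randn(257, 8), torch.randn(257, 8))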
Keywords/Search Tags: Speech enhancement, Linear prediction, Wiener filter, Deep neural network, Recurrent Stack Convolutional Auto-Encoder