Research On Speech Enhancement Algorithm Based On Deep Neural Network

Posted on:2020-08-11

Degree:Master

Type:Thesis

Country:China

Candidate:X Zhang

GTID:2428330611999447

Subject:Microelectronics and Solid State Electronics

Abstract/Summary:

Speech enhancement, also known as noise reduction, aims to reduce and suppress the interference of background noise with the target speech and to improve speech quality and clarity. Speech enhancement algorithms are widely used in daily life and work, which makes them an important part of speech signal processing. Traditional single-channel speech enhancement methods require prior assumptions about the noise and the signal, which limits their noise reduction performance. In recent years, with the spread of information technology, larger datasets and faster computer processing, the advantages of deep neural networks have become apparent: unlike traditional speech enhancement methods, a deep neural network does not need prior assumptions about the noise.

The main research content of this thesis was to implement speech enhancement with a deep neural network, using its strong mapping ability to learn the complex relationship between noisy speech and clean speech. The speech enhancement system was divided into three stages: preparation, training and enhancement. The preparation stage, which serves as preprocessing for network training, generated the speech dataset and extracted its signal features. In the training stage, the parameters of the deep neural network were iteratively updated through backpropagation. In the enhancement stage, the noisy speech was processed by the trained network to obtain the enhanced speech signal. Considering the nonlinear perception of speech by the human ear, a model mapping the logarithmic power spectrum of speech was adopted as the baseline system.

To address partial distortion of the speech signal, this thesis proposed a method that combines the amplitude spectrum feature with the logarithmic power spectrum feature. The two features were concatenated at the input of the deep neural network, so that the network learned and optimized these two different targets at the same time and could learn both the differences and the commonalities between the features. In the enhancement stage, the two features were fused by a post-processing method to obtain the enhanced speech signal. Experiments showed that feature concatenation significantly improved the quality and clarity of noisy speech at low SNR and alleviated speech distortion.

To further improve noise reduction performance, the thesis used the idea of skip connections. The original input of the model was carried over to the output of each hidden layer and concatenated with that layer's output, and the concatenated feature was used as the input of the next hidden layer. This allows the original input to be reused at every layer, so the features learned at each layer are more complex and diverse.

Finally, a combination of the two methods was proposed: while the logarithmic power spectrum and amplitude spectrum features were jointly optimized, the input logarithmic spectrum features were passed through skip connections and stacked. The proposed method improved the PESQ (Perceptual Evaluation of Speech Quality) score of noisy speech under mismatched noise conditions by 0.47, indicating a significant improvement in speech quality. This also showed that the improved enhancement system has good noise reduction and generalization capability.
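In the preparation stage, the abstract describes extracting the amplitude spectrum and the logarithmic power spectrum (LPS) and, in the proposed method, concatenating the two at the network input. The following is a minimal NumPy sketch of that step, not the thesis's implementation. The frame settings (512-point Hann-windowed STFT, 256-sample hop), the small constant `eps` in the logarithm, the order [LPS, amplitude], and the helper names `stft`, `extract_features` and `concat_features` are all hypothetical assumptions.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns a (frames, n_fft // 2 + 1) complex array (assumed settings)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def extract_features(x, n_fft=512, hop=256, eps=1e-8):
    """Return amplitude spectrum, log power spectrum and phase of a waveform."""
    spec = stft(x, n_fft, hop)
    amp = np.abs(spec)              # amplitude spectrum |Y|
    lps = np.log(amp ** 2 + eps)    # log power spectrum log(|Y|^2 + eps)
    phase = np.angle(spec)          # kept for resynthesis in the enhancement stage
    return amp, lps, phase

def concat_features(lps, amp):
    """Concatenate LPS and amplitude features frame by frame (hypothetical input layout)."""
    return np.concatenate([lps, amp], axis=1)

# Example: one second of a placeholder signal at an assumed 16 kHz sample rate.
rng = np.random.default_rng(0)
noisy = rng.standard_normal(16000)
amp, lps, phase = extract_features(noisy)
features = concat_features(lps, amp)
print(features.shape)  # (61, 514)
```

Each frame then carries 257 LPS values followed by 257 amplitude values; a real system would typically also add neighbouring frames as context, which is omitted here.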
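The skip-connection idea in the abstract (concatenating the original input with each hidden layer's output and feeding the result to the next layer) and the joint optimization of the two targets can be illustrated with a forward pass and a loss. This is a hedged NumPy sketch under assumptions the abstract does not state: ReLU activations, He-style random initialization, three hidden layers of 1024 units, a linear output holding [LPS estimate, amplitude estimate], and an equally weighted sum of two MSE terms. Following the abstract's description of the combined method, only the LPS part of the input is skipped. `init_params`, `forward` and `joint_mse` are hypothetical names, and backpropagation training is not shown.

```python
import numpy as np

def init_params(in_dim, skip_dim, hidden, n_layers, out_dim, rng):
    """Weights for n_layers hidden layers; every layer after the first also sees the skipped input."""
    params, d = [], in_dim
    for _ in range(n_layers):
        params.append((rng.standard_normal((d, hidden)) * np.sqrt(2.0 / d), np.zeros(hidden)))
        d = hidden + skip_dim  # next layer input = hidden output + skipped features
    params.append((rng.standard_normal((d, out_dim)) * np.sqrt(1.0 / d), np.zeros(out_dim)))
    return params

def forward(x, skip, params):
    """Each hidden output is concatenated with `skip` before being passed on."""
    h = x
    for W, b in params[:-1]:
        a = np.maximum(h @ W + b, 0.0)         # ReLU (assumed activation)
        h = np.concatenate([a, skip], axis=1)  # stack the original input onto the hidden output
    W, b = params[-1]
    return h @ W + b                           # linear output: [LPS estimate, amplitude estimate]

def joint_mse(y_hat, lps_target, amp_target):
    """Equally weighted sum of the LPS and amplitude MSE terms (weighting is an assumption)."""
    k = lps_target.shape[1]
    return np.mean((y_hat[:, :k] - lps_target) ** 2) + np.mean((y_hat[:, k:] - amp_target) ** 2)

rng = np.random.default_rng(0)
n_bins = 257
x = rng.standard_normal((8, 2 * n_bins))  # stand-in for concatenated [LPS, amplitude] frames
params = init_params(2 * n_bins, n_bins, 1024, 3, 2 * n_bins, rng)
y_hat = forward(x, x[:, :n_bins], params)  # skip only the LPS part of the input
print(y_hat.shape)  # (8, 514)
```

In practice the weights would be updated by backpropagation on `joint_mse` (for example in a deep learning framework); the sketch only fixes the layer shapes and data flow.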
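The abstract says only that the two estimated features were fused by a post-processing method in the enhancement stage. One plausible reading is sketched below, purely as an assumption: convert the LPS estimate back to an amplitude, take a weighted average with the direct amplitude estimate (hypothetical weight `alpha = 0.5`), reuse the noisy phase, and resynthesize with weighted overlap-add. The 512-point Hann window, 256-sample hop and the names `fuse_estimates`, `istft` and `reconstruct` are hypothetical and not taken from the thesis.

```python
import numpy as np

def fuse_estimates(lps_hat, amp_hat, alpha=0.5):
    """Hypothetical fusion: weighted average of the two amplitude estimates."""
    amp_from_lps = np.exp(lps_hat / 2.0)   # invert LPS: |X| is approximately sqrt(exp(log|X|^2))
    amp_direct = np.maximum(amp_hat, 0.0)  # amplitudes cannot be negative
    return alpha * amp_from_lps + (1.0 - alpha) * amp_direct

def istft(spec, n_fft=512, hop=256):
    """Weighted overlap-add inverse of a Hann-windowed STFT (assumed settings)."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    length = n_fft + hop * (frames.shape[0] - 1)
    out, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)    # guard against the zero-weight edge samples

def reconstruct(lps_hat, amp_hat, noisy_phase, alpha=0.5):
    """Fuse the two estimates, attach the noisy phase, and return a waveform."""
    amp = fuse_estimates(lps_hat, amp_hat, alpha)
    return istft(amp * np.exp(1j * noisy_phase))
```

Reusing the noisy phase is a common simplification for spectral-mapping systems, since the network here predicts magnitudes only; the actual fusion rule in the thesis may differ.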

Keywords/Search Tags:

speech enhancement, deep neural network, features jointing, features stacking

Related items

1	Research On Deep Learning Based Speech Enhancement
2	Computational Auditory Model And Deep Neural Network Based Binaural Speech Segregation
3	Research Of Monaural Speech Enhancement Based On Quality Assessment And Deep Neural Network
4	Research Of Deep Learning Based Low-resource Speech Recognition
5	The Research Of Dimensional Speech Emotion Recognition Based On Neural Network And Fusion Features
6	No-Reference Image Quality Assessment Based On Deep Features Of Enhancement
7	Time-domain Speech Enhancement Algorithm Based On Multi-scale Features
8	Analysis Of Effective Fused Features And Model Evaluation For Speech Emotion Recognition
9	Deep Learning Speech Enhancement Technology Considering Time-frequency Features
10	Single-Channel Speech Enhancement Algorithm Based On Audio Feature Perception