Research On Auto-encoders And Generative Adversarial Network Based Speech Enhancement

Posted on: 2020-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: R L Xu
GTID: 2428330575994178
Subject: Electronic and communication engineering
Abstract/Summary:
Speech enhancement is an important branch of speech signal processing. Speech is the most convenient means of human communication, but in voice communication and human-computer interaction scenarios, speech signals are often corrupted by various kinds of noise. Speech enhancement, which aims to avoid or reduce noise interference, has therefore been widely studied, and a large number of unsupervised and supervised speech enhancement algorithms have been proposed over the past decades. Earlier methods first estimate the noise spectrum and then subtract it from the noisy speech spectrum to obtain the enhanced speech spectrum. These methods often assume that speech and noise are independent and Gaussian-distributed, but real noise is random, non-linear and non-stationary, so they tend to perform poorly, leaving residual noise or introducing speech distortion. In recent years, with the development of deep learning and its successful application to related speech tasks, deep-learning-based speech enhancement has become an active research topic. In such systems, the deep learning model is designed as a fine denoising filter or as a generator. Trained on large parallel corpora of noisy and clean speech, the model can learn the complex non-linear relationship between the two. Moreover, because training is usually performed offline, the model can learn noise characteristics in advance and can therefore better suppress, or even remove, some non-stationary noise. Given the strong performance of deep learning models in speech enhancement, this thesis carries out the following research. First, a Deep Auto-Encoder (DAE) is applied to speech enhancement and studied in a series of experiments. Then, the Deep Auto-Encoder is combined with a Generative Adversarial Network (GAN) to form the proposed AE-CGAN network, which is also applied to speech enhancement.

In the DAE-based speech enhancement method, the time-domain speech signal is first windowed and divided into frames, and the short-time Fourier transform (STFT) is then computed. The model takes the spectra of several consecutive frames as input and outputs the spectrum of a single frame; the enhanced time-domain signal is then obtained from the output spectra by waveform reconstruction. The model is trained in a supervised manner, and L2 regularization, dropout and batch normalization (BN) are used to improve its generalization and make it more robust. Experimental results show that the DAE-based method outperforms traditional methods and improves both speech quality and speech intelligibility.

The proposed AE-CGAN model is an end-to-end time-domain speech enhancement model: both its input and its output are time-domain speech signals. The model requires no assumption about the relationship between speech and noise and no hand-crafted features; instead, it extracts speech features automatically in an end-to-end manner. AE-CGAN uses a convolutional neural network, relying on its strong feature-extraction ability, to recover clean speech from noisy speech. The model is trained with a semi-supervised method in which an explicit loss function and an implicit loss function are combined for adversarial training. To make the network lighter and faster while allowing it to be deeper and wider, the model uses a fully convolutional architecture, batch normalization (BN) and the Parametric ReLU (PReLU) activation function. To prevent gradient explosion, weight clipping is applied during training to keep the network parameters within a reasonable range. Experimental results show that the AE-CGAN-based method outperforms both traditional methods and the DAE method and has a stronger denoising ability. After processing, speech quality and intelligibility are greatly improved, and the enhanced speech sounds fuller, less muffled and more natural.
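The DAE front and back ends described in the abstract (windowing and framing, STFT, multi-frame input to a single-frame output, waveform reconstruction) can be sketched as follows. This is a minimal, dependency-free illustration under assumptions the abstract does not state: a periodic Hann window with 50% overlap, magnitude spectra as model features, a context of `k` frames on each side of the target frame, and reuse of the noisy phase during reconstruction. All function names are hypothetical, a naive DFT stands in for an FFT, and the trained DAE itself is replaced by an identity mapping.

```python
import cmath
import math

# Assumptions (not from the thesis): periodic Hann window, 50% overlap,
# magnitude features, +/-k context frames, noisy phase reused at synthesis.

def hann(n):
    """Periodic Hann window; shifted copies at hop n/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def dft(frame):
    """Naive O(n^2) DFT, used here only to avoid external dependencies."""
    n = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]

def idft(spec):
    """Inverse DFT; keeps the real part since the frames are real-valued."""
    n = len(spec)
    return [(sum(spec[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n).real
            for k in range(n)]

def stft(x, n, hop):
    """Window and frame the signal, then transform every frame."""
    w = hann(n)
    return [dft([x[s + i] * w[i] for i in range(n)])
            for s in range(0, len(x) - n + 1, hop)]

def stack_context(mags, k):
    """Multi-frame model input: frames t-k..t+k concatenated (edges clamped)."""
    last = len(mags) - 1
    return [[v for d in range(-k, k + 1) for v in mags[min(max(t + d, 0), last)]]
            for t in range(len(mags))]

def resynthesize(enhanced_mags, noisy_spec, n, hop, length):
    """Attach the noisy phase to the enhanced magnitudes, then overlap-add."""
    y = [0.0] * length
    for t, (mags, frame) in enumerate(zip(enhanced_mags, noisy_spec)):
        spec = [m * cmath.exp(1j * cmath.phase(z)) for m, z in zip(mags, frame)]
        for i, v in enumerate(idft(spec)):
            y[t * hop + i] += v  # no synthesis window: Hann at 50% overlap sums to 1
    return y

# Usage: identity "model" on a toy signal.
x = [math.sin(0.3 * i) for i in range(64)]
N, HOP = 16, 8
spec = stft(x, N, HOP)                      # 7 frames of 16 bins
mags = [[abs(z) for z in frame] for frame in spec]
inputs = stack_context(mags, k=2)           # each input holds 5 * 16 values
enhanced = mags                             # a trained DAE would map inputs[t] -> frame t
y = resynthesize(enhanced, spec, N, HOP, len(x))
```

Because the periodic Hann window sums to one at 50% overlap, unmodified spectra are reconstructed exactly everywhere except the first and last half-frame; a practical system would use an FFT library and feed `inputs` to the trained network.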
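The AE-CGAN training ingredients named in the abstract (PReLU activations, an explicit loss combined with an implicit adversarial loss, and weight clipping) can be sketched as plain functions. The abstract does not give their exact form, so this sketch assumes least-squares adversarial objectives, an L1 distance to the clean waveform as the explicit term weighted by a hypothetical `lam`, and a hypothetical clip bound `c`; the convolutional generator and discriminator themselves are omitted.

```python
from typing import List

# Assumed forms (the abstract does not specify them): least-squares GAN
# objectives, an L1 explicit term weighted by `lam`, and a clip bound `c`.

def prelu(x: float, a: float) -> float:
    """Parametric ReLU: identity for x >= 0, learnable slope `a` for x < 0."""
    return x if x >= 0 else a * x

def discriminator_loss(d_real: List[float], d_fake: List[float]) -> float:
    """Least-squares discriminator loss: scores for (noisy, clean) pairs -> 1,
    scores for (noisy, enhanced) pairs -> 0 (conditioning on noisy speech)."""
    real = 0.5 * sum((s - 1.0) ** 2 for s in d_real) / len(d_real)
    fake = 0.5 * sum(s ** 2 for s in d_fake) / len(d_fake)
    return real + fake

def generator_loss(d_fake: List[float], enhanced: List[float],
                   clean: List[float], lam: float) -> float:
    """Implicit (adversarial) term plus explicit L1 distance to the clean waveform."""
    adversarial = 0.5 * sum((s - 1.0) ** 2 for s in d_fake) / len(d_fake)
    l1 = sum(abs(e - c) for e, c in zip(enhanced, clean)) / len(clean)
    return adversarial + lam * l1

def clip_weights(params: List[float], c: float) -> List[float]:
    """Weight clipping applied after each update: keep every weight in [-c, c]."""
    return [max(-c, min(c, w)) for w in params]

print(prelu(-2.0, 0.25))                                        # -0.5
print(clip_weights([-0.3, 0.005, 0.2], 0.01))                   # [-0.01, 0.005, 0.01]
print(generator_loss([0.5], [0.25, 0.5], [0.0, 0.0], lam=2.0))  # 0.875
```

Clipping bounds every parameter after each optimizer step, which is one simple way to keep the weights, and hence the gradients, from growing without limit as the abstract describes.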
Keywords/Search Tags: speech enhancement, deep learning, generative adversarial networks, autoencoder, fully convolutional network