Research On Single-Channel Speech Enhancement Based On Generative Adversarial Network

Posted on: 2021-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Zhou
Full Text: PDF
GTID: 2428330623483957
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of society, demands on communication systems and intelligent speech devices keep rising. As a technology that improves the perceptual quality and intelligibility of speech, speech enhancement is of great significance to the communication industry and to artificial intelligence. Real acoustic scenes contain many stationary and non-stationary noises, and the relationship between clean and noisy speech is typically nonlinear. Classical unsupervised speech enhancement methods rely on first-order statistics of the signals and are suited to stationary noise environments that conform to a Gaussian prior. Most supervised speech enhancement methods exploit the nonlinear structure of neural networks to learn the nonlinear relationship between noisy and clean speech. Such methods obtain good results under known noise conditions, but model performance drops sharply under unknown noise conditions, and the enhanced speech has poor quality and intelligibility. The generative adversarial network (GAN) is one of the newest generative models based on deep learning and has been successfully applied to speech enhancement. Studies have found that GAN-based speech enhancement methods are suited to non-stationary noise and unknown noise conditions and effectively improve the perceptual quality and intelligibility of speech, which has made them one of the most promising research directions in this field. This thesis therefore studies GAN-based speech enhancement and aims to improve the quality of the generated speech. The main work is as follows:

(1) A speech enhancement method based on a relativistic average GAN with a mixed penalty is proposed. The standard speech enhancement GAN suffers from slow convergence, unstable training, and vanishing gradients, which lead to poor quality and intelligibility of the generated speech. This thesis analyzes the training mechanism of the GAN, studies how to measure the difference between generated and real speech, and proposes a speech enhancement method based on the relativistic average GAN (RaGAN) with a mixed penalty. RaGAN addresses the problem that the scores assigned to real data do not decrease during training, and it also improves the mechanism for evaluating real and generated data. The mixed penalty combines the L1 norm and the mean square error, which measures the distance between the distributions of generated and real speech more accurately. Minimizing the mixed penalty brings the generated speech closer to the real speech and thereby improves its quality. Experimental results on two different test sets show that, compared with the other methods evaluated, the proposed method better improves speech quality and intelligibility under unknown noise conditions.

(2) A speech enhancement method based on a GAN jointly optimized with a speech quality evaluation metric is proposed. Most improved GAN-based speech enhancement methods work by optimizing the structure of the GAN, which has limited effect on the quality of the generated speech. Starting instead from speech quality evaluation metrics, this thesis proposes a speech enhancement method based on a GAN with joint scale-invariant signal-to-distortion ratio (SI-SDR) optimization, which treats improving speech quality as the goal of model optimization. The SI-SDR-based loss function guides the generator to produce speech with higher quality scores automatically, which directly improves the clarity and intelligibility of the generated speech. Experimental results on three different test sets show that the proposed method performs better under both unknown noise conditions and low signal-to-noise ratio (SNR) conditions.
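
The three quantities named in the abstract (the relativistic average discriminator objective, the mixed L1 and mean-square-error penalty, and SI-SDR) can be illustrated with a minimal sketch. It is not the thesis's implementation. The function names, the weights alpha and beta, and the use of plain Python lists are assumptions made here for a self-contained example. The RaGAN loss shown is the commonly cited cross-entropy form; the thesis may use a different variant.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def ragan_discriminator_loss(real_scores, fake_scores):
    """Relativistic average discriminator loss (cross-entropy form).

    Each real score is judged relative to the mean fake score and vice
    versa, so a real sample's rating depends on how the fakes score.
    """
    mean_real = _mean(real_scores)
    mean_fake = _mean(fake_scores)
    real_term = -_mean([_log_sigmoid(r - mean_fake) for r in real_scores])
    # log(1 - sigmoid(z)) equals log(sigmoid(-z)).
    fake_term = -_mean([_log_sigmoid(mean_real - f) for f in fake_scores])
    return real_term + fake_term


def ragan_generator_loss(real_scores, fake_scores):
    # Adversarial generator term: the same objective with roles swapped.
    return ragan_discriminator_loss(fake_scores, real_scores)


def mixed_penalty(enhanced, clean, alpha=1.0, beta=1.0):
    """Weighted sum of mean absolute error (L1) and mean square error.

    alpha and beta are hypothetical weights; the abstract gives none.
    """
    diffs = [e - c for e, c in zip(enhanced, clean)]
    l1 = _mean([abs(d) for d in diffs])
    mse = _mean([d * d for d in diffs])
    return alpha * l1 + beta * mse


def si_sdr(enhanced, clean, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB."""
    dot = sum(e * c for e, c in zip(enhanced, clean))
    clean_energy = sum(c * c for c in clean)
    scale = dot / (clean_energy + eps)  # projection of enhanced onto clean
    target = [scale * c for c in clean]
    residual = [e - t for e, t in zip(enhanced, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(r * r for r in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


def si_sdr_loss(enhanced, clean):
    # Negated so that minimizing the loss raises SI-SDR.
    return -si_sdr(enhanced, clean)
```

In a training loop, the generator objective would typically combine the adversarial term with the mixed penalty or the SI-SDR loss. How the terms are weighted is not stated in the abstract, so it is left open here.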
Keywords/Search Tags:Speech Enhancement, Deep Learning, Relativistic Average Generative Adversarial Network, Objective Intelligibility, Loss Function