Speech Enhancement Based On Iterative Mask Estimation And Generative Adversarial Networks

Posted on: 2021-09-04 | Degree: Master | Type: Thesis
Country: China | Candidate: J Yuan | Full Text: PDF
GTID: 2518306470968259 | Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of computer and Internet technology, communication technology has become an indispensable part of daily life, for example in mobile phone calls, online video chat, and other forms of human-machine interaction. Speech communication, as the most natural, effective, and convenient way for people to communicate, has become an important research direction in the field of communication. In practice, however, the received speech signal is inevitably corrupted by various kinds of noise, such as noise inside a car, mechanical noise in a factory, or loudspeakers in a supermarket, which seriously degrades communication quality. This has motivated the development of speech enhancement. Speech enhancement is an important technology for improving speech quality in noisy conditions. It suppresses noise and improves speech quality and intelligibility, which reduces listener fatigue, especially when listeners are exposed to high noise levels for a long time. Speech enhancement is also essential in many applications, such as hearing aids, where it can alleviate the difficulties that hearing-impaired listeners encounter when communicating in noisy environments.

Based on time-frequency mask estimation and the Generative Adversarial Network (GAN), this thesis proposes several speech enhancement methods. The work covers the following four aspects.

Firstly, a monaural speech enhancement method based on the ideal ratio mask and a GAN is proposed. The GAN is trained adversarially to estimate the magnitude spectra of speech and noise simultaneously, from which the ideal ratio mask is constructed. The mask is placed at the network output layer and trained jointly with the neural network. In the enhancement stage, the generator G produces the enhanced speech magnitude spectrum, which is combined with the noisy speech phase to reconstruct the enhanced speech. This method improves on the robustness and generalization of traditional methods.

Secondly, a monaural speech enhancement method based on CycleGAN is proposed for unpaired training data. Traditional methods require the training data to be matched one-to-one, but for many tasks paired training data is difficult or expensive to obtain. This method therefore introduces a cycle-consistency loss function to map the input samples of the network to the target samples. While retaining the speech components, it effectively suppresses the noise, minimizes speech distortion, and improves speech enhancement performance when only unpaired training data is available.

Thirdly, a multi-channel speech enhancement method based on an iterative mask and a multiple-target GAN is proposed. This method uses the multiple-target GAN to estimate the time-frequency mask iteratively, which improves the performance of the Minimum Variance Distortionless Response (MVDR) beamformer in enhancing the desired sound source and suppressing undesired sound sources. A post-filter is added to further suppress the noise arriving from the direction of the target sound source.

Finally, this thesis proposes a multi-channel speech coding and enhancement framework based on the Enhanced Voice Services (EVS) codec. The multi-channel speech enhancement method based on the iterative mask and multiple-target GAN is embedded into the front end of the EVS codec to achieve joint multi-channel speech coding and enhancement.
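The mask-related steps in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the function names are hypothetical, the square-root form of the ideal ratio mask is one common definition rather than one confirmed by the abstract, and the MVDR weights use the reference-channel (Souden-style) formulation as an assumption. The GAN that would estimate the spectra and masks, the iterative refinement, and the post-filter are all omitted; the arrays passed in simply stand in for network outputs.

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Assumed IRM definition: sqrt(S^2 / (S^2 + N^2)), values in [0, 1]."""
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return np.sqrt(s2 / (s2 + n2 + eps))

def enhance_with_noisy_phase(noisy_stft, mask):
    """Scale the noisy magnitude by the mask and reuse the noisy phase."""
    return mask * np.abs(noisy_stft) * np.exp(1j * np.angle(noisy_stft))

def mask_based_mvdr(multi_stft, speech_mask, noise_mask, ref=0, eps=1e-8):
    """Mask-based MVDR beamformer (reference-channel form, an assumption).

    multi_stft: complex array of shape (channels, freqs, frames)
    speech_mask, noise_mask: real arrays of shape (freqs, frames)
    Returns the beamformed single-channel STFT of shape (freqs, frames).
    """
    n_ch, n_freq, n_frames = multi_stft.shape
    out = np.zeros((n_freq, n_frames), dtype=complex)
    for f in range(n_freq):
        Y = multi_stft[:, f, :]  # (channels, frames)
        # Mask-weighted spatial covariance matrices of speech and noise.
        phi_s = (speech_mask[f] * Y) @ Y.conj().T / (speech_mask[f].sum() + eps)
        phi_n = (noise_mask[f] * Y) @ Y.conj().T / (noise_mask[f].sum() + eps)
        phi_n = phi_n + eps * np.eye(n_ch)  # small diagonal loading
        # w = (phi_n^{-1} phi_s) u_ref / trace(phi_n^{-1} phi_s)
        num = np.linalg.solve(phi_n, phi_s)
        w = num[:, ref] / (np.trace(num) + eps)
        out[f] = w.conj() @ Y  # apply w^H to every frame
    return out
```

In an iterative scheme like the one the abstract describes, the beamformer output would be fed back to a mask estimator and the covariance matrices recomputed; how the thesis does this is not specified here, so that loop is left out.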
Keywords/Search Tags:speech enhancement, speech coding, deep neural network, time-frequency mask, beamforming