Research On Speech Enhancement Based On Wasserstein Generative Adversarial Networks

Posted on: 2020-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: S S Ye
Full Text: PDF
GTID: 2428330575956401
Subject: Information and Communication Engineering
Abstract/Summary:
Speech enhancement, as a front-end speech processing technology, plays an increasingly important role in the field of artificial intelligence. At present, most speech enhancement methods model the noise distribution in order to recover a noise-reduced speech signal. However, traditional speech enhancement methods have several drawbacks: they perform poorly at low signal-to-noise ratios (SNR) and rely on many unrealistic assumptions, which makes them hard to apply in real-world scenarios. To overcome these drawbacks, speech enhancement methods based on deep learning are emerging rapidly. However, most of these methods operate on spectral features and ignore phase information, which seriously degrades the perceptual quality of the speech. To solve these problems, and to address the fact that the performance of existing speech enhancement methods based on generative adversarial networks is limited by their architecture, this thesis presents a new convolutional neural network (CNN) built from a basic unit of 'convolutional layer + convolutional layer + pooling layer' (CCP). We then apply this network to speech enhancement and propose an end-to-end speech enhancement method. Experimental results show that the proposed method achieves higher perceptual quality and intelligibility than the other speech enhancement methods compared. However, its segmental SNR (SNRseg) is quite low because of its objective function. Therefore, to improve the SNRseg, we propose a speech enhancement method based on the Wasserstein generative adversarial network (SEWGAN), which separates clean speech from noisy speech by optimizing the Wasserstein distance between the noisy speech distribution and the clean speech distribution. Multiple noise types and different SNRs are used in training to improve the method's generalization capability. Experimental results show that SEWGAN outperforms the other comparative methods overall. As expected, the results also demonstrate that the proposed method generalizes well in a real-world scenario.
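To make the notion of a Wasserstein distance between two distributions concrete, the following is a minimal sketch, not the thesis's training objective. It assumes one-dimensional samples of equal size with uniform weights, in which case the empirical Wasserstein-1 distance reduces to the mean absolute difference between the sorted samples. The function name `wasserstein_1d` and the toy values are hypothetical; a WGAN such as the one described above would instead estimate this distance with a learned critic network over high-dimensional speech signals.

```python
def wasserstein_1d(xs, ys):
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples.

    Assumption for this sketch: both samples have the same number of points
    and every point carries the same weight. Under that assumption the optimal
    transport plan matches the i-th smallest value of xs to the i-th smallest
    value of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    if not xs:
        raise ValueError("samples must be non-empty")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    # Average transport cost per point under the sorted matching.
    return sum(abs(a - b) for a, b in zip(xs_sorted, ys_sorted)) / len(xs)


# Toy values standing in for samples from two distributions (illustrative only).
clean = [0.0, 0.5, 1.0, 1.5]
noisy = [0.25, 1.75, 0.75, 1.25]
print(wasserstein_1d(clean, noisy))  # 0.25
```

The distance is zero when the two samples coincide and grows as mass has to be moved further, which is the property that makes it usable as a training signal for a generator.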
Keywords/Search Tags: speech enhancement, generative adversarial networks, deep neural networks, Wasserstein distance