
Research on End-to-End Multi-Speech Separation Technology Based on Generative Adversarial Nets

Posted on: 2019-01-11
Degree: Master
Type: Thesis
Country: China
Candidate: D D Xu
GTID: 2428330548995921
Subject: Engineering
Abstract/Summary:
With the arrival of the age of artificial intelligence, more and more smart devices are entering people's lives. Voice, as an important interface for human-computer interaction, brings great convenience, and many researchers have applied speech processing technologies to a wide range of fields. However, current technologies such as speech recognition are mostly developed under controlled experimental conditions and often fail to deliver usable results in noisy real-world applications. Speech separation, which removes background noise or interfering speech from other talkers, therefore has broad application prospects.

Generative adversarial nets (GANs), a recently emerged class of deep learning models, add a discriminative model alongside the single generative model of earlier approaches. GANs already perform well in image generation but had not yet been explored for speech separation, so this thesis applies GANs to the speech separation problem for the first time. In addition, current speech separation systems generally take pre-extracted audio features as network input. This overlooks the loss of high-frequency content and correlation information during feature extraction, as well as the spurious information introduced by the conversion, both of which degrade separation performance. This thesis therefore uses a GAN that takes the raw waveform of the speech signal as input, yielding an end-to-end speech separation model, and improves on the original network in the following respects:

1. Traditional acoustic feature extraction based on the Fourier transform, the discrete cosine transform and similar methods loses high-frequency components and correlation information of the signal. This thesis feeds the raw speech waveform directly into the GAN, removing both the complex feature extraction required by other methods and the information loss it causes. The network can then extract deeper acoustic features of the speech signal, which is verified experimentally.

2. This thesis proposes an end-to-end multi-speaker separation framework based on GANs. Using the GAN, which has achieved new breakthroughs in image generation, as its prototype, the framework adopts a deep convolutional GAN (DCGAN) to improve training stability. Its fully convolutional structure strengthens local temporal correlation, reduces the number of trainable parameters and shortens training time. The generator and discriminator architectures also incorporate characteristics of autoencoder networks, LSGAN, WGAN and other models to compensate for the shortcomings of the original network on the speech separation task, and experiments are used to select the most effective structure, which further improves separation.

3. For multi-speaker separation, a mask loop is used to establish mutual information between the hidden-layer variables and the multiple separation targets. To enlarge the training data, the thesis exploits the relationship between the generative and discriminative models: samples that the discriminator judges to be real are back-propagated as labels for the generative model. On the one hand this expands the data so that it is used more fully and training improves; on the other hand it mitigates the imbalance problem that arises in GANs.

The study finds that adding a discriminative model to an existing speech separation model improves the performance of the generative model, and that GANs, already successful in image generation, also achieve good results in speech separation. Using the raw speech waveform as GAN input helps the network extract deeper features of the speech signal. The successful application of GANs to speech separation also offers new ideas for speech signal processing.
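The adversarial training described above can be sketched in terms of its loss functions. The abstract only names LSGAN and WGAN as influences; the exact objective used in the thesis is not given, so what follows is a minimal sketch under stated assumptions: discriminator scores are plain floats, the LSGAN targets are 1 for real and 0 for generated samples, and the L1 reconstruction term with the hypothetical weight `l1_weight` is borrowed from common waveform-GAN practice rather than taken from the thesis. Plain Python is used so the arithmetic is easy to check by hand.

```python
from statistics import fmean


def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push D(real) toward 1 and D(fake) toward 0."""
    return 0.5 * fmean((d - 1.0) ** 2 for d in d_real) + 0.5 * fmean(d ** 2 for d in d_fake)


def lsgan_g_loss(d_fake):
    """Least-squares generator loss: make D(fake) look real (target 1)."""
    return 0.5 * fmean((d - 1.0) ** 2 for d in d_fake)


def wgan_d_loss(c_real, c_fake):
    """WGAN critic loss E[C(fake)] - E[C(real)]; the Lipschitz constraint
    (weight clipping or gradient penalty) is not shown here."""
    return fmean(c_fake) - fmean(c_real)


def wgan_g_loss(c_fake):
    """WGAN generator loss: raise the critic score of generated samples."""
    return -fmean(c_fake)


def l1_distance(estimate, target):
    """Mean absolute error between an estimated and a reference waveform."""
    if len(estimate) != len(target):
        raise ValueError("waveforms must have the same length")
    return fmean(abs(e - t) for e, t in zip(estimate, target))


def generator_objective(d_fake, estimate, target, l1_weight=100.0):
    """Hypothetical combined objective: adversarial term plus weighted L1 term.
    l1_weight is an assumed value, not one reported in the thesis."""
    return lsgan_g_loss(d_fake) + l1_weight * l1_distance(estimate, target)


# A perfect discriminator: real samples scored 1.0, generated samples scored 0.0.
print(lsgan_d_loss([1.0, 1.0], [0.0, 0.0]))  # 0.0
print(lsgan_g_loss([0.0, 0.0]))              # 0.5
```

With a perfect discriminator the LSGAN discriminator loss is 0 while the generator loss stays at 0.5, so the generator still receives a training signal; keeping gradients alive for generated samples far from the target is one motivation commonly cited for the least-squares form.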
Keywords/Search Tags: Generative adversarial nets, Raw speech waveform, End-to-end model, Multi-voice separation, Convolutional neural network
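Two ideas from the abstract, raw waveform input (point 1) and masks that split a mixture into several targets (point 3), can be illustrated with a small sketch: the raw waveform is cut into fixed-length overlapping frames with no Fourier or DCT step, and per-source generator outputs are turned into masks applied to the mixture. The abstract does not specify the mask form; a softmax across sources is an assumption made here, as are the helper names and the `frame_len`/`hop` parameters, all of which are hypothetical.

```python
import math


def frame_waveform(samples, frame_len, hop):
    """Cut a raw waveform into overlapping frames of frame_len samples,
    hop samples apart; the last frame is zero-padded. There is no Fourier
    or DCT step: each frame holds the original samples unchanged."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while True:
        frame = [float(x) for x in samples[start:start + frame_len]]
        frame += [0.0] * (frame_len - len(frame))  # pad the tail with silence
        frames.append(frame)
        if start + frame_len >= len(samples):
            break
        start += hop
    return frames


def softmax_masks(logits):
    """logits[k][t] is a (hypothetical) generator output for source k at
    sample t. Returns masks[k][t] in (0, 1) whose sum over k equals 1."""
    n_src, length = len(logits), len(logits[0])
    masks = [[0.0] * length for _ in range(n_src)]
    for t in range(length):
        column = [logits[k][t] for k in range(n_src)]
        peak = max(column)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in column]
        total = sum(exps)
        for k in range(n_src):
            masks[k][t] = exps[k] / total
    return masks


def apply_masks(mixture, masks):
    """Estimate each source as its mask times the mixture, sample by sample."""
    return [[m * x for m, x in zip(mask, mixture)] for mask in masks]


mixture = [0.3, -0.7, 1.2]
logits = [[1.0, -3.0, 0.2], [0.5, 2.0, -1.0]]  # two sources, assumed values
estimates = apply_masks(mixture, softmax_masks(logits))

print(frame_waveform([1, 2, 3, 4, 5], frame_len=4, hop=2))
# [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 0.0]]
print(all(math.isclose(sum(col), x) for col, x in zip(zip(*estimates), mixture)))
# True
```

Because the softmax masks sum to 1 at every sample, the estimated sources always add back up to the mixture, which is one simple way to keep multi-source estimates consistent; other mask forms, such as independent sigmoids per source, drop that guarantee.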