Research On End-to-end Multi-speech Separation Technology Based On Generative Adversarial Nets

Posted on:2019-01-11

Degree:Master

Type:Thesis

Country:China

Candidate:D D Xu

Full Text:PDF

GTID:2428330548995921

Subject:Engineering

Abstract/Summary:


With the advent of the age of artificial intelligence, more and more smart devices are entering people's lives. Voice, as an important interface for human-computer interaction, brings great convenience to daily life, and many researchers have therefore applied speech processing technologies in a wide range of fields. However, current technologies such as speech recognition are mostly developed under laboratory conditions and often fail to deliver effective results in noisy real-world applications. Speech separation, which removes background noise and interfering speech from other speakers, therefore has very broad application prospects. Generative adversarial nets (GANs), a recently emerging deep learning architecture, add a discriminative model to what was originally a single generative model. GANs currently perform well in image generation but had not yet been applied to speech separation, so this thesis applies them to the speech separation problem for the first time. In addition, current speech separation systems generally take pre-extracted audio features as network input. This ignores the high-frequency components and correlation information lost during feature extraction and introduces spurious information during the conversion, both of which degrade separation performance. This thesis therefore uses a GAN that takes the raw waveform of the speech signal as input to build an end-to-end speech separation model, and improves on the original network in the following respects:

1. Traditional acoustic feature extraction methods, such as the Fourier transform and the discrete cosine transform, lose high-frequency components and correlation information of the signal. This thesis feeds the raw waveform of the speech signal directly to the GAN, which removes the need, present in other methods, to extract complex features as input and avoids the energy loss caused by feature extraction. Deeper acoustic features of the speech signal can then be learned, as verified experimentally.

2. An end-to-end multi-speech separation framework based on GANs is proposed. Taking the GAN, which has achieved new breakthroughs in image generation, as the prototype, the framework adopts the deep convolutional GAN (DCGAN) to improve training stability. Its fully convolutional structure captures close temporal correlations, reduces the number of trainable parameters, and shortens training time. The generator and discriminator structures further incorporate characteristics of autoencoder networks, LSGAN, WGAN, and other models to compensate for the shortcomings of the original network on the speech separation problem, and experiments were conducted to select the most effective structure, which further improves separation performance.

3. For the multi-speech separation problem, this thesis uses a mask loop to establish mutual information between the hidden-layer variables and the multiple separation targets. To expand the amount of training data, it also exploits the relationship between the generative and discriminative models: data that the discriminator judges to be real are back-propagated as labels for the generative model. On the one hand, this expands the data so that it is used more fully and training improves; on the other hand, it mitigates the imbalance problem that exists in GANs.

This study found that adding a discriminative model to the original speech separation model improves the performance of the generative model, and that GANs, successful in image generation, also achieve good results in speech separation. Taking the raw waveform of the speech signal as input to the GAN helps the network extract deeper features of the speech signal. The successful application of GANs to speech separation also offers new ideas for speech signal processing.
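
To illustrate what taking the raw waveform as input and using a fully convolutional structure mean in practice, here is a minimal pure-Python sketch of one strided 1-D convolution layer applied directly to waveform samples, plus a parameter count comparing it with a dense layer of the same input and output size. The layer sizes (16384-sample input, 16 filters of width 31, stride 2) and all function names are hypothetical choices for illustration, not values taken from the thesis.

```python
from typing import List


def conv_output_length(n_samples: int, width: int, stride: int) -> int:
    """Output length of a 'valid' (unpadded) strided 1-D convolution."""
    return (n_samples - width) // stride + 1


def conv1d_strided(signal: List[float], kernels: List[List[float]],
                   biases: List[float], stride: int) -> List[List[float]]:
    """Apply a strided 1-D convolution directly to raw waveform samples.

    As in most deep learning libraries, this is a cross-correlation (the
    kernel is not flipped). Returns one output sequence per kernel, i.e.
    one channel per learned filter.
    """
    width = len(kernels[0])
    out_len = conv_output_length(len(signal), width, stride)
    outputs = []
    for kernel, bias in zip(kernels, biases):
        channel = []
        for t in range(out_len):
            start = t * stride
            acc = bias
            for k in range(width):
                acc += kernel[k] * signal[start + k]
            channel.append(acc)
        outputs.append(channel)
    return outputs


def conv_params(in_ch: int, out_ch: int, width: int) -> int:
    """Weights plus biases of a 1-D convolution layer."""
    return in_ch * out_ch * width + out_ch


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Toy waveform and a simple difference filter (illustrative values only).
print(conv1d_strided([0, 1, 2, 3, 4, 5, 6, 7], [[1, 0, -1]], [0.0], stride=2))
# [[-2.0, -2.0, -2.0]]

# Hypothetical layer sizes: 16384-sample input, 16 filters of width 31, stride 2.
n_out = conv_output_length(16384, 31, 2)
print(conv_params(1, 16, 31), dense_params(16384, n_out * 16))
# 512 2143682320
```

The same 16 filters are reused at every position, so the convolution needs 512 parameters regardless of input length, whereas a dense layer producing an output of the same size would need over two billion. Weight sharing of this kind is what lets a fully convolutional network keep its parameter count small.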
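
The abstract says the generator and discriminator borrow from LSGAN and WGAN but does not state the loss functions used in the thesis. For reference, here is a minimal sketch of the standard objectives from those two models, written over lists of discriminator (critic) scores. The 0/1 target coding for LSGAN is one common choice and is an assumption here, as are the function names.

```python
from statistics import fmean
from typing import Sequence


def lsgan_d_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Least-squares discriminator loss with target 1 for real and 0 for fake."""
    return (0.5 * fmean((s - 1.0) ** 2 for s in real_scores)
            + 0.5 * fmean(s ** 2 for s in fake_scores))


def lsgan_g_loss(fake_scores: Sequence[float]) -> float:
    """Least-squares generator loss: push fake scores toward the 'real' target 1."""
    return 0.5 * fmean((s - 1.0) ** 2 for s in fake_scores)


def wgan_critic_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """WGAN critic loss: minimising this maximises E[D(real)] - E[D(fake)]."""
    return fmean(fake_scores) - fmean(real_scores)


def wgan_g_loss(fake_scores: Sequence[float]) -> float:
    """WGAN generator loss: raise the critic's score on generated samples."""
    return -fmean(fake_scores)


# Illustrative scores, not outputs of any trained model.
print(lsgan_d_loss([1.0, 1.0], [0.0, 0.0]), lsgan_g_loss([0.0, 0.0]))  # 0.0 0.5
print(wgan_critic_loss([2.0, 4.0], [1.0, 1.0]), wgan_g_loss([1.0, 3.0]))  # -2.0 -2.0
```

In WGAN the critic must also be kept approximately 1-Lipschitz (weight clipping in the original paper, a gradient penalty in later work); that constraint is not shown here.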
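
The third point refers to masks for separating several targets, but the "mask loop" itself is not specified in this abstract. The following is therefore only a generic sketch of mask-based multi-source separation, not the thesis's method: a model (not shown) is assumed to output one unnormalised score per source per position of a mixture representation (for example, the output of a convolutional encoder), a softmax across sources turns these scores into masks, and each source estimate is its mask multiplied by the mixture. The function names and the softmax normalisation are assumptions for illustration.

```python
import math
from typing import List


def softmax_masks(scores: List[List[float]]) -> List[List[float]]:
    """Turn per-source scores into masks.

    scores[s][t] is an unnormalised score for source s at position t.
    Returns masks with masks[s][t] >= 0 and, for every t, masks summing to 1
    over the sources.
    """
    n_sources = len(scores)
    length = len(scores[0])
    masks = [[0.0] * length for _ in range(n_sources)]
    for t in range(length):
        column = [scores[s][t] for s in range(n_sources)]
        peak = max(column)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in column]
        total = sum(exps)
        for s in range(n_sources):
            masks[s][t] = exps[s] / total
    return masks


def apply_masks(mixture: List[float], masks: List[List[float]]) -> List[List[float]]:
    """Estimate each source as its mask multiplied element-wise by the mixture."""
    return [[m * x for m, x in zip(mask, mixture)] for mask in masks]


# Two hypothetical sources with equal scores: each receives half the mixture.
mixture = [1.0, -2.0, 0.5]
estimates = apply_masks(mixture, softmax_masks([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]))
print(estimates)  # [[0.5, -1.0, 0.25], [0.5, -1.0, 0.25]]
```

Because the masks sum to one at every position, the estimated sources add back up to the mixture, a consistency constraint often used in mask-based separation.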

Keywords/Search Tags:

Generative adversarial nets, Raw speech waveform, End-to-end model, Multi-speech separation, Convolutional neural network


Related items

1	Research On Speech Enhancement Method Based On Generative Adversarial Networks
2	Research On Image Super-resolution Using Nested Connections And Generative Adversarial Nets
3	Joint Training Algorithms For Generative Adversarial Networks And Their Application To Speech Separation
4	Research Of Generating Text Via Generative Adversarial Nets
5	The Research On Neural Machine Translation Based On Generative Adversarial Nets
6	Research On Facial Expression Analysis Based On Conditional Generative Adversarial Nets
7	Research On Auto-encoders And Generative Adversarial Network Based Speech Enhancement
8	Font Style Transfer Algorithms Based On Generative Adversarial Nets
9	The Research Of Personalized Speech Synthesis Based On Generative Adversarial Network
10	Signal Reconstruction Based On Generative Adversarial Networks