Research On Zero-Shot Voice Conversion With Generative Adversarial Networks

Posted on:2023-11-01

Degree:Master

Type:Thesis

Country:China

Candidate:W R Lu

GTID:2558306830986369

Subject:Information and Communication Engineering

Abstract/Summary:

Voice is one of the most important means of human communication, and voice conversion is an important research direction within speech synthesis. The goal of voice conversion is to process an utterance with an algorithm so that it sounds as if it were spoken by another person, while its original meaning is preserved. Voice conversion technology is widely used in scenarios such as voice interaction, voice customization, and the entertainment industry. With the development of deep learning in recent years, voice conversion technology has made remarkable progress. Zero-shot voice conversion, one of its most important sub-directions, has attracted extensive attention. Although many researchers have proposed voice conversion algorithms for various scenarios, zero-shot voice conversion remains a challenging task. Recent zero-shot voice conversion methods are mostly based on an auto-encoder framework with a carefully designed bottleneck. However, such methods have limited generative capability, which hinders further improvement of zero-shot voice conversion. To address these problems, this thesis proposes zero-shot voice conversion methods based on generative adversarial networks. The main research contents are as follows:

(1) A zero-shot voice conversion framework based on a generative adversarial network is proposed. For speakers who do not appear in the dataset, the algorithm uses a timbre encoder to extract timbre features from the input speech and a content encoder to extract content distribution features from the speech of any other speaker. It separates timbre and content information through conversion-reconstruction cycle training and learns to synthesize new speech. In addition, an adversarial loss improves the quality and generalization performance of the conversion. Experimental results show that the proposed algorithm achieves higher-quality zero-shot voice conversion.

(2) Research content (1) decomposes speech only into timbre and content, which is not accurate enough. From an acoustic perspective, the information in speech can be decomposed more completely into four components: content, timbre, rhythm, and pitch. Building on research content (1), this thesis therefore proposes a zero-shot voice conversion framework for arbitrary components, also based on a generative adversarial network. The four kinds of speech information are decomposed and embedded through an information encoder and sequential rescaling, and speech is reconstructed with a generator. Experimental results show that the algorithm can convert arbitrary speech components and offers better applicability and universality in zero-shot voice conversion. This research extends and improves the practical application of voice conversion technology.
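The conversion-reconstruction cycle with an adversarial term from research content (1) can be illustrated with a toy sketch in plain Python. This is not the thesis implementation. The functions timbre_encoder, content_encoder, generator and discriminator are hypothetical stand-ins: timbre is taken as the utterance-level mean of each feature, content as the mean-removed frames, and the discriminator is a fixed logistic score. The weights lam_cyc and lam_adv are illustrative assumptions. Because these toy encoders are fixed statistics, disentanglement is hard-wired here, whereas the thesis learns it through training.

```python
import math
from typing import List, Tuple

Frames = List[List[float]]  # toy "spectrogram": T frames x D features


def timbre_encoder(x: Frames) -> List[float]:
    """Hypothetical timbre encoder: per-dimension mean over all frames."""
    t = len(x)
    return [sum(frame[d] for frame in x) / t for d in range(len(x[0]))]


def content_encoder(x: Frames) -> Frames:
    """Hypothetical content encoder: subtracts the utterance mean (instance-norm style)."""
    mu = timbre_encoder(x)
    return [[v - m for v, m in zip(frame, mu)] for frame in x]


def generator(content: Frames, timbre: List[float]) -> Frames:
    """Hypothetical generator: adds a speaker's statistics back onto the content."""
    return [[v + m for v, m in zip(frame, timbre)] for frame in content]


def convert(source: Frames, reference: Frames) -> Frames:
    """Speak the source's content with the reference speaker's timbre."""
    return generator(content_encoder(source), timbre_encoder(reference))


def l1(a: Frames, b: Frames) -> float:
    """Mean absolute error between two equally shaped frame sequences."""
    n = sum(len(f) for f in a)
    return sum(abs(u - v) for fa, fb in zip(a, b) for u, v in zip(fa, fb)) / n


def discriminator(x: Frames) -> float:
    """Hypothetical fixed discriminator: probability that x is real speech."""
    score = 0.1 * sum(timbre_encoder(x))
    return 1.0 / (1.0 + math.exp(-score))


def generator_loss(x_a: Frames, x_b: Frames,
                   lam_cyc: float = 10.0,
                   lam_adv: float = 1.0) -> Tuple[float, float, float, Frames]:
    y_ab = convert(x_a, x_b)    # A's content with B's timbre
    x_aba = convert(y_ab, x_a)  # convert back using A's timbre
    loss_cyc = l1(x_aba, x_a)   # conversion-reconstruction cycle loss
    # Non-saturating generator loss: push D(y_ab) toward "real".
    loss_adv = -math.log(max(discriminator(y_ab), 1e-8))
    return lam_cyc * loss_cyc + lam_adv * loss_adv, loss_cyc, loss_adv, y_ab


if __name__ == "__main__":
    x_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
    x_b = [[10.0, -1.0], [12.0, 1.0]]
    total, cyc, adv, y_ab = generator_loss(x_a, x_b)
    print(f"total={total:.4f} cycle={cyc:.4f} adversarial={adv:.4f}")
```

Note that the cycle term alone can be driven to zero by an identity mapping. In this sketch the content/timbre split is imposed by construction. In a trained model, the adversarial term and the encoder design are what discourage such trivial solutions.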
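The abstract does not define "sequential rescaling" for research content (2). The sketch below assumes it means cutting a sequence, such as a pitch contour, into segments and stretching or compressing each segment by a random factor. This is similar in spirit to the random resampling used in SpeechSplit and destroys timing (rhythm) while keeping local shape. Arbitrary-component conversion is then shown as choosing, per component, whether the code comes from the source or the target utterance. The names rescale_segment, sequential_rescale and mix_components, the factor range 0.5 to 1.5, and the string placeholders standing in for component embeddings are all hypothetical.

```python
import random
from typing import Dict, List, Optional, Set

COMPONENTS = ("content", "timbre", "rhythm", "pitch")


def rescale_segment(seg: List[float], factor: float) -> List[float]:
    """Linearly resample a 1-D segment to round(len(seg) * factor) points (min 1)."""
    n_out = max(1, round(len(seg) * factor))
    if len(seg) == 1 or n_out == 1:
        return [seg[0]] * n_out
    out = []
    for i in range(n_out):
        pos = i * (len(seg) - 1) / (n_out - 1)  # position in the input segment
        lo = int(pos)
        hi = min(lo + 1, len(seg) - 1)
        frac = pos - lo
        out.append(seg[lo] * (1 - frac) + seg[hi] * frac)
    return out


def sequential_rescale(x: List[float], seg_len: int, low: float = 0.5,
                       high: float = 1.5,
                       rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical sequential rescaling: stretch or compress each fixed-length
    segment of x by its own random factor, scrambling rhythm information."""
    if rng is None:
        rng = random.Random(0)
    out: List[float] = []
    for start in range(0, len(x), seg_len):
        factor = rng.uniform(low, high)
        out.extend(rescale_segment(x[start:start + seg_len], factor))
    return out


def mix_components(source: Dict[str, object], target: Dict[str, object],
                   take_from_target: Set[str]) -> Dict[str, object]:
    """Take the listed components from the target and keep the rest from the
    source. The result would then be passed to a generator (not shown)."""
    unknown = take_from_target - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return {c: (target[c] if c in take_from_target else source[c]) for c in COMPONENTS}


if __name__ == "__main__":
    pitch = [float(v) for v in range(20)]
    print(sequential_rescale(pitch, seg_len=5, rng=random.Random(42)))
    src = {"content": "c_src", "timbre": "t_src", "rhythm": "r_src", "pitch": "p_src"}
    tgt = {"content": "c_tgt", "timbre": "t_tgt", "rhythm": "r_tgt", "pitch": "p_tgt"}
    print(mix_components(src, tgt, {"pitch", "timbre"}))
```

Under this assumption, rescaling the inputs to the content and pitch branches removes rhythm cues from their codes, so rhythm has to be supplied separately. This is what makes swapping any subset of the four components meaningful.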

Keywords/Search Tags:

voice conversion, zero-shot, generative adversarial network, information decomposition

Related items

1	A New Lipschitz Generative Adversarial Network And Its Application In Voice Conversion
2	Many-to-Many Voice Conversion Algorithm Based On Dense Net Star Generative Adversarial Network Combining I-vector For Non-parallel Corpora
3	Non-parallel Many-to-many Voice Conversion Method Based On PSR-STARGAN
4	StyleGAN Voice Conversion Combining DSNet And ESR Network
5	Research On Many-to-Many Voice Conversion Based On I-vector,Variational Auto-encoder And Generative Adversarial Networks For Non-parallel Corpora
6	Research On Zero-shot Learning Methods Based On Generative Adversarial Networks
7	Research And Application Of Voice Style Transfer Technology Based On Generative Adversarial Networks
8	Non-parallel Many-to-many Voice Conversion Based On Dynamic Convolution StyleGAN
9	Non-parallel Voice Conversion Using ACGAN And Variational Autoencoders Conditioned By Sentence Embedding
10	Research On Zero-shot Learning Based On Generative Model