Research On Many-to-Many Voice Conversion Based On I-vector, Variational Auto-encoder And Generative Adversarial Networks For Non-parallel Corpora

Posted on:2020-10-12

Degree:Master

Type:Thesis

Country:China

Candidate:Y T Zuo

GTID:2428330590495415

Subject:Signal and Information Processing

Abstract/Summary:

Voice conversion is a technology that converts the speaker characteristics of a source speaker into those of a target speaker while preserving the semantic content of the speech. As a highly interdisciplinary subject, voice conversion has been applied to text-to-speech synthesis, secure communication, multimedia applications, medical assistance, and language translation, among other fields. Existing voice conversion models have two main problems. First, the speaker similarity of the converted speech is insufficient. Second, the quality of the converted speech is unsatisfactory. The thesis focuses on voice conversion based on the variational auto-encoder (VAE) and the Wasserstein generative adversarial network (WGAN), and addresses both problems in turn.

Firstly, to achieve better speaker similarity, the thesis adds the i-vector to the voice conversion model based on VAE and WGAN, exploiting the i-vector's ability to represent speaker identity information to improve the speaker similarity of the converted speech. The improved model is assessed with both subjective and objective evaluations. Compared with the baseline model based on VAE and WGAN, the average mel-cepstral distortion (MCD) of the converted speech decreases by 3.22%, the average mean opinion score (MOS) increases by 2.63%, and the average ABX score increases by 7.35%. The results indicate that the proposed method improves both speaker similarity and speech quality.

Secondly, to achieve better speech quality, the thesis replaces the WGAN in the VAE and WGAN model with a relativistic generative adversarial network, which has better generation performance. The relativistic generative adversarial network improves speech quality by alleviating problems of the WGAN such as the difficulty of network training. The subjective and objective evaluations show that, compared with the baseline model based on VAE and WGAN, the average MCD of the converted speech decreases by 4.36%, the average MOS increases by 4.52%, and the average ABX score increases by 3.6%. The results indicate that the proposed method improves both speaker similarity and speech quality.

In addition, the thesis adds the i-vector to the relativistic model described above. The subjective and objective evaluations show that, compared with the baseline model based on VAE and WGAN, the average MCD of the converted speech decreases by 4.8%, the average MOS increases by 5.12%, and the average ABX score increases by 8.6%. The results indicate that this combined method substantially improves both speech quality and speaker similarity.
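
The objective results above are reported as relative changes in average MCD. As a rough illustration of how such a figure can be derived, the sketch below uses the commonly cited mel-cepstral distortion formula, (10 / ln 10) * sqrt(2 * sum of squared coefficient differences), averaged over frames. It is not taken from the thesis: the function names, the assumption that frames are already time-aligned with the 0th (energy) coefficient removed, and the sample numbers are all hypothetical.

```python
import math


def frame_mcd(ref, conv):
    """MCD in dB for one pair of aligned frames.

    Assumption: both inputs are mel-cepstral coefficient lists of equal
    length with the 0th (energy) coefficient already removed.
    """
    if len(ref) != len(conv):
        raise ValueError("frames must have the same number of coefficients")
    squared_diff = sum((r - c) ** 2 for r, c in zip(ref, conv))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared_diff)


def mean_mcd(ref_frames, conv_frames):
    """Average MCD over a sequence of already time-aligned frames."""
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("need two non-empty, equally long frame sequences")
    total = sum(frame_mcd(r, c) for r, c in zip(ref_frames, conv_frames))
    return total / len(ref_frames)


def relative_change(baseline, proposed):
    """Signed percentage change of `proposed` with respect to `baseline`."""
    return (proposed - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Made-up coefficients for illustration only, not data from the thesis.
    target = [[0.10, -0.20, 0.05], [0.00, 0.30, -0.10]]
    converted = [[0.12, -0.18, 0.01], [0.03, 0.25, -0.08]]
    print(f"mean MCD: {mean_mcd(target, converted):.3f} dB")
    # A negative value means the proposed system has lower distortion.
    print(f"relative change: {relative_change(8.0, 7.0):.2f}%")
```

A lower MCD means the converted spectrum is closer to the target, so a negative relative change corresponds to the "decreases by" figures in the abstract. MOS and ABX are subjective listening scores, for which higher is better.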

Keywords/Search Tags:

voice conversion, variational auto-encoder, generative adversarial network, WORLD model, non-parallel corpora, i-vector, many-to-many conversion

Related items

1	Non-parallel Voice Conversion Using ACGAN And Variational Autoencoders Conditioned By Sentence Embedding
2	Research On Many To Many Voice Conversion Based On I-vector And Improved Variational Autoencoder For Non-parallel Corpora
3	Many-to-Many Voice Conversion Algorithm Based On Dense Net Star Generative Adversarial Network Combining I-vector For Non-parallel Corpora
4	High-quality Voice Conversion From Non-parallel Corpora Based On Variational Auto-encoder And Bottleneck Feature
5	Non-parallel Many-to-many Voice Conversion Method Based On PSR-STARGAN
6	Non-parallel Many-to-Many Voice Conversion Based On SE-ResNet Combining Speaker Embedding
7	Non-parallel Many-to-many Voice Conversion Based On Dynamic Convolution StyleGAN
8	Non-parallel Corpora Voice Conversion Based On Structured Gaussian Mixture Model Under Constraint Conditions
9	A New Lipschitz Generative Adversarial Network And Its Application In Voice Conversion
10	The Research On Voice Conversion Algorithm Based On Improved Bilinear Frequency Warping For Parallel Or Nonparallel Corpora