Research On Many-to-Many Voice Conversion Based On Multi-Scale StarGAN By Share-Learning For Non-parallel Corpora

Posted on:2021-01-14

Degree:Master

Type:Thesis

Country:China

Candidate:H Sha

Full Text:PDF

GTID:2428330614465879

Subject:Electronic and communication engineering

Abstract/Summary:

The goal of voice conversion is to transform a source speaker's voice so that it sounds like the target speaker's voice while the semantic content remains unchanged. Voice conversion can be divided into parallel-corpus and non-parallel-corpus conversion; the difference lies in whether the source and target speakers' training utterances have the same content and duration. In practice, however, large parallel corpora are very difficult to obtain, and in some cases impossible, so studying voice conversion for non-parallel corpora is necessary. The performance of voice conversion is evaluated mainly in two respects: the quality of the converted voice and its similarity to the target speaker. Existing non-parallel voice conversion models have difficulty performing well in both respects. This thesis focuses on the StarGAN voice conversion model and proposes a series of improvements in these two respects.

First, to improve the sound quality of the converted voice and make it sound more realistic and refined, this thesis applies a Multi-Scale structure to the baseline system and proposes a voice conversion method based on Multi-Scale StarGAN, which extracts global features of the target speaker at different levels. The multi-scale features enhance the details of the converted voice. Subjective and objective experiments verify that, compared with the StarGAN-based voice conversion model, the time-domain waveform of the converted voice is smoother and closer to that of the target speaker, the spectrogram is clearer, the average MOS increases by 21.8%, and the average ABX increases by 5.56%. The results show that this method effectively improves the quality of the synthesized voice while also improving voice similarity.

Second, StarGAN trains the generator to perform voice conversion by training the discriminator and the classifier. A Share-Learning strategy is therefore used to train a module shared by the discriminator and the classifier, named the Share-Block. This improves the performance of the discriminator and classifier, stabilizes training, accelerates convergence, and improves the sound quality and similarity of the synthesized voice. Extensive subjective and objective comparisons show that, compared with the StarGAN-based voice conversion model, the average MOS increases by 15.79% and the average ABX increases by 2.38%.

Furthermore, combining the two innovations of this thesis, Share-Learning is added to the Multi-Scale StarGAN method, yielding a voice conversion method based on Multi-Scale StarGAN with Share-Learning. Subjective and objective evaluation shows that, compared with voice converted by the Multi-Scale StarGAN method, the time-domain waveform of the converted voice is smoother and closer to that of the target speaker, and its spectrogram is clearer. The average MOS increases by 3.57% and the average ABX by 3.30%, indicating that this method greatly improves voice quality and speaker individuality similarity. Compared with the StarGAN-based voice conversion model, the average MOS increases by 28.95% and the average ABX by 9.03%. The full experimental results show that this method effectively improves voice quality while also improving voice similarity.
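The Share-Block idea in the abstract, one module whose features feed both the discriminator and the classifier, can be illustrated with a toy forward pass. The abstract gives no layer types or sizes, so everything below is a hypothetical sketch and not the thesis's architecture: the dimensions, the single ReLU layer, the frame averaging, and all names (`share_block`, `discriminator_head`, `classifier_head`) are assumptions chosen for illustration.

```python
import math
import random

random.seed(0)

N_FEATS = 8       # hypothetical: acoustic features per frame
N_HIDDEN = 16     # hypothetical: width of the shared block
N_SPEAKERS = 4    # hypothetical: number of speakers in the corpus


def rand_matrix(rows, cols):
    """Small random weights; a real model would learn these."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(v, W):
    """Row vector v (length rows) times matrix W (rows x cols)."""
    return [sum(v[i] * W[i][j] for i in range(len(v))) for j in range(len(W[0]))]


W_SHARED = rand_matrix(N_FEATS, N_HIDDEN)   # used by both heads
W_DISC = rand_matrix(N_HIDDEN, 1)           # real/fake head
W_CLS = rand_matrix(N_HIDDEN, N_SPEAKERS)   # speaker-identity head


def share_block(frames):
    """Shared features: ReLU(linear) per frame, then mean over frames."""
    hidden = [[max(h, 0.0) for h in matvec(f, W_SHARED)] for f in frames]
    return [sum(col) / len(frames) for col in zip(*hidden)]


def discriminator_head(h):
    """Sigmoid score: estimated probability that the input is real speech."""
    return 1.0 / (1.0 + math.exp(-matvec(h, W_DISC)[0]))


def classifier_head(h):
    """Softmax over speaker identities."""
    logits = matvec(h, W_CLS)
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Stand-in for one utterance: 20 frames of random features.
frames = [[random.gauss(0.0, 1.0) for _ in range(N_FEATS)] for _ in range(20)]
h = share_block(frames)          # shared features, computed once
p_real = discriminator_head(h)   # both heads read the same features
p_speaker = classifier_head(h)
```

In a trainable version of this layout, the adversarial loss and the speaker-classification loss would both send gradients through `W_SHARED`, which is one plausible reading of how a shared module could help both networks; the abstract itself does not spell out the mechanism.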

Keywords/Search Tags:

Voice Conversion, GAN, StarGAN, Multi-Scale, Share-learning, Non-parallel Corpora

Related items

1	Non-parallel Many-to-Many Voice Conversion Based On SE-ResNet Combining Speaker Embedding
2	Non-Parallel Many-to-many Voice Conversion Method Based On Adaptive Trans-StarGAN
3	Non-parallel Corpora Voice Conversion Based On Structured Gaussian Mixture Model Under Constraint Conditions
4	A Study Of Voice Conversion System Based On GAN
5	Research On Many To Many Voice Conversion Based On I-vector And Improved Variational Autoencoder For Non-parallel Corpora
6	Non-parallel Many-to-many Voice Conversion Method Based On PSR-STARGAN
7	Research On Many-to-Many Voice Conversion Based On I-vector,Variational Auto-encoder And Generative Adversarial Networks For Non-parallel Corpora
8	The Research On Voice Conversion Algorithm Based On Improved Bilinear Frequency Warping For Parallel Or Nonparallel Corpora
9	Non-parallel Voice Conversion Using ACGAN And Variational Autoencoders Conditioned By Sentence Embedding
10	Voice Conversion Using Structured Gaussian Mixture Model In Eigen Space