Research On Many-to-Many Voice Conversion Based On I-vector, Variational Auto-encoder And Generative Adversarial Networks For Non-parallel Corpora

Posted on: 2020-10-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Zuo
Full Text: PDF
GTID: 2428330590495415
Subject: Signal and Information Processing
Abstract/Summary:
Voice conversion is a technology that converts the speaker characteristics of a source speaker into those of a target speaker while preserving the semantic content of the speech. As a highly interdisciplinary subject, voice conversion has been applied to text-to-speech, secure communication, multimedia applications, medical assistance, and language translation, among other fields. Existing voice conversion models have two main problems: first, the speaker similarity of the converted speech is insufficient; second, the quality of the converted speech is unsatisfactory. This thesis studies voice conversion based on the Variational Auto-Encoder (VAE) and the Wasserstein Generative Adversarial Network (WGAN), and addresses both problems.

First, to achieve better speaker similarity, the thesis applies the i-vector to the voice conversion model based on VAE and WGAN, using the i-vector's ability to represent speaker identity information to improve the speaker similarity of the converted speech. The improved model is assessed with subjective and objective evaluations. Compared with the voice conversion model based on VAE and WGAN, the average mel-cepstral distortion (MCD) of the converted speech decreases by 3.22%, the average mean opinion score (MOS) increases by 2.63%, and the average ABX score increases by 7.35%. The results indicate that the proposed method improves speaker similarity while also improving speech quality.

Second, to achieve better speech quality, the thesis improves the voice conversion model based on VAE and WGAN by using a relativistic generative adversarial network, which has better generation performance. The relativistic generative adversarial network improves speech quality by addressing problems of the Wasserstein generative adversarial network, such as the difficulty of training the network. The subjective and objective evaluations show that, compared with the voice conversion model based on VAE and WGAN, the average MCD of the converted speech decreases by 4.36%, the average MOS increases by 4.52%, and the average ABX score increases by 3.6%. The results indicate that the proposed method improves speaker similarity while also improving speech quality.

In addition, the thesis adds the i-vector to the relativistic-GAN-based method. The subjective and objective evaluations show that, compared with the model based on VAE and WGAN, the average MCD of the converted speech decreases by 4.8%, the average MOS increases by 5.12%, and the average ABX score increases by 8.6%. The results indicate that this method greatly improves both speech quality and speaker similarity.
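The abstract reports MCD as relative percentage changes but does not give the formula. As a rough illustration only, the sketch below uses the commonly cited definition of mel-cepstral distortion (in dB, averaged over time-aligned frames, skipping the 0th energy coefficient) and a simple relative-change helper. The function names and the sample baseline value are hypothetical; the thesis may compute or align its features differently.

```python
import math


def mel_cepstral_distortion(converted, target):
    """Mean MCD in dB between two time-aligned sequences of mel-cepstral frames.

    Each frame is a sequence of coefficients. The 0th coefficient (energy)
    is skipped, following common practice; this is an assumption, not a
    detail taken from the thesis.
    """
    if len(converted) != len(target) or len(converted) == 0:
        raise ValueError("sequences must be non-empty and have equal length")
    scale = (10.0 / math.log(10.0)) * math.sqrt(2.0)
    total = 0.0
    for conv_frame, tgt_frame in zip(converted, target):
        squared_error = sum(
            (c - t) ** 2 for c, t in zip(conv_frame[1:], tgt_frame[1:])
        )
        total += scale * math.sqrt(squared_error)
    return total / len(converted)


def relative_change_percent(baseline, proposed):
    """Signed relative change of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (proposed - baseline) / baseline


# Hypothetical values: a baseline MCD of 5.0 dB falling to 4.839 dB
# corresponds to a 3.22% decrease.
change = relative_change_percent(5.0, 4.839)
```

A lower MCD means the converted spectrum is closer to the target, which is why the abstract reports decreases in MCD alongside increases in MOS and ABX.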
Keywords/Search Tags:voice conversion, variational auto-encoder, generative adversarial network, WORLD model, non-parallel corpora, i-vector, many-to-many conversion