Voice Conversion Based On CycleGAN Network Under Non-parallel Corpus

Posted on: 2019-08-16
Degree: Master
Type: Thesis
Country: China
Candidate: T Li
Full Text: PDF
GTID: 2428330566484942
Subject: Information and Communication Engineering
Abstract/Summary:
Voice conversion technology changes the personal characteristics of the source speaker's voice, without changing the semantic content, so that the speech sounds like the target speaker. Voice conversion has high theoretical research value and a wide range of application scenarios. Most current research on voice conversion depends on a parallel corpus, but in practice a parallel corpus is often difficult to obtain, and the alignment of features is prone to error. Parallel-corpus methods also cannot be applied across different languages. This thesis focuses on more flexible and general voice conversion under non-parallel data conditions. The main work is as follows: (1) The theory and processing flow of voice conversion are reviewed, and the latest WORLD speech analysis and synthesis model is used for feature parameter extraction and speech synthesis. (2) The CycleGAN network, which performs well in image style transfer under non-parallel data conditions, is applied to the spectral conversion stage of voice conversion, with specific improvements to the generator network, discriminator network, loss function, experimental details, and hyperparameters. The results show that basic voice conversion can be achieved, but the conversion quality still needs improvement. (3) Building on the above network, the network structure is modified to address the difficult and unstable training of the discriminator network, and the GLU activation function is added to preserve the sequential and hierarchical structure of the features. The results show that the improved CycleGAN+GLU method performs very close to the GMM method trained under parallel data conditions.
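
As a rough illustration of two building blocks named in the abstract, the sketch below shows a gated linear unit (GLU) and a cycle-consistency term. It assumes the commonly used forms GLU(a, b) = a * sigmoid(b) and an L1 (mean absolute error) cycle loss; the abstract does not give the thesis's actual layer definitions or loss weights, so the function names `glu` and `cycle_consistency_loss` and the plain-list feature vectors are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def glu(features):
    """Gated linear unit (assumed standard form).

    The input vector is split in half: the first half carries the
    linear values, the second half drives a sigmoid gate that decides
    how much of each value passes through.
    """
    if len(features) % 2 != 0:
        raise ValueError("GLU needs an even number of features")
    half = len(features) // 2
    linear, gate = features[:half], features[half:]
    return [a * sigmoid(b) for a, b in zip(linear, gate)]


def cycle_consistency_loss(original, reconstructed):
    """Mean absolute error between a feature vector and its round trip.

    In a CycleGAN-style setup, `reconstructed` would be the result of
    converting source -> target -> source; a small value means the
    round trip preserved the original features.
    """
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    return sum(abs(o - r) for o, r in zip(original, reconstructed)) / len(original)
```

With a gate input of 0 the sigmoid is 0.5, so `glu([2.0, 4.0, 0.0, 0.0])` passes exactly half of each linear value. In a real system these operations would act on spectral feature sequences extracted by WORLD rather than on short lists.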
Keywords/Search Tags: Voice conversion, Non-parallel corpus, CycleGAN, GLU