
Non-parallel Many-to-many Voice Conversion Based On Dynamic Convolution StyleGAN

Posted on: 2022-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: C F Zhang
GTID: 2518306557469614
Subject: Electronics and Communications Engineering
Abstract/Summary:
Voice conversion (VC) is a speech technology that converts the voice personality information of a source speaker into that of a target speaker without altering the linguistic content. In recent years, with the development of deep learning and artificial intelligence, non-parallel and many-to-many voice conversion has been addressed to some extent. How to further improve the quality and personality similarity of the converted voice in the non-parallel, many-to-many setting has become the key to moving the technology from the laboratory to industry. This thesis therefore takes the StarGAN-VC model as its baseline and studies how to improve both the quality and the personality similarity of the converted voice.

First, to improve the personality similarity of the converted voice, this thesis proposes the StyleGAN-VC model. A Multi-Layer Perceptron (MLP) and a Style Encoder (SE) are combined with the StarGAN-VC model and trained jointly to extract speaker style features. This compensates for a shortcoming of the baseline StarGAN-VC model, in which the one-hot speaker vector fails to provide sufficient speaker personality information. In addition, Adaptive Instance Normalization (AdaIN) is used to fuse the speaker style features with the semantic features, so that the generator learns more of the target speaker's personality information, performs the conversion of speaker style, and improves the personality similarity of the converted voice. Subjective and objective evaluations show that, compared with the baseline StarGAN-VC model, the proposed StyleGAN-VC model improves performance by an average of 5.03% in MCD, 31.70% in MOS, and 11.08% in ABX. This verifies that the proposed StyleGAN-VC model not only enhances the personality similarity of the converted voice but also significantly improves its quality.

Further, building on the improved model above and aiming to improve the quality of the converted voice, this thesis proposes a StyleGAN-VC model based on Dynamic Convolution (DyConv). Dynamic convolution is used to improve the generation and expression ability of the generator, instead of conventional strategies such as increasing the depth or width of the network. Unlike a traditional convolutional network, a dynamic convolutional network is strongly data-dependent: it dynamically adjusts the parameters of its convolution kernels according to the characteristics of the input data, using an attention mechanism to assign a different weight to each kernel. The weighted kernels are then assembled into a single dynamic convolution kernel, with which the convolution operation is performed. This strong data dependence can significantly enhance the generation and expression ability of the generator and thereby improve the quality of the converted voice. Subjective and objective evaluations show that, compared with the baseline StarGAN-VC model, the proposed StyleGAN-VC model based on dynamic convolution improves performance by an average of 8.44% in MCD, 38.11% in MOS, and 13.78% in ABX. This verifies that the StyleGAN-VC model based on dynamic convolution significantly improves the quality of the converted voice and also improves its personality similarity.
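The two mechanisms named in the abstract, AdaIN and dynamic convolution, can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation: the function names (`adain`, `dynamic_conv1d`), the single-channel 1-D signal, and the attention scores computed as a linear function of the globally averaged input are all assumptions chosen to keep the example small. In the thesis the style parameters would come from the style encoder and MLP, which are not modelled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adain(content, gamma, beta, eps=1e-5):
    """Adaptive Instance Normalization over one utterance.

    content: list of channels, each a list of values over time frames.
    gamma, beta: per-channel scale and shift. Assumed here to be given;
    in a style-based model they would be predicted from the speaker style.
    """
    out = []
    for channel, g, b in zip(content, gamma, beta):
        mean = sum(channel) / len(channel)
        var = sum((x - mean) ** 2 for x in channel) / len(channel)
        std = math.sqrt(var + eps)
        # Remove the channel's own statistics, then impose the style's.
        out.append([g * (x - mean) / std + b for x in channel])
    return out


def dynamic_conv1d(signal, kernels, attn_w, attn_b):
    """Input-dependent mixture of K kernels, followed by one convolution.

    signal: list of floats (one channel).
    kernels: K candidate kernels of the same odd width.
    attn_w, attn_b: hypothetical attention parameters, one pair per kernel.
    Returns the output signal (same length, zero padded) and the K weights.
    """
    pooled = sum(signal) / len(signal)  # global average pooling of the input
    scores = [w * pooled + b for w, b in zip(attn_w, attn_b)]
    weights = softmax(scores)  # one attention weight per kernel, summing to 1
    width = len(kernels[0])
    # Assemble the candidate kernels into a single dynamic kernel.
    mixed = [sum(p * k[i] for p, k in zip(weights, kernels)) for i in range(width)]
    pad = width // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    # Sliding dot product (cross-correlation, as in deep learning libraries).
    out = [
        sum(mixed[j] * padded[t + j] for j in range(width))
        for t in range(len(signal))
    ]
    return out, weights
```

Because the attention weights depend on the input, two signals with different averages are convolved with different assembled kernels even though the candidate kernels are shared. When all attention scores are equal, the assembled kernel reduces to the plain average of the candidates.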
Keywords: Voice Conversion, non-parallel, many-to-many, Star Generative Adversarial Network, speaker style, Adaptive Instance Normalization, Dynamic Convolution