Research On Multi-discrimination Singing Voice Synthesis Vocoder Based On Generative Adversarial Network

Posted on: 2022-03-26  Degree: Master  Type: Thesis
Country: China  Candidate: F Y Chen
GTID: 2518306551953469  Subject: Master of Engineering
Abstract/Summary:
A singing voice synthesis vocoder (SVSV) is an algorithm that transforms acoustic features, namely the Mel spectrum and the fundamental frequency, into a singing waveform. It is the extension of the speech synthesis vocoder to the singing domain and an indispensable part of singing voice synthesis (SVS). In recent years, deep learning algorithms have gradually been applied to speech scenarios, and a variety of neural vocoder models have been proposed, which has driven rapid progress in speech synthesis. However, there are relatively few studies on vocoders for SVS, and existing vocoder technology still suffers from poor sound quality or insufficient real-time performance. These problems have severely restricted the development and practical industrial application of SVS. Using a multi-speaker singing voice data set, this thesis proposes a multi-discrimination SVSV based on a generative adversarial network (GAN). The research covers three aspects. First, we study how MelGAN and Parallel WaveGAN perform on the multi-speaker singing data set, identify their problems, and attempt improvements. Second, we design a GAN-based SVSV whose input includes an excitation source constructed from the fundamental frequency, and we design and build a multi-window multi-band discriminator. Finally, we propose an adaptive feature learning (AFL) module to accelerate vocoder synthesis and improve the real-time rate. With this work, singing voices synthesized on the multi-speaker singing data set reached a mean opinion score (MOS) of 4.0, and a single-core test on a 2.6 GHz CPU achieved a real-time factor (RTF) below 0.7, which basically meets the requirements of industrial applications. Commercial applications have already been carried out.
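The abstract does not specify how the excitation source is built from the fundamental frequency or how the RTF was measured. The following is only a minimal sketch under stated assumptions: it uses a sine-plus-noise excitation, a form common in source-filter style neural vocoders, and the usual definition of RTF as synthesis time divided by audio duration. The function names `sine_excitation` and `real_time_factor` and the parameters `hop_length`, `sample_rate`, and `noise_std` are hypothetical and are not taken from the thesis.

```python
import math
import random


def sine_excitation(f0_frames, hop_length=256, sample_rate=24000,
                    noise_std=0.003, seed=0):
    """Build a sample-level excitation from frame-level F0 values in Hz.

    Assumption (not from the thesis): voiced frames (f0 > 0) produce a
    unit-amplitude sine whose phase is accumulated across samples, and
    unvoiced frames (f0 == 0) produce low-level Gaussian noise.
    """
    rng = random.Random(seed)
    phase = 0.0
    samples = []
    for f0 in f0_frames:
        for _ in range(hop_length):
            if f0 > 0:
                # Advance the phase by one sample at the current pitch.
                phase += 2.0 * math.pi * f0 / sample_rate
                samples.append(math.sin(phase))
            else:
                samples.append(rng.gauss(0.0, noise_std))
    return samples


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent synthesizing / duration of the audio produced."""
    return synthesis_seconds / audio_seconds


# One unvoiced frame followed by two voiced frames at 220 Hz.
excitation = sine_excitation([0.0, 220.0, 220.0],
                             hop_length=100, sample_rate=22000)
rtf = real_time_factor(0.7, 1.0)
```

Under this definition, an RTF below 1 means the audio is generated faster than it plays back, so the reported RTF<0.7 corresponds to spending less than 0.7 seconds of single-core CPU time per second of singing.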
Keywords/Search Tags: Singing voice synthesis, vocoder, GAN, multi-window multi-band discriminator, dilated convolutional layer