
The Research Of Personalized Speech Synthesis Based On Generative Adversarial Network

Posted on: 2022-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhu
GTID: 2518306482989559
Subject: Computer Science and Technology
Abstract/Summary:
Speech synthesis technology plays an important role in many fields, such as audiobook reading, voice navigation, and smart speakers. End-to-end speech synthesis systems based on recurrent neural networks (RNNs) already achieve very good synthesized speech quality, but their recurrence makes them difficult to parallelize over time, so they consume large amounts of computing resources and are costly to train. To improve the parallelism of speech synthesis, this thesis proposes a fully convolutional speech synthesis architecture that simplifies the attention module of the network and speeds up synthesis while preserving the quality of the synthesized speech. Furthermore, in response to the public's demand for personalized speech, this thesis studies personalized speech synthesis with non-parallel data. Because existing models do not capture speaker characteristics distinctly enough, a speaker-related feature from the speaker recognition field is introduced with the aim of improving voice conversion. The main research contents of this thesis are as follows.

(1) A fully convolutional speech synthesis system, SR-FCTTS, is proposed to address the difficulty of parallel training and the slow synthesis speed of end-to-end speech synthesis systems. In the synthesizer module, dilated causal convolutions replace the recurrent structure of the RNN to capture long-range context, and residual modules are added to prevent degradation in the deep network. A diagonal attention mechanism is added on top of the existing dot-product attention mechanism to simplify the attention module, reduce the cost of training it, and in turn increase synthesis speed. Experiments verify that the synthesis speed of the synthesizer module is improved.

(2) A spectral super-resolution network, CSRN, is proposed and added to the fully convolutional speech synthesis system in place of a heavier vocoder, increasing the synthesis speed of the entire system. In the convolution blocks, the initialization function and the weight normalization function are improved so that the model converges faster and to a better solution. In addition, a binary divergence loss is introduced into the loss function to further speed up convergence. Experiments show that CSRN improves the synthesis quality of the entire system.

(3) An improved voice conversion system, XSGAN-VC, based on star generative adversarial networks (StarGAN) is proposed to address two problems in voice conversion: the difficulty of acquiring parallel corpora and the insufficient expression of speaker characteristics. An x-vector, which captures the details of a speaker's voice, is concatenated with the original one-hot speaker identity vector to obtain a more comprehensive representation of the speaker, improving the similarity between the converted voice and the target speaker's voice. Experiments demonstrate the effectiveness of introducing x-vector features.
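The dilated causal convolution and residual connection in (1) can be illustrated with a minimal single-channel sketch in plain Python. It assumes zero left-padding and a residual form y = x + ReLU(conv(x)); the function names are hypothetical and this is not the SR-FCTTS implementation. Each output frame reads only the current and earlier frames, and no output depends on another output, so all frames can be computed at once, unlike an RNN's step-by-step recurrence.

```python
from typing import List


def dilated_causal_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Single-channel causal convolution with dilation.

    out[t] = sum_i kernel[i] * x[t - i * dilation]; indices before the start
    of the sequence are treated as zero (implicit left padding), so out[t]
    never looks at frames after t.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            src = t - i * dilation  # kernel[0] hits frame t, kernel[i] hits i*dilation frames earlier
            if src >= 0:
                acc += w * x[src]
        out.append(acc)
    return out


def residual_block(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Assumed residual form y = x + ReLU(conv(x)); the identity path eases training of deep stacks."""
    conv = dilated_causal_conv1d(x, kernel, dilation)
    return [xi + max(ci, 0.0) for xi, ci in zip(x, conv)]


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input frames visible to one output frame after stacking the given layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(dilated_causal_conv1d(impulse, [1.0, 2.0, 3.0], dilation=2))  # [1.0, 0.0, 2.0, 0.0, 3.0, 0.0]
    print(receptive_field(3, [1, 2, 4, 8]))  # 31
```

Doubling the dilation at each layer grows the receptive field exponentially with depth: four layers of kernel size 3 with dilations 1, 2, 4, 8 already see 31 frames, which is how a convolutional stack can stand in for an RNN's long-range context.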
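The abstract does not define the diagonal attention mechanism precisely. One well-known way to push dot-product attention toward the roughly diagonal alignment between text and speech is the guided attention loss from DCTTS (Tachibana et al., 2018), which weights attention mass by W = 1 - exp(-(n/N - t/T)^2 / (2g^2)) so that off-diagonal attention is penalized. The sketch below assumes that form with g = 0.2; it illustrates the technique and is not necessarily what SR-FCTTS uses. Function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(queries: Matrix, keys: Matrix) -> Matrix:
    """A[t][n]: weight that decoder step t assigns to text position n (scaled dot product, softmax over n)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]) for q in queries]


def guided_attention_loss(A: Matrix, g: float = 0.2) -> float:
    """Mean over (t, n) of A[t][n] * W, with W = 1 - exp(-(n/N - t/T)^2 / (2 g^2)).

    W is 0 on the diagonal n/N == t/T and approaches 1 away from it, so
    attention mass far from the diagonal is penalized.
    """
    T, N = len(A), len(A[0])
    total = 0.0
    for t in range(T):
        for n in range(N):
            w = 1.0 - math.exp(-((n / N - t / T) ** 2) / (2.0 * g * g))
            total += A[t][n] * w
    return total / (T * N)
```

In this setup the penalty is added to the training loss only; it steers the attention toward a monotonic, near-diagonal path early in training, which is one way the cost of learning the attention module can be reduced.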
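Item (2) mentions weight normalization in CSRN's convolution blocks. The standard form (Salimans and Kingma, 2016) reparameterizes each weight vector as w = g * v / ||v||, so its length g and its direction v are learned as separate parameters. The abstract says this function and the initialization were modified but not how, so the sketch shows only the standard form; the function names are hypothetical.

```python
import math
from typing import List


def l2_norm(w: List[float]) -> float:
    return math.sqrt(sum(x * x for x in w))


def weight_norm(v: List[float], g: float) -> List[float]:
    """Weight normalization: w = g * v / ||v||.

    The norm of w equals g regardless of v, so the scale (g) and the
    direction (v) are learned as separate parameters.
    """
    norm = l2_norm(v)
    if norm == 0.0:
        raise ValueError("direction vector v must be non-zero")
    return [g * x / norm for x in v]
```

Initializing g to ||v|| makes w equal to v at the start of training, so the reparameterization does not change the network's initial behavior; gradient updates to g then rescale a filter without changing its direction.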
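The binary divergence loss in (2) treats each spectrogram bin, scaled to [0, 1], as a probability and compares it with a sigmoid output. A minimal sketch, assuming the network predicts logits z and the target is Y: the per-bin term -Y log s(z) - (1 - Y) log(1 - s(z)) simplifies to softplus(z) - Y*z, which avoids taking the log of a saturated sigmoid. Function names are hypothetical.

```python
import math
from typing import List


def softplus(z: float) -> float:
    """log(1 + e^z), computed without overflow for large |z|."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def binary_divergence(targets: List[float], logits: List[float]) -> float:
    """Mean of -Y*log(s(z)) - (1 - Y)*log(1 - s(z)) with s = sigmoid.

    Using log s(z) = -softplus(-z) and log(1 - s(z)) = -softplus(z), each term
    simplifies to softplus(z) - Y*z, which stays finite even when s(z)
    saturates at 0 or 1.
    """
    return sum(softplus(z) - y * z for y, z in zip(targets, logits)) / len(targets)
```

DCTTS pairs this term with an L1 loss on the same spectrogram; whether CSRN does the same is not stated in the abstract. Unlike L1, the gradient with respect to the logits, s(z) - Y, does not vanish when the prediction is confidently wrong, which is a plausible reason it helps convergence.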
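In (3), XSGAN-VC conditions the StarGAN generator on a speaker code formed by concatenating the one-hot identity vector with an x-vector. The sketch assumes the x-vector has already been extracted by a pretrained speaker-recognition model (such embeddings are often 512-dimensional; tiny vectors are used here) and that the code is tiled over time and appended to every input frame, as is common in StarGAN-VC-style generators. Both function names are hypothetical.

```python
from typing import List


def speaker_code(speaker_index: int, num_speakers: int, x_vector: List[float]) -> List[float]:
    """Concatenate a one-hot speaker identity with the speaker's x-vector."""
    if not 0 <= speaker_index < num_speakers:
        raise ValueError("speaker_index out of range")
    one_hot = [0.0] * num_speakers
    one_hot[speaker_index] = 1.0
    return one_hot + list(x_vector)


def condition_frames(frames: List[List[float]], code: List[float]) -> List[List[float]]:
    """Tile the speaker code over time: append the same code to every feature frame."""
    return [list(frame) + list(code) for frame in frames]


if __name__ == "__main__":
    code = speaker_code(1, 3, [0.2, -0.4])
    print(code)  # [0.0, 1.0, 0.0, 0.2, -0.4]
    print(condition_frames([[1.0], [2.0]], [0.0, 1.0]))  # [[1.0, 0.0, 1.0], [2.0, 0.0, 1.0]]
```

The one-hot part only tells the generator *which* training speaker to target, while the x-vector adds a continuous description of how that speaker sounds, which is the extra speaker detail the abstract credits for the improved similarity.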
Keywords/Search Tags: Speech Synthesis, Voice Conversion, Attention Mechanism, Non-parallel Personalized Speech Synthesis, Generative Adversarial Network