The Research Of Personalized Speech Synthesis Based On Generative Adversarial Network

Posted on:2022-06-11

Degree:Master

Type:Thesis

Country:China

Candidate:M Zhu

Full Text:PDF

GTID:2518306482989559

Subject:Computer Science and Technology

Abstract/Summary:

Speech synthesis technology plays an important role in many fields, such as audiobook narration, voice navigation, and smart speakers. End-to-end speech synthesis systems based on RNNs now achieve very good synthesized speech quality, but they are difficult to parallelize over time, consume large amounts of resources, and are expensive to train. To improve the parallelism of the speech synthesis system, this thesis proposes a fully convolutional speech synthesis architecture that simplifies the attention module and speeds up synthesis while preserving the quality of the synthesized speech. Furthermore, in response to public demand for personalized speech, this thesis studies personalized speech synthesis with non-parallel data. Because speaker characteristics are only weakly expressed in existing models, a speaker-related feature from the speaker recognition field is introduced in the expectation of a better voice conversion result. The research content mainly covers the following aspects.

(1) A speech synthesis system based on a fully convolutional network, SR-FCTTS, is proposed to address the difficulty of parallel training and the slow synthesis speed of end-to-end speech synthesis systems. In the synthesizer module, dilated causal convolution replaces the recurrent structure of the RNN to capture long-range context, and residual modules are added to prevent the degradation problem in deep networks. A diagonal attention mechanism is added on top of the existing dot-product attention mechanism to simplify the attention module and reduce its training cost, which in turn improves the synthesis speed of the system. Experiments verify that the synthesis speed of the synthesizer module is improved.

(2) A spectral super-resolution network, CSRN, is proposed and added to the fully convolutional speech synthesis system to replace the heavier vocoder, thereby increasing the synthesis speed of the entire system. In the convolution block, the initialization function and the weight normalization function are improved to guide the model to converge faster and in a better direction. In addition, a binary divergence loss is added to the loss function to help the model converge faster. Experiments show that CSRN contributes to improving the synthesis results of the entire system.

(3) An improved voice conversion system, XSGAN-VC, based on the Star Generative Adversarial Network (StarGAN), is proposed to address the difficulty of obtaining parallel corpora and the insufficient expression of speaker characteristics in voice conversion tasks. The x-vector, which represents the details of a speaker's voice, is concatenated with the original one-hot speaker identity vector to obtain a more comprehensive representation of the speaker's characteristics, improving the similarity between the converted voice and the target speaker's voice. Experiments confirm the effectiveness of introducing x-vector features.
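The speaker representation described in (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy two-dimensional x-vector, and the L2 normalization step are all assumptions made here for illustration, and real x-vectors are far longer than two values. The sketch only shows how a one-hot identity vector and an x-vector can be joined into a single conditioning vector.

```python
import math


def one_hot(index, num_speakers):
    """Return a one-hot list identifying speaker `index` among `num_speakers`."""
    if not 0 <= index < num_speakers:
        raise ValueError("speaker index out of range")
    return [1.0 if i == index else 0.0 for i in range(num_speakers)]


def l2_normalize(vector):
    """Scale a vector to unit length (an assumption; the abstract does not say
    whether the x-vector is normalized before concatenation)."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]


def speaker_code(index, num_speakers, x_vector):
    """Concatenate the one-hot speaker identity with the speaker's x-vector."""
    return one_hot(index, num_speakers) + l2_normalize(x_vector)


# Toy example: speaker 1 of 4, with a hypothetical 2-dimensional x-vector.
code = speaker_code(1, 4, [3.0, 4.0])
print(len(code), code)
```

In a StarGAN-style converter, a vector of this kind would be supplied to the generator as the target-speaker condition in place of the one-hot vector alone; how the thesis feeds it into the network is not stated in the abstract.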

Keywords/Search Tags:

Speech Synthesis, Voice Conversion, Attention Mechanism, Non-parallel Personalized Speech Synthesis, Generative Adversarial Network

Related items

1	Research On Detection Algorithm Of Speech Spoofing And Its System Implementation
2	Research Of Personalized Speech Generation
3	Research Chinese Speech Based On Speech Recognition And Speech Synthesis Conversion
4	Research And Implementation Of Speech Synthesis Method For Helping Old Robots
5	Improved Tacotron2 Speech Synthesis Method Based On Forced Monotonic Attention Mechanism
6	Research On Embedded Speech Synthesis Technology
7	Research On Algorithms Of Speech Synthesis Based On Deep Neural Network
8	Research On Emotional Speech Synthesis Based On Generative Adversarial Networks
9	Research On Multi-emotional Speech Synthesis Technology Based On Short-term Specific Human Voice
10	Research And Implementation Of End-to-End Prosodic Speech Synthesis System