
Research On Emotional Speech Synthesis Based On Generative Adversarial Networks

Posted on: 2021-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Shao
Full Text: PDF
GTID: 2518306104986509
Subject: Information and Communication Engineering

Abstract/Summary:
The development of deep learning has benefited many industries, and speech synthesis is among its notable successes. End-to-end speech synthesis, led by Tacotron, not only makes speech synthesis systems easier to build, but also makes the synthesized speech more intelligible and natural. Synthesized voices have gradually entered daily life: voice assistants and voice-interaction features of all kinds make life more convenient. However, current speech synthesis technology is still at the stage of producing merely intelligible speech; it cannot yet express emotion and deliver vivid speech the way humans do. This is the key obstacle preventing speech synthesis systems from being used more widely. Because end-to-end speech synthesis is itself a recent development, research on emotion in synthesis is only just beginning, and it has become a hot topic in the field.

Since it was first proposed, the Generative Adversarial Network (GAN) has attracted wide attention and made waves in computer vision, with applications ranging from generating photorealistic fake images to image style transfer. To this day, the GAN remains one of the most active research directions among generative models. In contrast to its popularity in computer vision, however, the GAN has seldom been applied to speech synthesis. Inspired by its success in image style transfer, this paper combines a GAN with Tacotron2 to construct a new emotional speech synthesis system that takes text and prosodic features as input and synthesizes emotional speech.

The system consists mainly of a speech synthesis module and a prosody extraction module. The speech synthesis module is a Tacotron2 model; the prosody extraction module extracts prosodic features from a reference speech signal as additional input to Tacotron2. The prosodic features are screened with traditional machine learning methods to ensure that the selected features correlate strongly with emotion while remaining only weakly collinear with one another. Finally, the model is trained with the idea of the Conditional Generative Adversarial Network: the discriminator is responsible for the emotion constraint on the generated speech, and the generator is responsible for fitting the sound. The result is an emotional speech synthesis system in which the emotion of the output speech can be controlled by modifying the prosodic features of the input.

The model is evaluated on intelligibility and naturalness. Intelligibility was measured by the word error rate of a speech recognition system and by subjective MOS scores; the results show that the proposed model far exceeds Tacotron2 on both measures and is on par with GST-Tacotron2. Naturalness was measured by Mel Cepstral Distortion and F0 Frame Error; the F0 Frame Error of the proposed model is 15% lower than that of GST-Tacotron2, while the Mel Cepstral Distortion is the same, indicating that the proposed model is superior to GST-Tacotron2 in naturalness.
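The feature-screening step described above (keep prosodic features that correlate strongly with emotion but only weakly with one another) can be sketched as a simple greedy filter. The feature names, thresholds, and the filter itself are illustrative assumptions, not the thesis's actual procedure:

```python
import numpy as np

def screen_prosodic_features(X, y, names, min_corr=0.3, max_collin=0.9):
    """Greedy filter: keep features whose absolute Pearson correlation with
    the emotion label is at least min_corr, skipping any candidate whose
    correlation with an already-kept feature exceeds max_collin.

    X: (n_samples, n_features) candidate prosodic features
    y: (n_samples,) emotion labels (numeric encoding)
    """
    # Correlation of each feature with the emotion label.
    corr_y = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    kept = []
    for j in np.argsort(-corr_y):          # strongest candidates first
        if corr_y[j] < min_corr:           # remaining candidates are weaker
            break
        # Reject candidates too collinear with a feature already kept.
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= max_collin
               for k in kept):
            kept.append(j)
    return [names[j] for j in kept]
```

On such a filter, a near-duplicate feature (e.g. an F0 statistic derived from another) is dropped for collinearity even though it correlates well with emotion, while an uninformative feature is dropped for low correlation.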
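The conditional adversarial training idea described above (a discriminator that enforces the emotion constraint while the generator fits the sound) can be sketched, in heavily simplified numpy form, as the two loss terms below. The loss shapes and the adversarial weight are assumptions for illustration; in the actual system the generator is the Tacotron2 model and the discriminator a neural network conditioned on the emotion:

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy between predicted probabilities and a 0/1 target."""
    eps = 1e-7
    p = np.clip(p, eps, 1 - eps)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p)).mean()

def d_loss(D, real_mel, fake_mel, emo):
    # Discriminator: score real, emotion-consistent spectrograms as 1,
    # generated spectrograms as 0, given the emotion condition.
    return bce(D(real_mel, emo), 1.0) + bce(D(fake_mel, emo), 0.0)

def g_loss(D, fake_mel, target_mel, emo, adv_weight=0.1):
    # Generator: fit the sound (reconstruction term, as in Tacotron2 training)
    # plus an adversarial term that pushes D to accept the generated emotion.
    recon = np.abs(fake_mel - target_mel).mean()
    adv = bce(D(fake_mel, emo), 1.0)
    return recon + adv_weight * adv
```

In each training step the discriminator would be updated on `d_loss` and the generator on `g_loss`, so the emotion constraint and the sound fitting are optimized adversarially.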
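The F0 Frame Error used in the naturalness evaluation counts a frame as erroneous when the voicing decision differs between reference and synthesized speech, or when both tracks are voiced but the pitch deviates by more than 20%. A minimal sketch, assuming unvoiced frames are encoded as 0 in the F0 tracks:

```python
import numpy as np

def f0_frame_error(f0_ref, f0_syn, tol=0.2):
    """F0 Frame Error: fraction of frames with a voicing error (one track
    voiced, the other unvoiced) or a gross pitch error (relative deviation
    above tol on frames both tracks call voiced)."""
    ref_v, syn_v = f0_ref > 0, f0_syn > 0
    voicing_err = ref_v != syn_v
    both_voiced = ref_v & syn_v
    pitch_err = np.zeros_like(voicing_err)
    pitch_err[both_voiced] = (np.abs(f0_syn[both_voiced] - f0_ref[both_voiced])
                              / f0_ref[both_voiced]) > tol
    return float((voicing_err | pitch_err).mean())
```

A lower value means the synthesized pitch contour tracks the reference more faithfully, which is why a 15% reduction against GST-Tacotron2 supports the naturalness claim.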
Keywords/Search Tags: Emotional Speech Synthesis System, Tacotron2, Generative Adversarial Network