
Emotional Speech Synthesis Based On Transfer Learning And Self-learning Emotional Representation

Posted on: 2020-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Zhang
Full Text: PDF
GTID: 2428330572473685
Subject: Computer Science and Technology
Abstract/Summary:
With the development of computer science and artificial intelligence, speech synthesis, as a core technology of human-computer interaction, has achieved good results. However, speech synthesis has mainly targeted neutral speech, and emotional speech synthesis still needs improvement. Emotion is an important channel of information that can greatly change the meaning conveyed by speech; when emotional information is absent, expression becomes ambiguous and human-computer communication suffers. This paper analyzes emotional representation in emotional speech synthesis, proposes a self-learning emotional representation method, and proposes an emotional speech synthesis method based on that representation. The main research contents are as follows:

1. To address the limited descriptive power of existing emotional representations, the inconsistency between annotators when labeling emotional speech, and the excessive cost of annotation, a self-learning emotional representation method is proposed that uses an autoencoding neural network to model the emotion in speech. Adversarial training is used to ensure that the learned representation is speaker-independent. Experimental results show that the self-learning emotional representation performs well without manual annotation, solving the problems of annotation cost and inter-annotator disagreement.

2. An emotional speech synthesis method based on transfer learning and self-learning emotional representation is proposed. The method transfers a speaker discriminant model from text-independent speaker verification to extract the speaker's characteristics for emotional speech synthesis. The speaker characteristics, the self-learning emotional representation, and the text are then fed into an end-to-end emotional speech synthesizer to produce a mel-spectrogram, which is finally converted to emotional speech by a WaveNet vocoder. The method requires neither emotional annotation nor speaker labels during training, making it more flexible than other emotional speech synthesis methods. Experimental results show that the method can synthesize speech with high naturalness and strong emotional expressiveness from only a small amount of reference speech from the target speaker.
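The data flow of the synthesis pipeline can be sketched as below. This is a minimal illustration only: all module names, dimensions, and the toy linear layers are assumptions for demonstration, not the thesis's actual architecture, and the adversarial speaker classifier and WaveNet vocoder are noted in comments rather than implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Toy stand-in for a trained neural module (fixed random weights)."""
    W = rng.standard_normal((in_dim, out_dim)) * 0.1
    return lambda x: x @ W

# 1. Self-learning emotion encoder: an autoencoder-style network compresses
#    the reference mel-spectrogram into a low-dimensional emotion vector.
#    During training, an adversarial speaker classifier would push speaker
#    identity OUT of this vector, making it speaker-independent.
emotion_encoder = linear(80, 8)       # 80 mel bins -> 8-dim emotion code

# 2. Speaker encoder transferred from text-independent speaker verification:
#    maps reference audio to a speaker embedding.
speaker_encoder = linear(80, 16)      # 80 mel bins -> 16-dim speaker code

# 3. End-to-end synthesizer: consumes per-frame text features plus the two
#    conditioning vectors and emits a mel-spectrogram.
synthesizer = linear(32 + 8 + 16, 80)

def synthesize(text_feats, ref_mel):
    """text_feats: (T, 32) text features; ref_mel: (T_ref, 80) reference."""
    summary = ref_mel.mean(axis=0)                   # crude utterance pooling
    emo = emotion_encoder(summary)                   # (8,)
    spk = speaker_encoder(summary)                   # (16,)
    T = text_feats.shape[0]
    cond = np.concatenate(
        [text_feats, np.tile(emo, (T, 1)), np.tile(spk, (T, 1))], axis=1)
    mel = synthesizer(cond)                          # (T, 80) mel-spectrogram
    # 4. A neural vocoder (WaveNet in the thesis) would convert the
    #    mel-spectrogram to a waveform; omitted here.
    return mel

mel = synthesize(rng.standard_normal((50, 32)), rng.standard_normal((120, 80)))
print(mel.shape)  # (50, 80)
```

Because the emotion and speaker vectors come from reference audio rather than labels, the same flow works with unannotated training data, which is the flexibility the method claims.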
Keywords/Search Tags:emotional speech synthesis, emotion modeling, self-learning emotion representation, adversarial training, transfer learning