
Research On Emotional Speech Synthesis Based On Deep Neural Network

Posted on: 2019-10-28    Degree: Master    Type: Thesis
Country: China    Candidate: P P Zhi    Full Text: PDF
GTID: 2428330545483979    Subject: Circuits and Systems
Abstract/Summary:
The quality of speech synthesis has improved remarkably with the development of computer and signal processing technology. However, most current research focuses on synthesizing neutral speech, and work on emotional speech synthesis is still lacking. Human language carries rich emotional information that neutral text or speech cannot fully convey. With the rise of deep learning and artificial intelligence, emotional human-computer communication is gradually becoming a demand, and the study of emotional speech synthesis has grown increasingly important. In this thesis, we build a multi-speaker emotional corpus covering 11 typical emotions and introduce deep neural networks (DNNs) to achieve DNN-based emotional speech synthesis and DNN-based speaker-adaptive emotional speech synthesis. On this basis, we introduce the PAD (Pleasure-Arousal-Dominance) emotion model, implement PAD-based emotion correction, and use the DNN to synthesize emotional speech. The main contributions of the thesis are as follows:

Firstly, the thesis establishes a corpus of typical emotional speech from multiple speakers. Using professional recording equipment and emotion-induction methods, we collected speech in 11 emotions (relaxation, surprise, meekness, joy, anger, anxiety, disgust, guilt, fear, sadness, etc.) from 9 female speakers, stored as 16 kHz, single-channel, 16-bit audio. We recorded 300 emotional utterances per emotion, laying the foundation for the subsequent emotional speech synthesis experiments.

Secondly, the thesis proposes a DNN-based emotional speech synthesis method and a DNN-based speaker-adaptive emotional speech synthesis method. After selecting the DNN structure, we establish a DNN-based emotional speech synthesis model and train it with deep learning to synthesize emotional speech. On this basis, we use emotional speech from multiple speakers to train a DNN average voice model (AVM), and then use the target emotional speaker's speech to perform speaker adaptation. Finally, we obtain the target emotional voice model from the average voice model and synthesize the target emotional speech. Evaluation results show that the proposed DNN-based speaker-adaptive method synthesizes emotional speech better than the other methods compared.

Thirdly, the thesis proposes an emotional speech synthesis method based on the three-dimensional PAD emotion model to achieve continuous emotion. We annotate the emotional corpus with PAD values and determine the emotional state of each utterance by computing the distance from the annotated point to the known emotion points, obtaining the emotional PAD parameters. We then use the DNN-based emotional speech synthesis method to generate speech parameters and map the synthesized emotional speech parameters into the PAD model. By computing the distance between the mapped point and the known emotion points, we modify the emotion parameters and finally synthesize the target emotional speech. Experimental results show that emotional speech synthesized by this method achieves higher emotional preference and subjective scores than the other methods.
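The corpus storage format described above (16 kHz, single-channel, 16-bit) can be verified programmatically. A minimal sketch using Python's standard `wave` module; the function name and file path are illustrative, not part of the thesis:

```python
import wave

def matches_corpus_format(path, rate=16000, channels=1, sampwidth_bytes=2):
    """Return True if the WAV file at `path` is 16 kHz, mono, 16-bit PCM,
    the storage format used for the emotional corpus."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == rate
                and w.getnchannels() == channels
                and w.getsampwidth() == sampwidth_bytes)
```

Running such a check over every recorded file is a cheap way to catch files accidentally saved at a different sample rate before training.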
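The two-stage adaptation scheme (train an average voice model on pooled multi-speaker data, then fine-tune on the target speaker) can be illustrated with a deliberately tiny stand-in. This is a toy sketch, not the thesis's DNN: a one-parameter linear regressor trained by SGD plays the role of the acoustic model, and the synthetic "speakers" differ only in slope.

```python
import random

def sgd_step(w, b, x, y, lr):
    # One stochastic gradient step on squared error (w*x + b - y)^2.
    err = w * x + b - y
    return w - lr * err * x, b - lr * err

def train(w, b, data, epochs, lr):
    for _ in range(epochs):
        for x, y in data:
            w, b = sgd_step(w, b, x, y, lr)
    return w, b

random.seed(0)
xs = [i / 10 for i in range(20)]
# Pooled "multi-speaker" data around slope 2.0; "target speaker" slope 2.5.
pooled = [(x, 2.0 * x + random.gauss(0, 0.05)) for x in xs]
target = [(x, 2.5 * x + random.gauss(0, 0.05)) for x in xs]

w, b = train(0.0, 0.0, pooled, epochs=50, lr=0.1)   # stage 1: "AVM"
w, b = train(w, b, target, epochs=10, lr=0.05)      # stage 2: adaptation
```

The second stage starts from the pooled-data parameters rather than from scratch, which is the point of the AVM approach: the adaptation data can be small because it only needs to shift an already reasonable model toward the target speaker.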
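The distance-based step in the PAD method above, determining an utterance's emotional state from its annotated (P, A, D) point, reduces to a nearest-neighbour lookup among the known emotion points. A minimal sketch; the PAD coordinates below are illustrative placeholders, not the thesis's measured values:

```python
import math

# Hypothetical PAD (Pleasure, Arousal, Dominance) coordinates for a few
# of the corpus emotions; values are placeholders for illustration only.
KNOWN_EMOTIONS = {
    "joy":     ( 0.8,  0.5,  0.4),
    "anger":   (-0.6,  0.6,  0.3),
    "sadness": (-0.6, -0.4, -0.3),
    "fear":    (-0.5,  0.5, -0.5),
}

def nearest_emotion(pad_point):
    """Return the known emotion whose PAD coordinates are closest
    (Euclidean distance) to the given annotated or mapped point."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(KNOWN_EMOTIONS, key=lambda e: dist(pad_point, KNOWN_EMOTIONS[e]))
```

The same lookup serves both directions described in the method: labelling an annotated point with its nearest known emotion, and checking how far a synthesized utterance's mapped point has drifted from the intended emotion before correcting the parameters.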
Keywords/Search Tags:emotional speech synthesis, deep neural network, speaker adaptive, statistical parameter speech synthesis, Hidden Markov Model, PAD emotion model