Improvement Of Prosodic Structure Prediction In Speech Synthesis

Posted on: 2018-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: T H Wang
GTID: 2348330512993159
Subject: Electronic and communication engineering
Abstract/Summary:
Prosodic structure is one of the key factors affecting the naturalness of synthesized speech, so prosodic structure prediction is becoming increasingly important. Traditional prosody prediction methods have been applied successfully, but they select shallow input features such as part-of-speech (POS) information and ignore the influence of deeper semantic and grammatical information on prosodic structure. In addition, when the data become highly complex, these methods suffer from problems such as a narrow scope of application, over-fitting, and over-reliance on rules. Overcoming these limitations requires a model with strong modeling capacity for complex data, and the model's input must represent deep information. This thesis therefore introduces, in the prosodic structure prediction module, a deep neural network prediction model that takes word embeddings as its input features. The main work of this thesis is as follows:
(1) Trained word embeddings replace the traditional POS information as the input to the prediction model, and length information and punctuation information are added to the input features to improve the model's learning.
(2) The prosodic prediction model is built from stacked feed-forward and bidirectional long short-term memory (BLSTM) recurrent network layers; the results obtained under different network structures are compared to find a better structure for predicting prosodic structure.
(3) To further improve the accuracy of deep-learning-based prosodic structure prediction, the output scores of the network model are combined with the transition scores between prosodic structure categories, and dynamic programming is used to decode the output sequence of prosodic-level category labels.
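The decoding step described in item (3) can be illustrated with a Viterbi-style dynamic program. The sketch below is not the thesis's implementation: the function name `viterbi_decode`, the two-label setup, and all score values are hypothetical, and it assumes that the per-token network scores and the label-to-label transition scores are simply summed along a path.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_path, best_score) for one sentence.

    emissions[t][j]   : network output score for label j at token t
    transitions[i][j] : score for moving from label i to label j
    Both are assumed to be additive scores (an assumption of this sketch).
    """
    if not emissions:
        return [], 0.0
    n_labels = len(emissions[0])
    score = list(emissions[0])   # best score of any path ending in each label
    backptr = []                 # backptr[t-1][j] = best previous label for j at token t
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_labels):
            # Pick the previous label that maximizes path score into label j.
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backptr.append(ptrs)
    # Trace back from the best final label.
    last = max(range(n_labels), key=lambda j: score[j])
    path = [last]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical scores: label 0 = "no boundary", label 1 = "boundary".
emissions = [[2, 1], [1, 2], [2, 1]]
transitions = [[0, -3], [-3, 0]]   # switching labels is penalized
path, total = viterbi_decode(emissions, transitions)
```

With these illustrative numbers, choosing the highest-scoring label independently at each token would give the sequence 0, 1, 0, whereas the transition penalties make the decoder prefer the smoother sequence 0, 0, 0. This is the sense in which sequence-level decoding can correct locally plausible but globally inconsistent label choices.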
Keywords/Search Tags: Speech synthesis, Prosodic structure prediction, Deep learning, Word2vec, Deep neural network