Research On Prosodic Structure Prediction Based On Deep Neural Network

Posted on:2017-04-25

Degree:Master

Type:Thesis

Country:China

Candidate:Q Wang

Full Text:PDF

GTID:2308330482479277

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

Chinese prosodic prediction plays an important role in the naturalness of synthetic speech. The goal of this thesis is to improve the prediction accuracy of prosodic structure. Earlier statistical prosodic prediction models require researchers to do a great deal of feature engineering. Moreover, because their features encode no correlation between words, a "lexical gap" arises in which even two synonyms appear unrelated. The model therefore needs input features that reflect the relationships between words. For this reason, this thesis uses a deep neural network as the prosodic prediction model.

First, Gensim is used to train lexical word embeddings, and prosodic word embeddings are then learned by combining the lexical word embeddings. Second, the hidden layer of the traditional neural network model is modified to better capture word-word interaction. The main work is as follows:

(1) Using Gensim to train word embeddings for lexical words, using the lexical word embeddings to learn prosodic word embeddings, and using word embeddings at different levels to capture prosodic structure information in the context;
(2) Training the neural network model on labeled data, using lexical word embeddings, prosodic word embeddings, tag embeddings and length embeddings as the input features to improve the predictive ability of the model;
(3) Adding a tensor to the hidden layer to improve the capacity of the model. The tensor captures word-word interaction and the interaction between different prosodic levels.

Experimental results show that compound input features outperform a single input feature: the error rate (ER) of prosodic words decreases by 3.2 percentage points (from 15.3% to 12.1%), and the ER of prosodic phrases decreases by 5 percentage points (from 40.3% to 35.3%). After the tensor is added to the hidden layer, the ER of prosodic words decreases by a further 0.5 percentage points (from 12.1% to 11.6%).
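The decreases quoted above are differences between two error rates, that is, absolute reductions measured in percentage points. The short sketch below recomputes them from the quoted before/after figures and also derives the corresponding relative reductions, which the abstract does not state. The function name `error_rate_change` and the dictionary labels are illustrative assumptions, not taken from the thesis.

```python
def error_rate_change(before, after):
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = before - after
    relative = 100.0 * absolute / before
    return absolute, relative


# (error rate before, error rate after), in percent, as quoted in the abstract.
reported = {
    "prosodic words, compound vs. single input feature": (15.3, 12.1),
    "prosodic phrases, compound vs. single input feature": (40.3, 35.3),
    "prosodic words, tensor vs. plain hidden layer": (12.1, 11.6),
}

for name, (before, after) in reported.items():
    absolute, relative = error_rate_change(before, after)
    print(f"{name}: {absolute:.1f} points absolute, {relative:.1f}% relative")
```

Reporting both figures makes it easier to compare gains across prosodic levels whose baseline error rates differ widely.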
These results indicate that compound input features reduce the error rate of prosodic structure prediction and that, compared with a traditional hidden layer, a hidden layer with a tensor captures more information across different prosodic levels.
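The abstract does not give the exact form of the tensor hidden layer. A minimal sketch, assuming the common neural tensor formulation h[k] = tanh(x^T T[k] x + W[k] . x + b[k]), is shown below in plain Python. The dimensions, the random initialisation, and the names `tensor_hidden_layer` and `random_params` are hypothetical and chosen only for illustration.

```python
import math
import random


def tensor_hidden_layer(x, T, W, b):
    """Compute h[k] = tanh(x^T T[k] x + W[k] . x + b[k]) for each hidden unit k."""
    n = len(x)
    hidden = []
    for T_k, W_k, b_k in zip(T, W, b):
        # Bilinear term: every pair of input dimensions (i, j) interacts via T_k[i][j].
        bilinear = sum(x[i] * T_k[i][j] * x[j] for i in range(n) for j in range(n))
        # Linear term, as in an ordinary hidden layer.
        linear = sum(w * v for w, v in zip(W_k, x))
        hidden.append(math.tanh(bilinear + linear + b_k))
    return hidden


def random_params(n_in, n_hidden, scale=0.1, seed=0):
    """Small random tensor slices T, weight rows W, and zero biases b (illustrative only)."""
    rng = random.Random(seed)
    T = [[[rng.uniform(-scale, scale) for _ in range(n_in)]
          for _ in range(n_in)] for _ in range(n_hidden)]
    W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    return T, W, b


# Toy input: concatenation of two made-up 3-dimensional word embeddings.
left_word = [0.2, -0.1, 0.4]
right_word = [0.0, 0.3, -0.5]
x = left_word + right_word

T, W, b = random_params(n_in=len(x), n_hidden=4)
h = tensor_hidden_layer(x, T, W, b)
print(h)
```

Because the input concatenates the embeddings of neighbouring words, the off-diagonal blocks of each slice `T[k]` multiply a dimension of one word with a dimension of the other, which is one way a layer can model word-word interaction directly. Setting every `T[k]` to zero recovers an ordinary hidden layer.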

Keywords/Search Tags:

Speech synthesis, Prosodic structure prediction, Word embedding, Deep Neural Network

Related items

1	Improvement Of Prosodic Structure Prediction In Speech Synthesis
2	Research And Implementation Of Chinese Prosodic Structure Prediction Model
3	Research On Automatic Labeling Of Speech Synthesis Corpora
4	The Research Of Prosodic Control Algorithm And Realization For Chinese Speech Synthesis
5	The Method And Implementation Of ToBI Automatic Prosodic Labeling In English Text To Speech System
6	Research On Neural Network Based Statistical Parametric Speech Synthesis
7	Research On Chinese Speech Transcription Punctuation Prediction Based On Deep Learning
8	Chinese Speech Synthesis System Improvements And Implementation
9	HMM-based Mandarin Speech Synthesis And Prosodic Optimized
10	Research On Neural Network-based Acoustic Modeling For Speech Synthesis