Study On Speech Synthesis Based On Deep Neural Network

Posted on:2019-12-14

Degree:Master

Type:Thesis

Country:China

Candidate:J T Zhang

GTID:2428330566984956

Subject:Information and Communication Engineering

Abstract/Summary:

In DNN-based speech synthesis, a DNN model learns a mapping from linguistic features to acoustic features. Because of the limitations of the DNN model, it is difficult for it to capture the overall characteristics of an entire utterance. To generate smooth speech parameter trajectories, the acoustic features therefore include not only static features but also dynamic features, and a speech parameter generation algorithm is then used to produce a smooth trajectory. Because this algorithm requires statistics from all frames of an utterance, synthesis incurs high latency. To address these problems, this thesis studies speech synthesis algorithms based on deep neural networks. The main contributions are as follows:

(1) A DNN-based speech synthesis algorithm that takes global variance (GV) into account is proposed. In the training stage, three regression models are built, all of them DNNs: the state duration model maps linguistic features to state duration features; the GV model maps utterance-level linguistic features to GV features; and the acoustic model maps GV-combined linguistic features to acoustic features. In the synthesis stage, the constructed linguistic features are first fed into the state duration model and the GV model to predict the state duration features and the GV features. The encoded linguistic features are then up-sampled according to the predicted state durations, and the GV-combined linguistic features are formed from the predicted GV. These are fed into the acoustic model to generate acoustic features. Finally, the acoustic features are passed to the vocoder to obtain the synthesized speech.

(2) An existing LSTM-based low-latency speech synthesis method is improved. To enable low latency, the acoustic features contain only static features. To still generate smooth speech parameter trajectories, an LSTM-based recurrent output layer serves as a trainable smoother for the output speech parameters. In the synthesis stage, duration prediction, acoustic feature prediction, and vocoding run in a streaming manner, enabling low-latency speech synthesis.
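The speech parameter generation step can be illustrated for a single feature dimension. The following is a minimal, stdlib-only sketch of the standard maximum-likelihood formulation, not the thesis's code. It assumes first-order deltas only, the common delta window (-0.5, 0, 0.5) with edge frames clamped, and diagonal variances; the function names `mlpg_1d` and `solve` are hypothetical. It solves the normal equations (W^T U^-1 W) c = W^T U^-1 mu for the static trajectory c, where W stacks the static and delta windows and mu, U are the predicted means and variances.

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def mlpg_1d(mean_static: Sequence[float], var_static: Sequence[float],
            mean_delta: Sequence[float], var_delta: Sequence[float]) -> List[float]:
    """ML parameter generation for one dimension with static + delta features.

    Assumes delta_t = 0.5 * (c[t+1] - c[t-1]) with edge frames clamped.
    Builds and solves (W^T U^-1 W) c = W^T U^-1 mu.
    """
    T = len(mean_static)
    A = [[0.0] * T for _ in range(T)]
    b = [0.0] * T
    for t in range(T):
        # Static window: the observation is c[t] itself.
        p = 1.0 / var_static[t]
        A[t][t] += p
        b[t] += p * mean_static[t]
        # Delta window: weight -0.5 on the previous frame, +0.5 on the next.
        w = {}
        prev, nxt = max(t - 1, 0), min(t + 1, T - 1)
        w[prev] = w.get(prev, 0.0) - 0.5
        w[nxt] = w.get(nxt, 0.0) + 0.5
        p = 1.0 / var_delta[t]
        for i, wi in w.items():
            b[i] += p * wi * mean_delta[t]
            for j, wj in w.items():
                A[i][j] += p * wi * wj
    return solve(A, b)


if __name__ == "__main__":
    # A step in the static means with zero predicted deltas gets smoothed.
    print(mlpg_1d([0.0, 0.0, 1.0, 1.0], [1.0] * 4, [0.0] * 4, [1.0] * 4))
```

For the step input above, the generated trajectory is about [0.167, 0.167, 0.833, 0.833]: the delta constraints pull the jump toward a smoother curve. Because every frame appears in one linear system, no output frame is available until the statistics for the whole utterance have been predicted, which is the source of the latency described in the abstract.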
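Global variance is usually defined as the per-dimension variance of the acoustic features over all frames of one utterance. The sketch below computes a GV vector (for example, a training target for the GV model) and forms a GV-combined linguistic input. It assumes the combination is a plain concatenation of the GV vector onto every frame's linguistic feature vector, which the abstract does not specify; the function names are hypothetical.

```python
from typing import List, Sequence


def global_variance(frames: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension variance of feature frames over one utterance (the GV vector)."""
    T = len(frames)
    D = len(frames[0])
    means = [sum(f[d] for f in frames) / T for d in range(D)]
    return [sum((f[d] - means[d]) ** 2 for f in frames) / T for d in range(D)]


def combine_with_gv(linguistic_frames: Sequence[Sequence[float]],
                    gv: Sequence[float]) -> List[List[float]]:
    """Hypothetical GV combination: append the utterance GV vector to every frame."""
    return [list(f) + list(gv) for f in linguistic_frames]


if __name__ == "__main__":
    acoustic = [[1.0, 2.0], [3.0, 2.0]]  # toy features: 2 frames, 2 dimensions
    gv = global_variance(acoustic)
    print(gv)
    print(combine_with_gv([[0.1], [0.2]], gv))
```

At synthesis time the GV vector would come from the GV model's prediction for the utterance rather than from natural speech, so the acoustic model sees the same kind of input in both stages.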
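Up-sampling according to state duration turns one encoded linguistic vector per state into one vector per frame. The sketch below repeats each state's vector for its predicted number of frames and appends the frame's relative position within the state. The position feature and the name `upsample` are assumptions for illustration, and the predicted durations are assumed to be already rounded to non-negative integers.

```python
from typing import List, Sequence


def upsample(state_features: Sequence[Sequence[float]],
             durations: Sequence[int]) -> List[List[float]]:
    """Repeat each state-level vector for its duration in frames.

    Appends a hypothetical relative-position feature (frame centre within the state).
    """
    if len(state_features) != len(durations):
        raise ValueError("need one duration per state")
    frames = []
    for feat, dur in zip(state_features, durations):
        for i in range(dur):
            frames.append(list(feat) + [(i + 0.5) / dur])
    return frames


if __name__ == "__main__":
    # Two states lasting 2 and 1 frames produce 3 frame-level vectors.
    for frame in upsample([[1.0], [2.0]], [2, 1]):
        print(frame)
```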
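Streaming synthesis is possible because a recurrent model's output at frame t depends only on inputs up to frame t, so each frame can be handed to the vocoder as soon as it is produced. The sketch below illustrates this causal, frame-by-frame processing with Python generators. The smoother is a hypothetical first-order recursive (exponential) filter standing in for the trained LSTM recurrent output layer, whose details the abstract does not give; it is not an LSTM, and `frame_source` is a hypothetical upstream stage.

```python
from typing import Iterable, Iterator, List


def recurrent_smoother(frames: Iterable[List[float]], alpha: float = 0.7) -> Iterator[List[float]]:
    """Causal recursive smoother: y_t = alpha * y_{t-1} + (1 - alpha) * x_t.

    Hypothetical stand-in for a trained LSTM recurrent output layer. The state
    is carried from frame to frame, so each output frame is yielded as soon as
    its input frame arrives.
    """
    state = None
    for x in frames:
        if state is None:
            state = list(x)  # initialise with the first frame
        else:
            state = [alpha * s + (1.0 - alpha) * xi for s, xi in zip(state, x)]
        yield list(state)


def frame_source(frames: Iterable[List[float]], log: List[List[float]]) -> Iterator[List[float]]:
    """Hypothetical upstream stage (e.g. acoustic prediction) that records what it emits."""
    for f in frames:
        log.append(f)
        yield f


if __name__ == "__main__":
    emitted: List[List[float]] = []
    smoothed = recurrent_smoother(frame_source([[0.0], [1.0], [1.0]], emitted))
    first = next(smoothed)
    # Only one upstream frame has been produced when the first output is ready.
    print(first, len(emitted))
    for y in smoothed:
        print(y)
```

Unlike the whole-utterance parameter generation above it in the pipeline, nothing here waits for the end of the utterance; latency is bounded by the per-frame processing time.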

Keywords/Search Tags:

Speech Synthesis, Deep Neural Network, Acoustic Model, Global Variance, Low Latency

Related items

1	Research On Neural Network-based Acoustic Modeling For Speech Synthesis
2	Research On Emotional Speech Synthesis Based On Deep Neural Network
3	Research On Neural Network Based Statistical Parametric Speech Synthesis
4	A Study On Speech Synthesis And Visual Speech Synthesis Based On Neural Networks
5	Research On Unit Selection Concatenation Speech Synthesis Method Based On Deep Learning
6	Research On Deep Learning Based Small-Sized Unit Concatenation Speech Synthesis
7	The Research Of Uyghur Acoustic Model Based On Deep Neural Network
8	Research On Automatic Labeling Of Speech Synthesis Corpora
9	Research On Speech Synthesis Algorithm Based On Sequence To Sequence Model
10	Research On Speech Recognition Based On Convolutional Neural Networks