Study On Speech Synthesis Based On Deep Neural Network

Posted on: 2019-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: J T Zhang
Full Text: PDF
GTID: 2428330566984956
Subject: Information and Communication Engineering
Abstract/Summary:
In DNN-based speech synthesis, the DNN model learns a mapping from linguistic features to acoustic features. Because the DNN predicts each frame independently, it has difficulty capturing the overall characteristics of an entire utterance. To generate a smooth speech parameter trajectory, the acoustic features include both static and dynamic features, and a speech parameter generation algorithm is then applied to them. Because this algorithm requires statistics for all frames in an utterance, synthesis has high latency. To address these problems, this thesis studies speech synthesis algorithms based on deep neural networks. The main contributions are as follows:

(1) A DNN-based speech synthesis algorithm that considers global variance (GV) is proposed. In the training stage, the state duration model takes linguistic features as input and outputs state duration features; the GV model takes utterance-level linguistic features as input and outputs GV features; and the acoustic model takes GV-combined linguistic features as input and outputs acoustic features. All three regression models are DNNs. In the synthesis stage, the constructed linguistic features are first fed into the state duration model and the GV model to generate the state duration features and the GV features. The encoded linguistic features are then up-sampled according to the state durations, and the GV-combined linguistic features are formed using the predicted GV. The GV-combined linguistic features are fed into the acoustic model to generate acoustic features. Finally, the acoustic features are sent to the vocoder to obtain the synthesized speech.

(2) An LSTM-based low-latency speech synthesis method is improved. To achieve low latency, the acoustic features include only static features. To generate a smooth speech parameter trajectory, an LSTM-based recurrent output layer is used as a trainable smoother for the output features. In the synthesis stage, duration prediction, acoustic feature prediction, and vocoding are executed in a streaming manner, enabling low-latency speech synthesis.
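The synthesis-stage data flow in contribution (1) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names are hypothetical, the DNN models are left out entirely, and combining the GV with the linguistic features by concatenation is an assumption, since the abstract does not say how the two are combined. The sketch only shows how a global variance is computed from a trajectory, how state-level features might be up-sampled by predicted durations, and how an utterance-level GV vector could be attached to every frame.

```python
from typing import List, Sequence


def global_variance(frames: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension variance of an acoustic trajectory over the whole utterance."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dims)]


def upsample_by_duration(
    state_features: Sequence[Sequence[float]], durations: Sequence[int]
) -> List[List[float]]:
    """Repeat each state-level linguistic vector for its predicted number of frames."""
    frames: List[List[float]] = []
    for feat, dur in zip(state_features, durations):
        frames.extend(list(feat) for _ in range(dur))
    return frames


def combine_with_gv(
    frame_features: Sequence[Sequence[float]], gv: Sequence[float]
) -> List[List[float]]:
    """Append the utterance-level GV vector to every frame (assumed: concatenation)."""
    return [list(f) + list(gv) for f in frame_features]


if __name__ == "__main__":
    # Hypothetical values standing in for the duration-model and GV-model outputs.
    state_features = [[1.0, 0.0], [0.0, 1.0]]
    predicted_durations = [2, 3]
    predicted_gv = [0.5]

    frames = upsample_by_duration(state_features, predicted_durations)
    acoustic_model_input = combine_with_gv(frames, predicted_gv)
    print(len(acoustic_model_input), "frames of dimension", len(acoustic_model_input[0]))
```

In this sketch `global_variance` would be used at training time to derive GV targets from natural acoustic trajectories, while at synthesis time the GV vector would come from the GV model instead.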
Keywords/Search Tags:Speech Synthesis, Deep Neural Network, Acoustic Model, Global Variance, Low Latency