
Research and Implementation of Chinese Speech Synthesis Based on Deep Learning

Posted on: 2022-10-29
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
GTID: 2518306338987459
Subject: Information and Communication Engineering
Abstract/Summary:
The main task of speech synthesis is to transform text into speech. It is widely used in fields such as smart homes, virtual anchors, voice navigation, information broadcasting, education, and entertainment, and it is an important part of human-computer interaction. In recent years, practical application scenarios have required that Chinese speech synthesis not only convey the text correctly but also offer a variety of emotional styles and keep system delay low, which is of significant practical value. In this thesis, we study Chinese speech synthesis based on deep learning and implement a Chinese speech synthesis system that addresses three aspects: synthesis quality, system delay, and synthesis in different emotional styles. The system consists of five parts: encoder, attention mechanism, decoder, vocoder, and emotion embedding layer. The main contents of this thesis are as follows.

First, compared with traditional speech synthesis methods, deep learning can greatly reduce the cost of text analysis. To improve both the efficiency and the quality of speech synthesis, we propose T-LPCNet, an improved Chinese speech synthesis model based on Tacotron2. The acoustic feature is changed from the 80-dimensional Mel spectrum to a 20-dimensional cepstral coefficient feature, which reduces the feature dimension by 75%, speeds up synthesis, and meets real-time requirements. The duration of speech synthesis is about 8.9 s, and the MOS reaches 3.90.

Second, an automatic Chinese speech segmentation technique is designed and applied to build Chinese emotional speech databases covering six emotions: fear, anger, disgust, happiness, surprise, and sadness, in preparation for the subsequent study of deep learning models.

Third, two multi-style Chinese speech synthesis schemes are designed. The style features learned by the network make it possible to control the style of the synthesized speech. The method extracts emotional style features from reference audio to synthesize emotional speech. The two combined models reach MCD scores of 9.81 and 9.95 on the test set.
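
The abstract reports MCD (Mel cepstral distortion) scores without stating how they are computed. The sketch below uses the commonly cited definition, MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2) averaged over frames, and is not necessarily the exact procedure used in the thesis. It assumes that the reference and synthesized frames are already time-aligned (for example by dynamic time warping) and that coefficient 0 (energy) is excluded; the function name `mel_cepstral_distortion` is hypothetical.

```python
import math
from typing import Sequence


def mel_cepstral_distortion(
    ref: Sequence[Sequence[float]],
    syn: Sequence[Sequence[float]],
) -> float:
    """Average MCD in dB over frames that are assumed to be time-aligned.

    ref and syn are lists of frames; each frame is a list of mel-cepstral
    coefficients. Coefficient 0 (energy) is skipped, which is an assumption
    of this sketch and a common convention.
    """
    if not ref or len(ref) != len(syn):
        raise ValueError("ref and syn must be non-empty and have equal length")

    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref_frame, syn_frame in zip(ref, syn):
        if len(ref_frame) != len(syn_frame):
            raise ValueError("frames must have the same number of coefficients")
        # Squared Euclidean distance over coefficients 1..D (index 0 skipped).
        squared = sum((a - b) ** 2 for a, b in zip(ref_frame[1:], syn_frame[1:]))
        total += scale * math.sqrt(squared)
    return total / len(ref)
```

Lower values mean the synthesized cepstra are closer to the reference. Scores are only comparable when the same alignment method and coefficient range are used.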
Keywords/Search Tags: Deep learning, Chinese speech synthesis, LPCNet, self-coding neural network, multi-style Chinese speech synthesis