Research And Implementation Of Chinese Speech Synthesis Based On Deep Learning

Posted on:2022-10-29

Degree:Master

Type:Thesis

Country:China

Candidate:H Zhang

GTID:2518306338987459

Subject:Information and Communication Engineering

Abstract/Summary:

The main task of speech synthesis is to convert text into speech. It is widely used in fields such as smart homes, virtual anchors, voice navigation, information broadcasting, education, and entertainment, and it is an important part of human-computer interaction. In recent years, practical application scenarios have required that Chinese speech synthesis not only convey the text content correctly but also offer a variety of emotional styles and a low system delay, which is of significant practical value. In this thesis, we study Chinese speech synthesis technology based on deep learning and implement a deep-learning-based Chinese speech synthesis system, addressing three aspects: synthesis quality, system delay, and synthesis with different emotional styles. The system consists of five parts: an encoder, an attention mechanism, a decoder, a vocoder, and an emotion embedding layer. The main contents of this thesis are as follows.

First, compared with traditional speech synthesis methods, deep learning can greatly reduce the cost of text analysis. To improve both the efficiency and the quality of speech synthesis, we propose T-LPCNet, an improved Chinese speech synthesis model based on Tacotron2. The acoustic features are changed from an 80-dimensional Mel spectrum to 20-dimensional cepstral coefficient features, which reduces the feature dimension by 75%, speeds up synthesis, and meets real-time requirements. The duration of speech synthesis is about 8.9 s, and the MOS score reaches 3.90.

Second, a Chinese speech automatic segmentation technique is designed and applied to build Chinese emotional speech databases covering six emotions: fear, anger, disgust, happiness, surprise, and sadness. These databases are prepared for the subsequent study of the deep learning models.

Third, two multi-style Chinese speech synthesis schemes are designed. The style features learned by the network make it possible to control the style of the synthesized speech: emotional style features are extracted from reference audio and used to synthesize emotional speech. The MCD scores of the two combined models reach 9.81 and 9.95 on the test set.
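The abstract reports MCD (mel-cepstral distortion) scores but does not state how they were computed. As a rough illustration only, the sketch below uses the commonly used definition of MCD in dB, averaged over frames. It assumes the reference and synthesized frames are already time-aligned (for example by dynamic time warping) and that the 0th (energy) coefficient is excluded; the function name `mel_cepstral_distortion` and both assumptions are illustrative and not taken from the thesis.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean mel-cepstral distortion (dB) over time-aligned frames.

    Each frame is a sequence of mel-cepstral coefficients. The 0th
    coefficient is skipped (an assumption; conventions differ).
    """
    if len(ref_frames) != len(syn_frames) or not ref_frames:
        raise ValueError("need the same non-zero number of frames")

    # Constant factor of the common MCD definition: (10 / ln 10) * sqrt(2)
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance over coefficients 1..D
        sq_dist = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(sq_dist)

    # Average the per-frame distortion over all frames
    return total / len(ref_frames)
```

Under this definition a lower value means the synthesized cepstra are closer to the reference, and identical inputs give a distortion of zero.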

Keywords/Search Tags:

Deep learning, Chinese speech synthesis, LPCNet, self-coding neural network, Multi-style Chinese speech synthesis

Related items

1	Research On Technology Of Chinese Speech Synthesis
2	Research And Application Of Speech Synthesis Technology Based On Deep Learning
3	Research On Deep Learning Based End-to-End Chinese Speech Synthesis
4	Research On Chinese Speech Synthesis Method Integrating Pause And Personal Information
5	Research On Deep Learning Based Small-Sized Unit Concatenation Speech Synthesis
6	Study On Chinese Speech Synthesis Methods Based On Deep Learning
7	Research And Application Of Speech Synthesis Method Integrating Emotional Expressiveness
8	Research Of Chinese Speech Synthesis Technology Based On Speech Database
9	Speech Technique Research Of Intelligence Robot
10	Research On Neural Network Based Statistical Parametric Speech Synthesis