
Research On Speech Synthesis Of Shanghai Dialect Based On Deep Learning

Posted on: 2020-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: H Wang
Full Text: PDF
GTID: 2428330572485952
Subject: Engineering
Abstract/Summary:
With the rapid development of smart devices such as smartphones and smart speakers, voice interaction technology has received more and more attention. As one of the core technologies of voice interaction, speech synthesis has become an important research topic. Speech synthesis technology is constantly improving, and the quality of machine-synthesized speech is gradually approaching that of natural speech. At the same time, users have begun to pay more attention to the personalized features of synthetic speech. Dialect speech synthesis adds personality to synthesized speech and is very popular among users. China has many dialects, so dialect speech synthesis is of considerable importance. Because there has been no research on speech synthesis for Shanghai dialect, this thesis takes Shanghai dialect as its research object, establishes a corpus for Shanghai dialect speech synthesis, and proposes a text analysis method for Shanghai dialect. On this basis, the deep fully connected neural network (DNN) and the long short-term memory (LSTM) network are introduced into Shanghai dialect speech synthesis. The CBHG module from the literature [1] is also incorporated, and an acoustic modeling method based on LSTM+CBHG is proposed. The main work and innovations of this thesis are as follows.

Firstly, a corpus for Shanghai dialect speech synthesis is established. According to the pronunciation characteristics of Shanghai dialect, a text corpus of 1,800 Shanghai dialect entries was designed to cover the initials, finals and tones of Shanghai dialect and the pronunciation of commonly used literary words. An adult male speaker then recorded the text corpus in a professional studio to produce the Shanghai dialect speech corpus.

Secondly, a text analysis method for Shanghai dialect is proposed. The input Mandarin text first undergoes text normalization, word segmentation and prosody prediction, which yield context information such as sentence boundaries, parts of speech and prosodic boundaries. Then, under the guidance of the Shanghai dialect vocabulary dictionary and the special sound dictionary, the graphemes are converted into phonemes, and the complete Shanghai dialect Pinyin is obtained according to the syllable mapping rules. Finally, the context information and the Shanghai dialect Pinyin are combined to generate the context-dependent labels of Shanghai dialect.

Thirdly, Shanghai dialect speech synthesis methods based on the DNN and on the LSTM recurrent neural network are each implemented, and a speech synthesis method based on LSTM+CBHG is proposed. The Shanghai dialect text analysis produces the context-dependent labels of the text, and the WORLD vocoder extracts the acoustic parameters of the speech. The three acoustic models are then trained on the normalized linguistic features and acoustic features. The predicted acoustic features are denormalized, smoothed and spectrally enhanced, and finally fed to the WORLD vocoder to reconstruct the Shanghai dialect speech waveform. The results are drawn from a comparison of the three experiments.
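The abstract states that the acoustic models are trained on normalized features and that predicted acoustic features are denormalized before vocoding. The following is a minimal sketch of that round trip, assuming per-dimension mean-variance normalization (the abstract does not name the scheme) and toy numbers in place of real WORLD parameters. The function names `fit_mean_std`, `normalize` and `denormalize` are hypothetical and do not come from the thesis.

```python
from math import sqrt


def fit_mean_std(frames):
    """Compute per-dimension mean and standard deviation over all frames."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        # A constant dimension has zero variance; fall back to 1.0
        # so that normalization never divides by zero.
        stds.append(sqrt(var) or 1.0)
    return means, stds


def normalize(frames, means, stds):
    """Map each frame to zero mean and unit variance per dimension."""
    return [[(x - m) / s for x, m, s in zip(f, means, stds)] for f in frames]


def denormalize(frames, means, stds):
    """Invert normalize(): map model outputs back to the original scale."""
    return [[x * s + m for x, m, s in zip(f, means, stds)] for f in frames]


# Toy "acoustic feature" frames (assumed values, two dimensions per frame).
frames = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
means, stds = fit_mean_std(frames)
normalized = normalize(frames, means, stds)

# In a real system the model would predict normalized features;
# here the normalized training frames stand in for those predictions.
restored = denormalize(normalized, means, stds)
```

The statistics are estimated once on the training data and reused at synthesis time, so the same `means` and `stds` that scaled the training targets also rescale the model output before it reaches the vocoder.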
Keywords/Search Tags: dialect speech synthesis, Shanghai dialect, text analysis, DNN, LSTM