| As the networked society continues to develop, people's daily life and communication increasingly depend on the Internet. Text-to-speech, as one of the bridges of human-machine interaction, plays an important role in this setting. The ultimate goal of text-to-speech research is to convert text sequences into clear, natural, and fluent speech signals that people can understand. In recent years, text-to-speech for English and Chinese based on deep neural networks has achieved notable results, with synthesis quality approaching natural human pronunciation. Since the Tibetan character encoding was adopted as an ISO/IEC standard in 1997 and included in Unicode 2.0, many researchers have studied word segmentation, syntactic analysis, machine translation, and other aspects of Tibetan information processing. Research on Tibetan text-to-speech, however, started late and is still at the development stage, and few studies in this field use deep neural networks. This paper therefore exploits the technical advantages of end-to-end deep neural network models to study Tibetan text-to-speech under different Tibetan text preprocessing and embedding methods, and then analyzes and compares the experimental results. The main work is as follows: (1) Tibetan script combines letters both horizontally and vertically, and its spelling rules and structure are distinctive and complex. This paper first constructs a library of 20,000 syllables that conform to Tibetan spelling rules. If a syllable in the collected data is not found in this library, the data likely contains a misspelling or other characters. The syllable library therefore plays an important role in screening and improving the corpus. At the same
time, the Tibetan syllable set is found to be large and unevenly distributed. During encoder feature extraction, the resulting data sparsity reduces the learning efficiency of the encoder, which degrades synthesis quality. This paper therefore improves the Tibetan Wylie Latin transliteration scheme and the Unicode encoding transcription scheme, and designs a set of Tibetan text preprocessing schemes suited to end-to-end text-to-speech. (2) Using an end-to-end text-to-speech model, experiments are carried out with the different Tibetan text preprocessing methods, realizing the full synthesis process from Tibetan text input to intelligible Tibetan speech output. (3) The experimental results are analyzed, and the experiments demonstrate a Tibetan text preprocessing scheme with high usability for Tibetan text-to-speech. The experimental data show that when the corpus is limited in size, preprocessing based on Tibetan component units yields better synthesis. (4) Finally, the experiments are summarized, and relevant directions and prospects for future research are proposed. |
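The corpus screening step in item (1) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the two-syllable stand-in for the 20,000-syllable library, and the choice to treat each Unicode code point as one "component unit" are all assumptions made for illustration. The sketch splits text on the Tibetan tsheg (U+0F0B) and shad (U+0F0D) marks, flags syllables that are missing from the library, and decomposes a syllable into its code points.

```python
import re

TSHEG = "\u0F0B"  # Tibetan intersyllabic mark
SHAD = "\u0F0D"   # Tibetan clause-final mark

# Two sample syllables, written as escapes to avoid encoding issues.
BOD = "\u0F56\u0F7C\u0F51"   # "bod" in Wylie transliteration
SKAD = "\u0F66\u0F90\u0F51"  # "skad" in Wylie transliteration


def split_syllables(text):
    """Split Tibetan text into syllables on tsheg, shad, and whitespace."""
    parts = re.split(f"[{TSHEG}{SHAD}\\s]+", text)
    return [p for p in parts if p]


def find_unknown_syllables(text, syllable_library):
    """Return syllables absent from the library (possible misspellings
    or other characters). The library is a hypothetical set of strings."""
    return [s for s in split_syllables(text) if s not in syllable_library]


def to_components(syllable):
    """Decompose a syllable into Unicode code points (base letters,
    subjoined letters, vowel signs). Assumed stand-in for the paper's
    component units, which the abstract does not define precisely."""
    return list(syllable)


if __name__ == "__main__":
    library = {BOD, SKAD}  # tiny stand-in for the full syllable library
    sample = BOD + TSHEG + "abc" + TSHEG + SKAD + SHAD
    print(find_unknown_syllables(sample, library))
    print([f"U+{ord(c):04X}" for c in to_components(SKAD)])
```

Modelling at the component level keeps the input vocabulary to roughly the size of the Tibetan Unicode block, far smaller than a syllable-level vocabulary, which is one plausible reason such units help when the corpus is small.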