
Research on Speech Synthesis Technology of Amdo Tibetan Based on Seq2Seq-WaveNet

Posted on: 2022-11-23  Degree: Master  Type: Thesis
Country: China  Candidate: Y T Ding  Full Text: PDF
GTID: 2518306752993299  Subject: Computer software and theory
Abstract/Summary:
Speech synthesis is the technology that takes speech as its research object and uses methods such as signal processing to synthesize natural human language. It is one of the core technologies of intelligent human-computer voice interaction and has important theoretical significance and practical value. Tibetan speech synthesis is an important part of Chinese information processing, and it is also both a focus and a difficulty of Tibetan intelligent human-computer voice interaction. Research on it plays an important role in promoting the development of intelligent technology in Tibetan areas. As one of the three major Tibetan dialects, Amdo Tibetan is widely used in Qinghai, Gansu, Sichuan and other regions, and Amdo Tibetan speech synthesis is a research hotspot. This paper therefore takes Amdo Tibetan speech synthesis as its main research goal.

To improve the naturalness, intelligibility and clarity of synthesized Amdo Tibetan speech, this paper studies a Seq2Seq-WaveNet deep learning model from four aspects: corpus construction and preprocessing, selection of text input primitives, the acoustic model for speech synthesis, and speech waveform synthesis. In addition, this paper designs and implements an Amdo Tibetan speech synthesis system and applies it to an Amdo bus station announcement system.

In terms of corpus construction and preprocessing, this paper constructs a Tibetan text corpus of 8938 KB, containing 47023 sentences, 476832 syllables and 1290168 phonemes, together with a speech corpus. The text corpus is preprocessed with steps such as digit-to-text conversion and conversion of English and other symbol characters, and the Tibetan speech corpus is preprocessed with pre-emphasis, framing, windowing and other steps.

In terms of the selection of text input primitives, on the basis of an analysis of the structural characteristics of Tibetan text, this paper compares the effects of three different primitives (phonemes, syllables, and international phonetic symbols) on the performance of Amdo Tibetan speech synthesis. It concludes that, under current technical conditions, phonemes are the more suitable text primitive for Amdo Tibetan speech synthesis.

In terms of acoustic feature extraction from text, this paper adds a post-net layer on top of Seq2Seq to further improve Mel spectral feature extraction.

In terms of speech waveform synthesis, considering that WaveNet is better at recovering phase information, WaveNet is used to replace the Griffin-Lim vocoder currently common in Tibetan speech synthesis, and the experimental results verify the effectiveness of the Seq2Seq-WaveNet model.

In terms of the realization and application of the Amdo Tibetan speech synthesis system, this paper designs and implements a system that uses Seq2Seq-WaveNet as the Tibetan speech synthesis model and applies it to an Amdo Tibetan bus station announcement system.
Keywords/Search Tags: Amdo Tibetan, Speech Synthesis, Deep Learning, Seq2Seq, WaveNet
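
The speech preprocessing steps named in the abstract (pre-emphasis, framing, windowing) can be illustrated with a minimal sketch. This is not the thesis's implementation: the pre-emphasis coefficient 0.97, the Hamming window, the 400-sample frame with a 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate), and all function names are assumptions chosen only for illustration.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is a common default, assumed here; the first sample is kept as-is.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_signal(signal, frame_len, hop_len):
    """Split a signal into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    return [signal[i * hop_len: i * hop_len + frame_len] for i in range(n_frames)]


def hamming(frame_len):
    """Symmetric Hamming window (an assumed choice of window function)."""
    if frame_len == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]


def preprocess(signal, frame_len=400, hop_len=160, alpha=0.97):
    """Pre-emphasize, frame, and window a waveform given as a list of floats."""
    emphasized = pre_emphasis(signal, alpha)
    window = hamming(frame_len)
    return [
        [s * w for s, w in zip(frame, window)]
        for frame in frame_signal(emphasized, frame_len, hop_len)
    ]


if __name__ == "__main__":
    # Hypothetical input: one second of a 440 Hz tone at an assumed 16 kHz rate.
    tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    frames = preprocess(tone)
    print(len(frames), len(frames[0]))
```

Each windowed frame would then typically be passed to a short-time Fourier transform and Mel filter bank to obtain the Mel spectral features that the acoustic model predicts; that step is omitted here.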