Research On Embedded Tibetan Speech Synthesis System

Posted on: 2022-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Wang
Full Text: PDF
GTID: 2518306500956439
Subject: Master of Engineering
Abstract/Summary:
Synthesized Tibetan speech still lags well behind synthesized speech in mainstream languages such as Chinese and English in both naturalness and similarity. It is therefore worthwhile to study Tibetan speech synthesis by drawing on the speech synthesis technology developed for mainstream languages. At the same time, the continuous improvement in the performance of embedded devices makes it practical to combine Tibetan speech synthesis technology with embedded devices. The goal of this research is therefore to realize an embedded Tibetan speech synthesis system. The thesis developed a Tibetan text corpus and recorded the corresponding Tibetan speech for speech synthesis. The vocoder in the speech synthesis system was also analyzed and improved in order to build a Tibetan speech synthesis system based on deep learning, which was then ported to embedded devices to realize an embedded Tibetan speech synthesis system. The main work and original contributions of this thesis are as follows:

Firstly, a Tibetan speech synthesis corpus was designed and built. Tibetan sentences on different themes were collected, and unusual and special sentences were eliminated. Sentences were then added or removed according to syllable frequency. The result is a Tibetan speech synthesis corpus of 10,000 high-quality sentences that ensures the essential phonological balance.

Secondly, the vocoder used for speech synthesis was analyzed and improved. Based on an analysis of the vocoder's principle, the original spectral envelope feature extracted by the vocoder was converted into a low-dimensional spectral envelope feature. A Tibetan speech codec system was then implemented. The experimental results showed that the Tibetan speech generated by the improved vocoder had better quality.

Thirdly, a Tibetan speech synthesis system based on deep learning was built. In the training stage, contextual annotation information was obtained through front-end text analysis. At the same time, the improved vocoder extracted acoustic parameter features, which were used to train neural-network acoustic models, including Deep Neural Networks (DNN), Hybrid Long Short-Term Memory networks (Hybrid LSTMs), and Hybrid Bidirectional Long Short-Term Memory networks (Hybrid BLSTMs). In the synthesis stage, contextual annotation information was obtained from the Tibetan text to be synthesized through text analysis. The acoustic model generated the corresponding acoustic parameter features from the contextual annotation information, and the vocoder then recovered the speech waveform from those features. The experimental results showed that, with all three models, the naturalness and similarity of the synthesized speech were better than those of the unimproved system. Of the three, the Hybrid BLSTM model produced the best synthesized speech quality.

Finally, an embedded Tibetan speech synthesis system was built. Two embedded Tibetan speech synthesis system frameworks were established. The interaction between the embedded side and the server side adopted the client/server (C/S) mode. The experimental results showed that the best implementation method depends on the usage scenario.
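The corpus-balancing step mentioned in the abstract (adding and removing sentences according to syllable frequency) can be illustrated with a small sketch. The abstract does not give the actual selection procedure, so the greedy scoring rule below, the function names `syllables` and `select_sentences`, and the `target_size` parameter are all assumptions made for illustration. The only Tibetan-specific fact used is that syllables are delimited by the tsheg mark (U+0F0B).

```python
from collections import Counter

TSHEG = "\u0f0b"  # Tibetan syllable delimiter
SHAD = "\u0f0d"   # Tibetan clause/sentence terminator


def syllables(sentence):
    """Split a Tibetan sentence into syllables on the tsheg mark."""
    cleaned = sentence.replace(SHAD, TSHEG)
    return [s for s in (part.strip() for part in cleaned.split(TSHEG)) if s]


def select_sentences(candidates, target_size):
    """Greedily pick sentences whose syllables are rarest in the selection so far.

    Hypothetical rule, not the thesis's method: a sentence scores higher when
    its syllables have been seen less often in the sentences already selected.
    """
    counts = Counter()
    remaining = list(candidates)
    selected = []

    def score(sentence):
        syls = syllables(sentence)
        if not syls:
            return 0.0
        # Average novelty: unseen syllables contribute 1, seen ones less.
        return sum(1.0 / (1 + counts[s]) for s in syls) / len(syls)

    while remaining and len(selected) < target_size:
        best = max(remaining, key=score)
        remaining.remove(best)
        selected.append(best)
        counts.update(syllables(best))
    return selected, counts
```

Calling `select_sentences(candidate_sentences, 10000)` would return the chosen sentences together with their syllable counts, which can then be inspected to check how evenly the syllables are covered. A real corpus design would likely also weigh sentence length, theme, and phoneme-level coverage.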
Keywords/Search Tags:Tibetan speech synthesis, Embedded Tibetan speech synthesis, Vocoder, Speech codec, Deep learning