Research On Embedded Tibetan Speech Synthesis System

Posted on:2022-10-28

Degree:Master

Type:Thesis

Country:China

Candidate:Z H Wang

GTID:2518306500956439

Subject:Master of Engineering

Abstract/Summary:

Compared with speech synthesis for mainstream languages such as Chinese and English, Tibetan speech synthesis still lags considerably in the naturalness and similarity of the synthesized speech. It is therefore of great significance to study Tibetan speech synthesis by drawing on the techniques developed for these mainstream languages. At the same time, as the performance of embedded devices continues to improve, combining Tibetan speech synthesis technology with embedded devices makes it practical to build a Tibetan speech synthesis system that runs on such devices. The goal of this research is therefore to realize an embedded Tibetan speech synthesis system. The thesis developed a Tibetan text corpus and recorded the corresponding Tibetan speech for synthesis. The vocoder of the speech synthesis system was also analyzed and improved, and a deep-learning-based Tibetan speech synthesis system was built on it and then ported to embedded devices to realize an embedded Tibetan speech synthesis system. The main work and contributions of this thesis are as follows:

Firstly, a Tibetan speech synthesis corpus was designed and built. Tibetan sentences on different themes were collected, and unusual or special sentences were eliminated. Sentences were then added or deleted according to syllable frequency, yielding a corpus of 10,000 high-quality sentences that ensures essential phonological balance.

Secondly, the vocoder for speech synthesis was analyzed and improved. Based on an analysis of the vocoder's principle, the original spectral envelope feature extracted by the vocoder was converted into a low-dimensional spectral envelope feature, and a Tibetan speech codec system was implemented. The experimental results showed that the Tibetan speech generated by the improved vocoder had better quality.

Thirdly, a Tibetan speech synthesis system based on deep learning was built. In the training stage, contextual annotation information was obtained through front-end text analysis, while the improved vocoder extracted acoustic parameter features; together these were used to train neural-network acoustic models, namely Deep Neural Networks (DNN), Hybrid Long Short-Term Memory networks (Hybrid LSTMs), and Hybrid Bidirectional Long Short-Term Memory networks (Hybrid BLSTMs). In the synthesis stage, text analysis produced contextual annotation information from the Tibetan text to be synthesized, the acoustic model generated the corresponding acoustic parameter features from this information, and the vocoder then recovered the speech waveform from those generated features. The experimental results showed that, for all three models, the naturalness and similarity of the synthesized speech were better than those of the unimproved system, and that the Hybrid BLSTMs model produced the best speech quality.

Finally, an embedded Tibetan speech synthesis system was built. Two embedded Tibetan speech synthesis system frameworks were established, and the interaction between the embedded side and the server side adopted the client/server (C/S) mode. The experimental results showed that the best implementation method depends on the usage scenario.
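The abstract states only that sentences were added or deleted according to syllable frequency to reach phonological balance; the exact procedure is not given. One common way to do this kind of corpus selection is a greedy, set-cover-style pass. The sketch below is a minimal illustration under assumptions, not the thesis's method: sentences are split into syllables on the Tibetan tsheg (U+0F0B) and shad (U+0F0D), and each candidate is scored by how many still-uncovered, pool-rare syllables it contributes per syllable. The names `split_syllables` and `select_balanced_sentences` and the scoring formula are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark
SHAD = "\u0f0d"   # Tibetan clause delimiter

def split_syllables(text: str) -> List[str]:
    """Split Tibetan text into syllables on tsheg/shad (simplified; other punctuation is not handled)."""
    parts = text.replace(SHAD, TSHEG).split(TSHEG)
    return [p.strip() for p in parts if p.strip()]

def select_balanced_sentences(candidates: Sequence[List[str]], target_size: int) -> List[int]:
    """Greedily pick sentence indices that best improve syllable coverage.

    Hypothetical scoring: each distinct syllable in a sentence contributes
    1 / (pool frequency) / (1 + times already selected), and the total is
    divided by the sentence length so long sentences are not favored.
    """
    pool_freq = Counter(s for sent in candidates for s in sent)
    covered: Counter = Counter()
    remaining = set(range(len(candidates)))
    selected: List[int] = []

    def score(i: int) -> float:
        sylls = set(candidates[i])
        if not sylls:
            return 0.0
        gain = sum(1.0 / pool_freq[s] / (1 + covered[s]) for s in sylls)
        return gain / len(candidates[i])

    while remaining and len(selected) < target_size:
        # Highest score wins; ties go to the lower index for determinism.
        best = max(remaining, key=lambda i: (score(i), -i))
        selected.append(best)
        covered.update(candidates[best])
        remaining.remove(best)
    return selected
```

For example, `select_balanced_sentences([["a", "b"], ["a", "a"], ["c", "d"], ["a", "b"]], 2)` first takes the sentence made only of rare syllables (index 2), then the first sentence covering the remaining types. A real pipeline would apply this after filtering out unusual sentences and would stop at the desired corpus size.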
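The abstract does not say how the vocoder's original spectral envelope was reduced to a low-dimensional feature. Vocoder-based systems often do this with (mel-)cepstral analysis; the sketch below uses the simpler, unwarped real cepstrum as a stand-in. Encoding takes the inverse real FFT of the log envelope and keeps the first `order + 1` coefficients; decoding restores the even symmetry of the cepstrum and transforms back. The input layout (frames × `fft_size // 2 + 1` positive envelope bins) and the function names `encode_envelope` and `decode_envelope` are assumptions for illustration.

```python
import numpy as np

def encode_envelope(env: np.ndarray, order: int) -> np.ndarray:
    """Compress a spectral envelope to order+1 real-cepstrum coefficients per frame.

    env: shape (frames, fft_size // 2 + 1), positive envelope values
    (assumed layout, e.g. from a vocoder analysis step).
    """
    log_env = np.log(np.maximum(env, 1e-12))   # floor avoids log(0)
    # The log spectrum is real and even, so its inverse FFT is a real, even
    # sequence with c[n] == c[N - n]; only the first half carries information.
    cep = np.fft.irfft(log_env, axis=-1)       # shape (frames, fft_size)
    return cep[..., : order + 1]               # keep low-quefrency terms only

def decode_envelope(cep: np.ndarray, fft_size: int) -> np.ndarray:
    """Rebuild an envelope of shape (frames, fft_size // 2 + 1) from truncated cepstra."""
    order = cep.shape[-1] - 1
    if order > fft_size // 2:
        raise ValueError("order must not exceed fft_size // 2")
    full = np.zeros(cep.shape[:-1] + (fft_size,))
    full[..., : order + 1] = cep
    # Restore even symmetry: c[N - n] = c[n] for n = 1..order.
    full[..., fft_size - order :] = cep[..., 1:][..., ::-1]
    log_env = np.fft.rfft(full, axis=-1).real  # imaginary part is ~0 by symmetry
    return np.exp(log_env)
```

A smaller `order` gives a smoother, more compact envelope at the cost of spectral detail; with `order = fft_size // 2` the round trip is exact. Frequency warping, as used in mel-cepstral analysis, would spend more coefficients on perceptually important low frequencies, which this sketch omits.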

Keywords/Search Tags:

Tibetan speech synthesis, Embedded Tibetan speech synthesis, Vocoder, Speech codec, Deep learning

Related items

1	Research On Statistical Parametric Mandarin-Tibetan Cross-lingual Speech Synthesis
2	Research On Mandarin-Tibetan Cross-lingual Speech Synthesis
3	Research On End-to-End Non-Autoregressive Model-Based Amdo Tibetan Speech Synthesis Technology
4	Research On Statistical Parametric Speech Synthesis Of Tibetan Lhasa Dialect
5	Research On Tibetan Speech Synthesis Technology Based On Mixed Primitives
6	Research On Tibetan Speech Synthesis Based On Deep Learning
7	Research On Speech Synthesis Technology Of Amdo Tibetan Based On Seq2Seq & WaveNet
8	Research On Sign Language-to-Mandarin/Tibetan Speech Conversion
9	Research On Tibetan Speech Recognition Based On Speech Spectral Features
10	Research On Mandarin-to-Tibetan Cross Lingual Speech Conversion Based On Deep Neural Network