Research On The Speech Synthesis Technology Of Tibetan Dialect

Posted on:2022-10-05

Degree:Master

Type:Thesis

Country:China

Candidate:X T Xie

Full Text:PDF

GTID:2505306509997779

Subject:Computer software and theory

Abstract/Summary:

Speech synthesis is a leading-edge technology in the field of information processing. In recent years, Tibetan information technology in China has developed rapidly and has contributed to the economic and social development of Tibet. Research on Tibetan speech synthesis, however, still has considerable room to grow. The main reasons are the lack of in-depth research on Tibetan speech and the scarcity of resources: speech synthesis systems of practical quality are rare in this field, and the existing research results remain at the experimental stage. Studying and resolving the key technologies of Tibetan speech synthesis in depth, and forming an overall solution, is therefore an important research topic in information processing. Doing so will help promote the development and prosperity of Tibetan culture, expand its international influence, strengthen the capacity of China's Tibetan areas to develop information technology independently, and accelerate the modernization of the Tibetan language.

This thesis studies Tibetan speech synthesis for the Ü-Tsang (Wei-Zang) dialect. It first adopts the traditional statistical parametric synthesis method based on the hidden Markov model (HMM) and analyzes the techniques involved in front-end text analysis for Tibetan speech synthesis. These cover the key front-end problems: Tibetan phoneme analysis, Latin transliteration, segment labeling and prosody labeling, Tibetan pronunciation rules, analysis of Tibetan polysyllabic words, special symbol processing, automatic word segmentation, and part-of-speech tagging. The front-end text analysis finally produces a set of prosodically annotated texts that supply the necessary information to the back-end acoustic model.

Because speech synthesized by HMM-based parametric synthesis suffers from unnaturalness, poor timbre, and insufficient intelligibility, the thesis then introduces the end-to-end synthesis method that is currently most popular in industry, namely the deep-learning-based Tacotron model. It studies a Tibetan speech conversion model built on an encoder-decoder structure with an attention mechanism and, drawing on the model architectures used for mainstream languages, implements Tibetan speech conversion that takes characters as input and produces a spectrogram as output.

Finally, the synthesis quality of the end-to-end deep learning model is evaluated objectively and subjectively through experiments. Compared with the mean opinion score (MOS) results for HMM-based statistical parametric synthesis, the Tacotron model clearly performs better: whether judged on timbre, naturalness, or intelligibility, the MOS for end-to-end synthesis is higher than that for parametric synthesis (end-to-end synthesis: 4.73 points (or 4.61 points); parametric synthesis: 3.96 points). In addition, a detailed analysis of the Tacotron model shows that after 25,000 training iterations both the attention alignment and the synthesized spectrogram reach good quality. Whether the synthesized speech is assessed as a whole or separately on timbre, naturalness, and intelligibility, the Ü-Tsang dialect speech synthesized with the Tacotron model scores higher than the speech synthesized with statistical parameters. The end-to-end synthesis method therefore has research and application value for Tibetan speech synthesis.
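The MOS comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say how listener ratings were collected or aggregated, so the code below assumes the common convention of averaging 1 to 5 ratings per system. The function name `mos`, the system labels, and the ratings are hypothetical and are not the thesis data.

```python
from math import sqrt
from statistics import mean, stdev


def mos(ratings):
    """Return (mean opinion score, approximate 95% half-width) for 1-5 ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    score = mean(ratings)
    # Normal approximation; with few listeners this interval is only rough.
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return score, half_width


# Hypothetical listener ratings, for illustration only (not the thesis data).
ratings_by_system = {
    "system_a": [5, 4, 5, 5, 4, 5],
    "system_b": [4, 4, 3, 4, 5, 4],
}

for name, ratings in ratings_by_system.items():
    score, half_width = mos(ratings)
    print(f"{name}: MOS = {score:.2f} +/- {half_width:.2f}")
```

Reporting an interval alongside each mean helps show whether a gap between two systems is larger than the spread among listeners. The same function could be applied separately to ratings for timbre, naturalness, and intelligibility.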

Keywords/Search Tags:

Wei-Zang dialect, speech synthesis, Hidden Markov Model, Tacotron Model

Related items

1	The Design And Realization Of Mongolian Speech Synthesis System
2	Research On Automatic Chord Arrangement Based On Hidden Markov Model
3	Discovering The Dynamic Changes Of Brain Activity Across The Adult Lifespan Using Hidden Markov Model
4	Hunan Dialects Identification Based On GRU-HMM Acoustic Model
5	Design Of Speech Recognition System For Northern Shaanxi Dialect Based On Deep Learning
6	The Research On Tibetan Speech Recognition Technology
7	Research On Automatic Annotation Algorithm Of Piano Fingering Based On Prior Knowledge And Improved Hidden Markov Model
8	Forecasting The Stock Market With The Hidden Markov Model Based On Investment Sentiment
9	Research On Mandarin-Xingtai Dialect Cross-lingual Speech Synthesis
10	Research On Automatic Notation Of Word For Tibetan Corpus Based On HMM