
Study and Implementation of a Uighur TTS System

Posted on: 2005-09-01; Degree: Master; Type: Thesis
Country: China; Candidate: L Li; Full Text: PDF
GTID: 2168360125459217; Subject: Computer application technology
Abstract/Summary:
Speech synthesis is the artificial production of speech; it addresses the problem of making machines 'speak' the way humans do. A machine that merely emits disconnected syllables is not acceptable: it should produce continuous, natural speech that complies with linguistic rules. In human speech, a speaker first has an intention, which forms a concept in the mind and is finally turned into speech. Because little is yet known about these higher nervous activities, speech synthesis today is based on text-to-speech (TTS) technology, which draws on many fields, including linguistics, phonetics, speech signal processing and psychology. As information technology such as computers and networks is used more and more widely in daily life, speech synthesis is expected to find broad application. A number of institutes and companies in China and abroad have invested in research and development of speech synthesis for various languages and have made great achievements. Given the ethnic composition of the Xinjiang Uighur Autonomous Region and the importance of its geographic location, studying a Uighur text-to-speech system has significant social benefits and broad application prospects. This thesis therefore discusses how to build a high-quality, natural and widely applicable speech synthesis system based on the features of the Uighur language and its phonetics. Based on an analysis of Uighur phonetic structure and of the language and its phonetics, the author chooses the Uighur syllable and the etyma-affix unit as the basic units for two TTS systems.
The study of the syllable-based TTS system covers the following aspects. Based on an analysis of Uighur phonetic structure and phonetics, the author chooses the Uighur syllable as the basic unit for speech synthesis. Because Uighur letters change their written form and Uighur syllables are represented only implicitly in text, the author also solves problems such as syllable separation and text analysis. Through a general analysis and statistics of the prosodic parameters in natural speech flow, the author summarizes the prosodic variation rules of Uighur: duration variation rules for syllables in words and sentences, stress variation rules and pause rules in words, the intonation changes in the four sentence types of Uighur, and the influence of these sentence types on the prosody of the speech flow. The author proposes the concept of a level-based phonetic repository and adopts a four-level syllable phonetic repository, which organizes Uighur syllables in this hierarchical structure. With only a relatively small increase in repository size, this syllable repository, combined with the TD-PSOLA algorithm, allows the prosody of the synthesized speech to be adjusted, which greatly improves the naturalness of synthesized Uighur speech. Building on these results, a speech synthesis system that outputs clear and natural female-voice Uighur speech has been developed.
Since the language resource repository uses syllables as the basic synthesis units, and the Uighur language has a limited number of syllables, the system can synthesize unrestricted Uighur text in real time. Based on an analysis of Uighur phonetic structure and of the composition of Uighur words, the author also puts forward a new way to build an etyma-affix-based TTS system. In this study, the author successfully divides words into etyma and affixes and establishes an etyma-affix-based TTS system.
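The abstract names syllable separation as one of the text-analysis problems but does not describe the method used. As a rough illustration only, the sketch below splits Latin-script Uighur words under two simplifying assumptions that are not taken from the thesis: every syllable contains exactly one vowel, and a syllable onset holds at most one consonant. The names `VOWELS`, `DIGRAPHS`, `tokenize` and `syllabify` are hypothetical, and the digraph handling ignores ambiguous cases such as "n" followed by "g" across a syllable boundary.

```python
# Hypothetical sketch of rule-based syllable separation for Latin-script Uighur.
# Assumptions (not from the thesis): one vowel per syllable, and at most one
# consonant in the onset, so the consonant right before a vowel starts a syllable.

VOWELS = {"a", "e", "é", "i", "o", "u", "ö", "ü"}
DIGRAPHS = ("ch", "sh", "gh", "ng", "zh")  # treated as single consonant units


def tokenize(word):
    """Split a word into letter units, keeping digraphs together."""
    word = word.lower()
    units, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            units.append(word[i:i + 2])
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def syllabify(word):
    """Return the list of syllables of `word` under the assumptions above."""
    units = tokenize(word)
    vowel_positions = [i for i, u in enumerate(units) if u in VOWELS]
    if not vowel_positions:
        return ["".join(units)]  # no vowel: leave the word unsplit

    starts = [0]  # index of the first unit of each syllable
    for prev, cur in zip(vowel_positions, vowel_positions[1:]):
        # If any consonant lies between two vowels, the last one is the onset
        # of the next syllable; adjacent vowels split directly between them.
        starts.append(cur - 1 if cur - prev > 1 else cur)

    ends = starts[1:] + [len(units)]
    return ["".join(units[s:e]) for s, e in zip(starts, ends)]


if __name__ == "__main__":
    for w in ["mektep", "kitab", "alma", "sheher", "dost"]:
        print(w, syllabify(w))
```

Under these assumptions, `syllabify("mektep")` gives `['mek', 'tep']` and `syllabify("sheher")` gives `['she', 'her']`. A real system would also need to handle the Arabic-script written forms and loanword clusters that this sketch leaves out.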
Keywords/Search Tags: Text to speech, Concatenative synthesis, Prosodic features, PSOLA, Etyma, Affix