Research On Automatic Segmentation Technology And Automatic Segmentation Of Speech In Dai Language Speech Synthesis System

Posted on: 2016-06-02  Degree: Master  Type: Thesis
Country: China  Candidate: S X Li
GTID: 2208330470955399  Subject: Biomedical engineering
Abstract/Summary:
Recently, the impact of data abundance has extended well beyond speech synthesis, and unit selection synthesis has become the most popular synthesizer technology. Peoples classified as Dai in China speak the Southwestern Tai languages, including Tai Lü, Tai Nüa, Tai Dam, and Tai Hongjin. Xishuangbanna, an autonomous prefecture in the south of Yunnan Province, is the home of the Dai people. It had 993,397 inhabitants in 2000; the Dai made up the plurality at 29.89%, with the Han a close second at 29.11%. Given the value of cultural practices in engendering social capital, we study the Tai Lü language. Because Tai Lü language resources are scarce, HMM-based synthesis is a better fit than unit selection. The implementation of our HMM-based speech synthesizer relies on the HTS toolkit. Acoustic modeling is one of the two major modules in this system, the other being the vocoder. Acoustic modeling involves four parts: building the text corpus, collecting the speech corpus, word segmentation, and phonetic segmentation.

This paper is organized as follows.
1. Based on maximizing the syllable coverage ratio, we build a Tai Lü text corpus of 60 MB.
2. We use the principle of maximizing the triphone coverage ratio to decide whether a text is selected. The resulting speech corpus is 12 MB in size.
3. A dictionary-based forward maximum matching (FMM) algorithm is used for word segmentation. It reaches a recall of 89.2%, a precision of 92.3%, and an F1 of 90.7%. For ambiguous boundaries, we use an improved FMM algorithm, which reaches a precision of 93.8%, a recall of 88.5%, and an F1 of 91.1%.
4. In the acoustic model training phase, ASR technology is applied to phonetic segmentation. In 100 sentences there are 111 distinct phonemes with 4,621 occurrences in total. Counted cumulatively, 7 phonemes have a mean error below 20 ms, 39 below 40 ms, 84 below 60 ms, and 108 below 80 ms; the remaining 3 have a mean error above 80 ms.
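The dictionary-based segmentation step and its scores can be illustrated with a minimal sketch. It implements plain forward maximum matching, not the improved variant mentioned in the abstract, whose details are not given here. The function names, the toy Latin-letter lexicon standing in for Tai Lü script, and the span-based way of counting correct words are all assumptions made for illustration.

```python
def fmm_segment(text, lexicon, max_len=None):
    """Forward maximum matching: at each position, take the longest lexicon entry."""
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the lexicon matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback word.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def spans(words):
    """Convert a word list into a set of (start, end) character spans."""
    out, pos = set(), 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(predicted, gold):
    """Score a segmentation against a reference (assumed span-matching scheme)."""
    pred_spans, gold_spans = spans(predicted), spans(gold)
    correct = len(pred_spans & gold_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical toy data: greedy matching picks "abc" although the
# reference segmentation is "ab" + "cd" + "e".
lexicon = {"ab", "abc", "cd", "d", "e"}
predicted = fmm_segment("abcde", lexicon)   # ['abc', 'd', 'e']
print(predicted, evaluate(predicted, ["ab", "cd", "e"]))
```

The toy input shows the kind of ambiguous boundary that greedy longest-match gets wrong. As a consistency check on the figures above, `f1_score(0.923, 0.892)` rounds to 90.7% and `f1_score(0.938, 0.885)` rounds to 91.1%, matching the reported F1 values.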
Keywords/Search Tags: speech synthesis, Tai Lü language, corpora, word segmentation