Research On Automatic Segmentation Technology And Automatic Segmentation Of Speech In Dai Language Speech Synthesis System

Posted on:2016-06-02

Degree:Master

Type:Thesis

Country:China

Candidate:S X Li


GTID:2208330470955399

Subject:Biomedical engineering

Abstract/Summary:


Recently, the impact of data abundance has extended well beyond speech synthesis. Unit selection synthesis has become the most popular synthesis technology. Peoples classified as Dai in China speak Southwestern Tai languages, including Tai Lü, Tai Nüa, Tai Dam, and Tai Hongjin. Xishuangbanna, an autonomous prefecture in the south of Yunnan Province, is the home of the Dai people. It had 993,397 inhabitants in 2000; the Dai made up the plurality at 29.89%, with the Han a close second at 29.11%. Given the value of cultural practices in engendering social capital, we study the Tai Lü language. Because Tai Lü language resources are scarce, HMM-based synthesis is a better choice than unit selection. Our HMM-based speech synthesizer is implemented with the HTS toolkit. Acoustic modeling is one of the two major modules in this system, the other being the vocoder. Acoustic modeling involves four parts: building the text corpus, collecting the speech corpus, word segmentation, and phonetic segmentation. The work is organized as follows.

1. Based on the principle of maximizing syllable coverage, we create a Tai Lü language corpus of 60 MB.
2. We use the principle of maximizing triphone coverage to decide whether a text is collected. The resulting speech corpus is 12 MB in size.
3. A dictionary-based forward maximum matching (FMM) algorithm is used for word segmentation. It reaches a recall of 89.2%, a precision of 92.3%, and an F1 of 90.7%. For ambiguous boundaries, we use an improved FMM algorithm, which reaches a precision of 93.8%, a recall of 88.5%, and an F1 of 91.1%.
4. When training the acoustic model, ASR technology is applied to phonetic segmentation. In 100 sentences there are 111 phonemes, occurring 4,621 times in total. The mean error is below 20 ms for 7 phonemes, below 40 ms for 39, below 60 ms for 84, and below 80 ms for 108; it exceeds 80 ms for 3 phonemes. The first four counts are cumulative (108 + 3 = 111).
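
The dictionary-based FMM word segmentation and its precision, recall, and F1 evaluation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the function names, the toy Latin-letter lexicon, and the span-based scoring scheme are all hypothetical, and the abstract does not say how the improved FMM variant resolves ambiguous boundaries, so that variant is not shown.

```python
def fmm_segment(text, lexicon, max_len=None):
    """Forward maximum matching: at each position, take the longest lexicon entry.

    Falls back to a single character when no lexicon entry matches.
    """
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def word_spans(words):
    """Convert a word sequence into a set of (start, end) character spans."""
    spans, pos = set(), 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def evaluate(predicted, reference):
    """Span-level precision, recall, and F1 (an assumed scoring scheme)."""
    pred, ref = word_spans(predicted), word_spans(reference)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical toy data: Latin letters stand in for Tai Lü script.
toy_lexicon = {"ab", "abc", "cd", "d"}
predicted = fmm_segment("abcdab", toy_lexicon)   # greedy match picks "abc" first
reference = ["ab", "cd", "ab"]                   # assumed gold segmentation
print(predicted, evaluate(predicted, reference))

# Consistency check of the F1 figures reported in the abstract.
print(round(f1_score(0.923, 0.892), 3), round(f1_score(0.938, 0.885), 3))
```

The toy input shows the typical FMM failure mode: the greedy longest match consumes "abc" and splits the following word incorrectly, which is the kind of ambiguous boundary an improved variant has to address. The last line recomputes F1 from the reported precision and recall pairs, which agree with the stated 90.7% and 91.1% after rounding.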

Keywords/Search Tags:

speech synthesis, Tai Lü language, corpora, word segmentation


Related items

1	Dai Language Segmentation Based On Dictionary And Statistics
2	Research On Automatic Labeling Of Speech Synthesis Corpora
3	Research On HMM-based Dai Speech Synthesis System
4	Research On Dai's Word Segmentation Based On Machine Learning Model
5	Word segmentation, word recognition, and word learning: A computational model of first language acquisition
6	Cracking the language code: Neural basis of word segmentation throughout development
7	Bilingual Word Representation Learning From Non-parallel Corpora
8	Text Analysis Of Burmese Language For Speech Synthesis
9	Research On Parallel Corpora-based Unsupervised Part-of-speech Tagging For Chinese
10	Embedded Speech Synthesis Based On Initial And Final Units