Research On Crucial Techniques In Chinese Text To Speech System

Posted on:2009-06-18

Degree:Doctor

Type:Dissertation

Country:China

Candidate:P M Huang

GTID:1118360278965426

Subject:Signal and Information Processing

Abstract/Summary:

Text-to-Speech (TTS) is a technology that converts arbitrary text into a speech signal. It can be applied in various fields, e.g. car navigation, announcements in railway stations, response services in telecommunications, and e-mail reading.

Although large-corpus-based systems can generate high-quality speech, they still have shortcomings. In particular, they cannot be applied to devices with limited resources because of their huge storage demand. At present, there are generally two types of solutions: one is to use new methods such as HMM-based speech synthesis, and the other is to greatly reduce the redundancy of the corpus while maintaining high speech quality (a small corpus TTS system). Both methods reduce the storage demand significantly. Compared with the former, the latter produces better output speech but requires somewhat more storage.

In this dissertation, some critical issues of the small corpus TTS system are studied further. The research and innovations are described in detail as follows:

1. Design of the synthesis unit inventory and construction of the prosodic model are two key issues for a small corpus TTS system, but both depend on a large corpus with labeling information. Within the labeling task, precise speech segmentation and labeling are very important. To solve this problem, an automatic segmentation and labeling method that combines statistical approaches with rules is proposed. Two types of HMM models are used to produce the INITIAL/FINAL and syllable boundaries. Three feature detection algorithms are then applied to refine the boundaries between voiced, unvoiced, and silence segments. Experimental results show that the proposed method significantly improves the performance of the segmentation system.

2. The clustering problem of syllable pitch contours is studied. Through clustering and reasonable sample selection, the size of the large speech corpus can be significantly reduced. In addition, by introducing speech coding techniques, a small multi-sample tonal mono-syllable corpus can be built that satisfies the clarity and naturalness demands of small corpus or embedded TTS systems. For pitch contours of different lengths, a non-fixed-length contour clustering approach is proposed. This approach introduces the idea of dynamic programming (DP) into clustering. First, the pitch of each contour is normalized to zero mean. Then, the best path between two contours is found using the DP method. Finally, the distance between the two contours along this path is calculated. If the shapes of the two pitch contours are similar, the distance will be very low. In the sample selection stage, the tone domain of the syllables is divided by pitch mean, and the typical samples are then identified according to their levels and clusters. Clustering experiments show that this approach achieves better clustering results than traditional approaches. The new clustering approach is also validated by synthesis experiments.

3. A prosodic model is proposed that can be used to predict the pitch contour of a sentence. The method is as follows:

(1) The pitch contour templates are obtained by clustering.

(2) The decision tree method is used to construct a prediction model that maps the contextual information of a syllable to pitch contour templates.

(3) According to the different contexts, the control parameters of the syllable pitch contour templates, such as the pitch mean, the syllable duration, and the INITIAL duration, are computed separately, and acoustic parameter index trees are constructed for each kind of tonal syllable.

(4) The pitch contour of a sentence is obtained from the syllabic contexts, the pitch contour templates and their prediction model, the acoustic parameter index trees, and the silence durations.
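
The DP-based distance between pitch contours of different lengths, described in item 2, can be illustrated with a minimal Python sketch. The abstract does not specify the local cost, the allowed path steps, or how the path cost is normalized, so the choices below (absolute pitch difference, the three standard alignment steps, and division by path length) are assumptions, and the function names are hypothetical. It is a sketch of the general idea, not the dissertation's implementation.

```python
from math import inf


def zero_mean(contour):
    """Subtract the mean pitch so that only the contour shape remains."""
    mean = sum(contour) / len(contour)
    return [p - mean for p in contour]


def dp_contour_distance(a, b):
    """Average alignment cost between two non-empty pitch contours.

    Assumed details (not given in the abstract): local cost is the
    absolute pitch difference, steps are up / left / diagonal, and the
    accumulated cost is divided by the number of cells on the path.
    """
    a, b = zero_mean(a), zero_mean(b)
    n, m = len(a), len(b)
    # cost[i][j]: lowest accumulated cost of aligning a[:i+1] with b[:j+1]
    # steps[i][j]: number of cells on the path that achieves that cost
    cost = [[inf] * m for _ in range(n)]
    steps = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(a[i] - b[j])
            if i == 0 and j == 0:
                cost[i][j], steps[i][j] = d, 1
                continue
            candidates = []
            if i > 0:
                candidates.append((cost[i - 1][j], steps[i - 1][j]))
            if j > 0:
                candidates.append((cost[i][j - 1], steps[i][j - 1]))
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], steps[i - 1][j - 1]))
            # Pick the cheapest predecessor (ties broken by shorter path).
            best_cost, best_steps = min(candidates)
            cost[i][j] = d + best_cost
            steps[i][j] = best_steps + 1
    return cost[-1][-1] / steps[-1][-1]


# Illustrative contours in Hz (made-up values, not data from the dissertation).
rising = [100, 110, 120, 130]
rising_long = [100, 105, 110, 115, 120, 125, 130]
falling = [130, 120, 110, 100]

print(dp_contour_distance(rising, rising_long))
print(dp_contour_distance(rising, falling))
```

With these made-up contours, the two rising contours are closer to each other than the rising and falling pair, even though their lengths differ. A distance of this kind could then be plugged into any clustering procedure that accepts pairwise distances.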

Keywords/Search Tags:

Text-to-Speech System, Speech Automatic Segmentation and Labeling, Speech Corpus Reduction, Prosodic Modeling

Related items

1	Research On Automatic Construction Of Speech Corpus And Speech Minimized Labeling
2	The Method And Implementation Of ToBI Automatic Prosodic Labeling In English Text To Speech System
3	Research On Problems Of Text-To-Speech System
4	The Research Of Prosodic Control Algorithm And Realization For Chinese Speech Synthesis
5	An Automatic Labeling System For Broadcast News
6	Vietnam Text To Speech System Front-end Text Analysis
7	The Study And Application Of Text-to-Speech System
8	Research On Automatic Labeling Of Speech Synthesis Corpora
9	Corpus Supported English Text To Speech Synthesis Engine
10	Research On The Technology Of Automatic Segmentation For Text-To-Speech System