Flexible speech synthesis using weighted finite-state transducers

Posted on:2003-12-04

Degree:Ph.D

Type:Thesis

University:University of Washington

Candidate:Bulyko, Ivan

GTID:2468390011480696

Subject:Engineering

Abstract/Summary:

The main focus of this thesis is on improving the quality of concatenative speech synthesis by taking advantage of the natural (allowable) variability in spoken language, namely, the fact that there are multiple ways of uttering a given sentence and there are several word sequences that can represent a given concept. An architecture for speech generation for constrained domain applications is proposed that tightly integrates language generation and speech synthesis, allowing the choice of words and desired intonation in the system's response to be optimized jointly with the speech output quality. Experiments with a travel planning dialog system have demonstrated that by expanding the space of candidate responses and possible prosodic realizations we achieve higher quality speech output.

The additional flexibility in terms of word sequences, prosodic realizations and pronunciations increases the search space and, consequently, the computational cost of the synthesis system. To address this problem this thesis also offers improvements to the popular unit selection approach for more accurately constraining or pruning the search space at the acoustic level. In particular, we describe a variation to the cluster-based unit database design aimed at constraining the set of candidate units, and we introduce splicing costs into the unit search criterion as a measure to indicate which unit boundaries are particularly good or poor join points, augmenting existing concatenation measures for better pruning of the search space. As a byproduct, the new splicing costs also lead to improvements in speech quality.

Finally, we introduce a modular speech synthesis system architecture where each component is represented with weighted finite-state transducers (WFSTs), and we describe specific WFST implementations of prosody prediction and unit selection modules. Such an architecture provides an efficient representation of flexible targets and allows the steps in the synthesis process to be performed with operations available in a general purpose toolbox.
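The joint search described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the `Unit` fields, the fixed base join penalty of 1.0, and the functions `concat_cost`, `select_units`, and `best_response` are all hypothetical names and values chosen for illustration. The sketch runs a Viterbi search over candidate units, adds per-boundary splicing costs to the join cost, and then picks the cheapest result among several alternative wordings of a response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str           # hypothetical unit identifier
    target_cost: float  # mismatch against the desired target specification
    splice_in: float    # assumed penalty for joining at the unit's left boundary
    splice_out: float   # assumed penalty for joining at the unit's right boundary
    source: int         # index of the recording the unit was cut from
    position: int       # position of the unit within that recording


def concat_cost(a, b):
    """Join cost between consecutive units (illustrative values only)."""
    # Units that were adjacent in the same recording need no splice.
    if a.source == b.source and b.position == a.position + 1:
        return 0.0
    # Otherwise: an assumed base penalty plus the splicing costs of both boundaries.
    return 1.0 + a.splice_out + b.splice_in


def select_units(candidates):
    """Viterbi search; candidates[i] lists the units available for target i."""
    best = [(u.target_cost, [u]) for u in candidates[0]]
    for slot in candidates[1:]:
        new_best = []
        for u in slot:
            # Cheapest way to reach unit u from any path ending in the previous slot.
            cost, path = min(
                ((c + concat_cost(p[-1], u), p) for c, p in best),
                key=lambda item: item[0],
            )
            new_best.append((cost + u.target_cost, path + [u]))
        best = new_best
    return min(best, key=lambda item: item[0])


def best_response(alternatives):
    """Pick the cheapest realization among alternative word sequences."""
    return min((select_units(c) for c in alternatives), key=lambda r: r[0])
```

Calling `best_response` with one candidate lattice per alternative wording returns a `(total_cost, units)` pair for the wording that synthesizes most cheaply. In a WFST formulation the same search would be expressed as composition followed by a shortest-path operation rather than hand-written dynamic programming.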

Keywords/Search Tags:

Synthesis, Quality
