Pronunciation modeling in speech synthesis

Posted on:1999-05-09

Degree:Ph.D

Type:Thesis

University:University of Pennsylvania

Candidate:Miller, Corey Andrew

GTID:2468390014467707

Subject:Language

Abstract/Summary:

This dissertation investigates pronunciation modeling in speech synthesis. By pronunciation modeling, we mean architectures and principles for generating high-quality, human-like pronunciations. The term has previously been applied in the context of speech recognition (e.g. Byrne et al. 1997), where it describes theories and procedures for handling the pronunciation variation that naturally occurs across speakers. In contrast, our work is in the domain of text-to-speech synthesis, which, as we will show, requires modeling the pronunciation variation of the individual whose speech the synthesizer is attempting to model. We explain our methodology for learning and reproducing pronunciation variation on an individual basis, and show how most of the crucial features of such variation can be easily generated using the architecture we describe. Throughout, we highlight the contributions to linguistic theory that such a thorough analysis of individual variation provides.

We describe the postlexical module of an English text-to-speech synthesizer. This module transforms underlying lexical pronunciations from a lexical database into contextually appropriate surface (postlexical) pronunciations. The transformation is learned from a corpus of hand-labeled postlexical pronunciations that have been aligned with their lexical counterparts. The learning is carried out by a neural network, whose architecture and data encoding we describe. We offer a thorough analysis of the module's performance, with attention to the network's relative success at learning a wide range of postlexical phenomena. We also examine the extent to which a symbolic approach to allophony is warranted, and present an acoustic analysis that attempts to answer this question.
Finally, we assess the success of currently existing theories of phonetics, phonology, and their interface, based on our experience of generating a complete postlexical phonology of English for use in synthetic speech.
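
The abstract does not specify the network's input encoding or how the lexical/postlexical alignments were produced. The Python sketch below is a hedged illustration only and is not the dissertation's method. It shows one common way to turn an aligned pair of pronunciations into training examples. First, a minimal Levenshtein (edit-distance) alignment of the two phone strings. Second, a NETtalk-style sliding window of lexical phones, whose center phone is paired with its surface realization and encoded as concatenated one-hot blocks. Several details are hypothetical assumptions: the ARPAbet-style symbols, the `-` null symbol, the `#` padding symbol, the window width, and the function names `align`, `windows`, and `one_hot`.

```python
from typing import List, Tuple

GAP = "-"  # hypothetical null symbol for an inserted or deleted phone
PAD = "#"  # hypothetical padding symbol beyond the edges of the string

Pair = Tuple[str, str]  # (lexical phone, surface phone)


def align(lexical: List[str], surface: List[str]) -> List[Pair]:
    """Align two phone strings by minimum edit distance with unit costs (an assumed aligner)."""
    n, m = len(lexical), len(surface)
    # cost[i][j] = edit distance between lexical[:i] and surface[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if lexical[i - 1] == surface[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # lexical phone deleted
                             cost[i][j - 1] + 1)        # surface phone inserted
    # Trace back from the final cell, preferring match/substitution moves.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if lexical[i - 1] == surface[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                pairs.append((lexical[i - 1], surface[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((lexical[i - 1], GAP))  # deletion
            i -= 1
        else:
            pairs.append((GAP, surface[j - 1]))  # insertion
            j -= 1
    pairs.reverse()
    return pairs


def windows(pairs: List[Pair], width: int = 3) -> List[Tuple[List[str], str]]:
    """Pair each lexical phone, plus `width` neighbours per side, with its surface target."""
    lex = [lexical for lexical, _ in pairs]
    padded = [PAD] * width + lex + [PAD] * width
    # Slice k covers padded[k .. k + 2*width], centered on lex[k].
    return [(padded[k:k + 2 * width + 1], surface)
            for k, (_, surface) in enumerate(pairs)]


def one_hot(context: List[str], inventory: List[str]) -> List[int]:
    """Concatenate one one-hot block per window slot to form a network input vector."""
    index = {phone: i for i, phone in enumerate(inventory)}
    vec = [0] * (len(context) * len(inventory))
    for slot, phone in enumerate(context):
        vec[slot * len(inventory) + index[phone]] = 1
    return vec
```

For instance, `align(["b", "ah", "t", "er"], ["b", "ah", "dx", "er"])` pairs lexical /t/ with a surface flap `dx`, an example of the kind of postlexical alternation such a module must learn. Calling `windows(..., width=1)` on that alignment then yields the training example `(["ah", "t", "er"], "dx")` for the third position. The inventory passed to `one_hot` must include the padding and null symbols. A fuller system could also add stress, syllable, or word-boundary features to each window, which this sketch omits.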

Keywords/Search Tags:

Speech, Pronunciation modeling, Postlexical

Related items

1	Pronunciation modeling for conversational speech recognition
2	Research And Application Of Pronunciation Detection For Deaf Children Rehabilitation
3	Research On Pronunciation Space Modeling Of Non-native Speakers
4	The National Language And Accent Pronunciation Dictionary Adaptive Mandarin Speech Recognition
5	Speech Recognition's Application In Computer-assisted Language Learning
6	Speaker dynamics as a source of pronunciation variability for continuous speech recognition models
7	Pronunciation modeling for spontaneous Mandarin speech recognition
8	Pronunciation Evaluation Using Short And Long-term Features
9	The Uygur Natural Spoken In Multiple Pronunciation Word Modeling And Performance Analysis
10	Visual Speech Synthesis Technology And Its Application Studies In English Pronunciation Tutoring