Speaker dynamics as a source of pronunciation variability for continuous speech recognition models

Posted on:2005-10-22

Degree:Ph.D

Type:Thesis

University:University of Washington

Candidate:Bates, Rebecca Anne

GTID:2458390008492687

Subject:Engineering

Abstract/Summary:

Intra-speaker pronunciation changes are a significant source of variation in spontaneous speech. Previous work has identified several factors related to pronunciation variability, such as phonetic context and speaking rate, which are useful to model in automatic speech recognition (ASR). This work examines new higher-level information sources (syntax, discourse structure and prosody) and, specifically, the relationship between these factors and pronunciation variation as seen in reduction and hyper-articulation. The key contributions of this work include (1) an analysis of high-level factors, providing new cues for improving prediction of pronunciation variation, (2) a framework for including dynamic pronunciation models in ASR systems, and (3) an analysis of feature-based pronunciation models with suggestions for their incorporation into ASR systems.

The analysis of high-level factors identifies the attributes that are most useful for predicting variability: the part-of-speech (POS) of the target word and neighboring words, the location of the word in an utterance, the number of F0 slope changes within the word, word duration, and average word energy. Pronunciation prediction experiments show a reduction in phone error rate of 2.3% relative, and similar reductions in perplexity, over a baseline model using only phonetic context.

Incorporating higher-level information (such as hypothesis-dependent word context or word-level F0 values) into ASR systems requires a rescoring approach. A framework for this is presented, with recognition results using various types of pronunciation models on the Switchboard task. We obtain a small but statistically significant improvement in recognition performance with a baseline static model using phonetic context, but no significant gains from extending this model to incorporate POS-dependent pronunciations.

We also present a phonetic-feature-based prediction model in which each phone is represented by a vector of 21 symbolic features, each of which can be on, off, unspecified or unused. Feature changes are predicted rather than phone changes, allowing for varying productions of phones, e.g., nasalized vowels. We studied feature interaction by examining different groupings of dependent features and showed that a hierarchical grouping with conditional dependencies leads to lower perplexity. We find that feature-based models are more efficient than phone-based models, in the sense that they require fewer parameters to predict variation while giving a smaller distance to the hand-labeled form and similar perplexity values.
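
The feature-based representation described above can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not list the 21 features, so the four feature names below, the `Value` enum, and the helper functions `make_phone`, `feature_changes` and `feature_distance` are all hypothetical stand-ins. The sketch only shows how a four-valued symbolic feature vector lets a nasalized vowel be expressed as a single feature change rather than as a substitution of one phone for another.

```python
from enum import Enum


class Value(Enum):
    """The four symbolic values a feature can take, per the abstract."""
    ON = "on"
    OFF = "off"
    UNSPECIFIED = "unspecified"
    UNUSED = "unused"


# Hypothetical feature inventory. The thesis uses 21 features; these four
# names are illustrative assumptions, not taken from the source.
FEATURES = ("voiced", "nasal", "high", "round")


def make_phone(**values):
    """Build a feature vector; features not given default to UNUSED."""
    unknown = set(values) - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return tuple(values.get(name, Value.UNUSED) for name in FEATURES)


def feature_changes(canonical, surface):
    """Map each feature that differs to its (canonical, surface) values."""
    return {
        name: (c, s)
        for name, c, s in zip(FEATURES, canonical, surface)
        if c != s
    }


def feature_distance(a, b):
    """Count differing features (a simple Hamming-style distance)."""
    return sum(x != y for x, y in zip(a, b))


# A vowel in canonical form and the same vowel produced with nasalization.
canonical_vowel = make_phone(
    voiced=Value.ON, nasal=Value.OFF, high=Value.OFF, round=Value.OFF
)
nasalized_vowel = make_phone(
    voiced=Value.ON, nasal=Value.ON, high=Value.OFF, round=Value.OFF
)

changes = feature_changes(canonical_vowel, nasalized_vowel)
distance = feature_distance(canonical_vowel, nasalized_vowel)
```

Here `changes` contains only the `nasal` feature and `distance` is 1, so the nasalized production stays close to its canonical form. A phone-level model would have to treat the same variant as either identical to the canonical phone or as an entirely different symbol. The Hamming-style distance is one simple choice for comparing a predicted form with a hand-labeled form; the distance measure actually used in the thesis is not specified in the abstract.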

Keywords/Search Tags:

Pronunciation, Speech, Variation, Model, Variability, Changes, Factors

Related items

1	Pronunciation modeling for conversational speech recognition
2	The National Language And Accent Pronunciation Dictionary Adaptive Mandarin Speech Recognition
3	Research And Application Of Pronunciation Detection For Deaf Children Rehabilitation
4	HMM-based Pronunciation People Switching System
5	Speech Recognition's Application In Computer-assisted Language Learning
6	Visual Speech Synthesis Technology And Its Application Studies In English Pronunciation Tutoring
7	Research On Variation Speech Recognition Technology Based On Cortex-A8
8	Speech Driven Three-dimensional Pronunciation Action Synthesis System Implementation
9	Variability of sound productions in apraxia of speech: Perceptual analysis
10	A Research On Key Technology Of Computer Assisted Putonghua Pronunciation Assessment