Pronunciation modeling for conversational speech recognition

Posted on: 2002-12-31
Degree: Ph.D.
Type: Dissertation
University: The Johns Hopkins University
Candidate: Saraclar, Murat
Full Text: PDF
GTID: 1468390011997560
Subject: Engineering
Abstract/Summary:
Conversational speech exhibits considerable pronunciation variability, which has been shown to have a detrimental effect on the accuracy of automatic speech recognition. For this reason, pronunciation modeling has received considerable attention in the recent automatic speech recognition literature. Most of that attention, however, has focused on describing an alternate pronunciation as a different sequence of phonetic units drawn from the same inventory of phones that describes canonical pronunciations. Use of such pronunciation models during recognition is known to yield moderate improvements in recognition accuracy. In this dissertation we present new pronunciation modeling techniques developed to accommodate the high degree of pronunciation variability encountered in conversational speech, with a significant gain in recognition performance.

Analysis of manual phonetic transcriptions of conversational speech reveals a large number (>20%) of instances where human labelers disagree on the identity of the surface form. We present acoustic evidence that offers an explanation: when a pronunciation deviates from its canonical form, it is often the case that neither the canonical nor the alternate phone represents the acoustics very well. The actual pronunciations lie on a continuum between these two extremes. Based on this analysis, two methods for accommodating pronunciation variation are developed. The first method attempts to solve the problem by separately modeling each baseform/surface-form pair. The second method accommodates the nonstandard pronunciations in a novel manner. Rather than allowing a phoneme in the canonical pronunciation to be realized as one of a few distinct alternate phones, the Hidden Markov Model (HMM) states of the phoneme's model are instead allowed to share Gaussian mixture components with the HMM states of the model(s) of the alternate realization(s).

This dissertation provides a fundamental and quantitative insight into pronunciation variability in spontaneous speech and demonstrates techniques for accommodating this variability within the framework of traditional automatic speech recognition systems that assume temporally non-overlapping phonetic segments.
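
As an illustration only, the Gaussian-sharing idea behind the second method can be sketched in a few lines of Python. This is a toy one-dimensional example and not the dissertation's implementation: the function names, the `share_weight` parameter, and all numeric values are assumptions made for the sketch. A real system would use multivariate Gaussians and re-estimate the mixture weights from data rather than fix them by hand.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_likelihood(x, components):
    """Emission likelihood of an HMM state modeled as a Gaussian mixture.

    components is a list of (weight, mean, variance) tuples.
    """
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


def share_components(canonical, alternate, share_weight):
    """Let a canonical state borrow the alternate state's Gaussians.

    share_weight is a hypothetical parameter: the total probability mass
    assigned to the borrowed components. The canonical state's own weights
    are scaled down so that the combined weights still sum to one.
    """
    if not 0.0 <= share_weight <= 1.0:
        raise ValueError("share_weight must lie in [0, 1]")
    kept = [((1.0 - share_weight) * w, m, v) for w, m, v in canonical]
    borrowed = [(share_weight * w, m, v) for w, m, v in alternate]
    return kept + borrowed


# Toy states (assumed values): the canonical phone sits near 0..1,
# the alternate phone near 4.
canonical_state = [(0.6, 0.0, 1.0), (0.4, 1.0, 1.0)]
alternate_state = [(1.0, 4.0, 1.0)]
shared_state = share_components(canonical_state, alternate_state, share_weight=0.3)

# An observation lying between the two phones, as on a pronunciation continuum.
x = 2.5
before = mixture_likelihood(x, canonical_state)
after = mixture_likelihood(x, shared_state)
print(f"canonical only: {before:.4f}, with shared components: {after:.4f}")
```

In this sketch the in-between observation is scored higher by the state that borrows components than by the canonical state alone, which is the intuition the abstract gives for sharing densities instead of switching to a different phone.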
Keywords/Search Tags:Speech, Pronunciation, Variability