
Improving the fine phonetic performance of automatic speech recognizers

Posted on: 2002-07-12
Degree: Ph.D.
Type: Dissertation
University: Brown University
Candidate: Hughes, Tadd Bernard
GTID: 1468390011496432
Subject: Engineering

Abstract/Summary:
Automatic speech recognition still faces important unresolved problems caused by confusion between sounds that are phonetically different yet acoustically similar. Late-stage grammar inference systems often cannot correct these errors. This dissertation addresses the fundamental problem of fine phonetic distinction by investigating the talker-invariant acoustic differences among phonemes and by introducing signal features that make highly confusable phonemes more distinct. The compact alphadigits vocabulary used for this investigation contains a concentrated assortment of word sets that exemplify phonetic discrimination issues, albeit in a reasonably fixed context.

Two approaches were investigated to improve modeling of the alphadigits. The first approach adds methods to a standard HMM-based recognition system that better estimate the probability distributions for the fixed sets of features used by the system.

The second approach employs two stages. The first stage uses the HMM-based system of the first approach to obtain a tentative segmentation and classification of the speech. The second stage, composed of phonetically specific targeted algorithms, then combines varying degrees of first-stage information with discriminant transformations to improve recognition performance. Two sets of confusable words are targeted in this work: (1) the nasal set {M, N}, a discrimination problem based on slowly varying transitions; and (2) the E-set stop consonants {B, P, D, T}, a discrimination problem based on rapid transitions.

Experiments compared the LEMS laboratory's standard HMM recognizer with the two-stage system on four separate test sets (two closed and two open). In the continuous-speech nasal-set recognition experiments, the best two-stage nasal-word models yield an average recognition accuracy of 92%, an average error reduction of nearly 50% relative to the standard recognizer.
The targeted stop-consonant models yield an average recognition accuracy of 91%, with an average error reduction of 36% relative to the standard recognizer.
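Because the abstract reports both an accuracy and a relative error reduction, the two figures together imply roughly how accurate the standard recognizer was. The short sketch below makes that arithmetic explicit. It assumes error rate = 100% minus accuracy and relative error reduction = (baseline error minus new error) / baseline error; the function names are illustrative only. Since the reported figures are averages over four test sets, and an average of per-set reductions is not in general the reduction of the averaged accuracies, the implied baselines (roughly 84% for the nasal set, roughly 86% for the stop consonants) are approximations, not numbers stated in the dissertation.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's errors removed by the new system (accuracies in %)."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return (baseline_err - new_err) / baseline_err


def implied_baseline_accuracy(new_acc: float, reduction: float) -> float:
    """Baseline accuracy (%) consistent with a new accuracy and a relative error reduction."""
    new_err = 100.0 - new_acc
    # new_err = baseline_err * (1 - reduction)  =>  baseline_err = new_err / (1 - reduction)
    return 100.0 - new_err / (1.0 - reduction)


if __name__ == "__main__":
    # Nasal set: ~92% accuracy with ~50% error reduction (approximate, averaged figures).
    print(round(implied_baseline_accuracy(92.0, 0.50), 1))  # 84.0
    # Stop consonants: ~91% accuracy with ~36% error reduction.
    print(round(implied_baseline_accuracy(91.0, 0.36), 1))  # 85.9
```

Working in error rates rather than accuracies is what makes a "50% error reduction" meaningful: moving from 84% to 92% halves the errors, whereas the same 8-point gain from 50% to 58% would remove only 16% of them.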
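The abstract does not say which discriminant transformation the second stage applies. As one illustrative possibility, the sketch below uses Fisher's linear discriminant for a two-class {M, N} decision and re-decides only tokens that the first stage labeled M or N, leaving all other first-stage labels untouched. The two-dimensional feature vectors, the toy data, and the names `train_fisher`, `classify`, and `second_stage` are hypothetical; this is a sketch of the general two-stage idea under those assumptions, not the dissertation's method.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]  # hypothetical 2-D per-segment feature vector


def mean(points: Sequence[Vector]) -> Vector:
    """Class mean of 2-D feature vectors."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points: Sequence[Vector], mu: Vector) -> List[List[float]]:
    """Within-class scatter: sum of (x - mu)(x - mu)^T over one class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in points:
        d = (x[0] - mu[0], x[1] - mu[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def train_fisher(class_a: Sequence[Vector], class_b: Sequence[Vector]) -> Tuple[Vector, float]:
    """Return the Fisher direction w = Sw^{-1}(mu_a - mu_b) and a midpoint threshold."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter is singular")
    diff = (mu_a[0] - mu_b[0], mu_a[1] - mu_b[1])
    # Closed-form inverse of the 2x2 matrix sw, applied to diff
    w = (
        (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
        (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det,
    )
    # Threshold halfway between the two projected class means
    threshold = 0.5 * (w[0] * (mu_a[0] + mu_b[0]) + w[1] * (mu_a[1] + mu_b[1]))
    return w, threshold


def classify(x: Vector, w: Vector, threshold: float) -> str:
    """Class a ('M') projects above the threshold, class b ('N') below it."""
    return "M" if w[0] * x[0] + w[1] * x[1] > threshold else "N"


def second_stage(first_label: str, x: Vector, w: Vector, threshold: float) -> str:
    """Re-decide only tokens the first stage placed in the nasal set {M, N}."""
    if first_label not in ("M", "N"):
        return first_label
    return classify(x, w, threshold)


if __name__ == "__main__":
    # Toy, made-up 2-D features for each class (not real acoustic data).
    m_tokens = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 1.6)]
    n_tokens = [(3.0, 1.0), (3.5, 1.5), (4.0, 1.0), (3.2, 0.6)]
    w, t = train_fisher(m_tokens, n_tokens)
    print(second_stage("N", (1.3, 2.1), w, t))  # M: the second stage overrides the first
    print(second_stage("B", (1.3, 2.1), w, t))  # B: outside the targeted set, passed through
```

Restricting the second stage to tokens already placed in the confusable set is what lets a small, specialized classifier operate without disturbing the rest of the first-stage output.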