Improving the fine phonetic performance of automatic speech recognizers
Posted on: 2002-07-12 | Degree: Ph.D. | Type: Dissertation | University: Brown University | Candidate: Hughes, Tadd Bernard | GTID: 1468390011496432 | Subject: Engineering

Abstract/Summary:

Important problems in automatic speech recognition remain unresolved because of confusion between phonetically different yet acoustically similar sounds, which late-stage grammar inference systems often cannot correct. This dissertation addresses the fundamental problem of fine phonetic distinction by investigating the talker-invariant acoustic differences among phonemes and by introducing signal features that make highly confusable phonemes more distinct. The compact alphadigits vocabulary used for this investigation contains a concentrated assortment of word sets that exemplify phonetic discrimination problems, albeit in a reasonably fixed context.

Two approaches were investigated to improve modeling of the alphadigits. The first approach adds methods to a standard HMM-based recognition system that better estimate the probability distributions for the fixed sets of features used by the system.

The second approach employs two stages. The first stage uses the HMM-based system of the first approach to obtain a tentative segmentation and classification of the speech. The second stage, which is composed of phonetically specific targeted algorithms, then combines varying degrees of first-stage information with discriminant transformations to improve the resulting recognition performance. Two sets of confusable words are targeted in this work:
(1) The nasal set {M, N}, a slowly-varying transition-based discrimination problem.
(2) The E-set stop consonants {B, P, D, T}, a rapid-transition-based discrimination problem.

Experiments compared the LEMS laboratory's standard HMM recognizer with the two-stage system on four separate test sets (two closed and two open). In the continuous-speech nasal-set recognition experiments, the best two-stage nasal-word models yield an average recognition accuracy of 92%, with an average error reduction of nearly 50% over the standard recognizer. The stop-consonant targeted models yield an average of 91% correct recognition, with an average error reduction of 36% over the standard recognizer.

Keywords/Search Tags: Speech, Recognition, Phonetic, Recognizer, Standard, Average