Improving the fine phonetic performance of automatic speech recognizers

Posted on:2002-07-12

Degree:Ph.D

Type:Dissertation

University:Brown University

Candidate:Hughes, Tadd Bernard

Full Text:PDF

GTID:1468390011496432

Subject:Engineering

Abstract/Summary:

Important problems in automatic speech recognition remain unresolved because of confusion between phonetically different yet acoustically similar sounds, which late-stage grammar inference systems often cannot correct. This dissertation addresses the fundamental problem of fine phonetic distinction by investigating the talker-invariant acoustic differences among phonemes and by introducing signal features that make highly confusable phonemes more distinct. The compact alphadigits vocabulary used for this investigation contains a concentrated assortment of word sets that exemplify phonetic discrimination issues, albeit in a reasonably fixed context.

Two approaches were investigated to improve modeling of the alphadigits. The first approach adds methods to a standard HMM-based recognition system that better estimate the probability distributions of the fixed sets of features used by the system.

The second approach employs two stages. The first stage uses the HMM-based system of the first approach to obtain a tentative segmentation and classification of the speech. The second stage, which is composed of phonetically specific targeted algorithms, then combines varying degrees of first-stage information with discriminant transformations to improve the resulting recognition performance. Two sets of confusable words are targeted in this work: (1) the nasal set {M, N}, a slowly varying transition-based discrimination problem; (2) the E-set stop consonants {B, P, D, T}, a rapid-transition-based discrimination problem.

Experiments were performed comparing the LEMS laboratory's standard HMM recognizer to the two-stage system on four separate test sets (two closed and two open). Results of the continuous-speech nasal-set recognition experiments show that the best two-stage nasal-word models yield an average 92% recognition accuracy with an average error reduction of nearly 50% over the standard recognizer. The stop-consonant targeted models yield an average 91% correct recognition with an average 36% error reduction over the standard recognizer.
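
The abstract reports accuracies and relative error reductions but does not state the standard recognizer's accuracy. As a rough illustration only, the sketch below back-computes the baseline accuracy implied by those two figures. It assumes "error reduction" means the relative reduction in error rate, and it treats the reported averages as if they came from a single test set, so the results are approximate. The function name `implied_baseline_accuracy` is hypothetical and does not come from the dissertation.

```python
def implied_baseline_accuracy(new_accuracy: float, relative_error_reduction: float) -> float:
    """Back-compute a baseline accuracy from a new accuracy and a relative error reduction.

    Assumes: new_error = baseline_error * (1 - relative_error_reduction).
    Both arguments are fractions in [0, 1], not percentages.
    """
    if not 0.0 <= new_accuracy <= 1.0:
        raise ValueError("new_accuracy must be between 0 and 1")
    if not 0.0 <= relative_error_reduction < 1.0:
        raise ValueError("relative_error_reduction must be in [0, 1)")
    new_error = 1.0 - new_accuracy
    baseline_error = new_error / (1.0 - relative_error_reduction)
    return 1.0 - baseline_error


# Figures taken from the abstract; "nearly 50%" is rounded to 0.50 here (an assumption).
nasal_baseline = implied_baseline_accuracy(0.92, 0.50)
stop_baseline = implied_baseline_accuracy(0.91, 0.36)

print(f"Implied nasal-set baseline accuracy: {nasal_baseline * 100:.1f}%")
print(f"Implied stop-consonant baseline accuracy: {stop_baseline * 100:.1f}%")
```

Under these assumptions, the standard recognizer would sit at roughly 84% on the nasal set and roughly 86% on the stop consonants. Because the abstract averages over four test sets, the actual per-set baselines may differ.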

Keywords/Search Tags:

Speech, Recognition, Phonetic, Recognizer, Standard, Average

Related items

1	GMM Based Connected Digits Speech Recognizer And The State Of The Art Of Language Modeling For Large Vocabulary Speech Recognizer
2	Speech recognition based on phonetic features and acoustic landmarks
3	Study And Design Of Speech Recognition System Based On Phonetic Element
4	Research And Implementation Of Speaker-independent And Isolated Words Phonetic Recognition Based On DDP Platform
5	Design And Implementation Of Chinese Text Error Correction System After Speech Recognition
6	Using computer-based speech recognition technology as a practice partner for persons with severe motor speech disorders
7	Research On Isolated Word Speech Recognition Algorithm And Realization On DSP
8	Research Of Speech Recognition Based On Convolution Neural Network
9	Research On International Phonetic Alphabet Recognition Algorithm
10	Research And Realization Of Speech Recognition Technology Based On DSP