Large-vocabulary speaker-independent continuous speech recognition: The SPHINX system

Posted on: 1989-02-27
Degree: Ph.D.
Type: Dissertation
University: Carnegie Mellon University
Candidate: Lee, Kai-Fu
Full Text: PDF
GTID: 1478390017456137
Subject: Mathematics
Abstract/Summary:
Speaker independence, continuous speech, and large vocabulary are three of the greatest problems in automatic speech recognition. Previous accurate speech recognizers avoided dealing with all three problems simultaneously. This dissertation describes SPHINX, the first system to demonstrate the feasibility of accurate large-vocabulary speaker-independent continuous speech recognition.

SPHINX is based on four important principles: (1) use of a sophisticated yet tractable model of speech, (2) incorporation of human speech knowledge, (3) utilization of speech units that are trainable, well-understood, and context-insensitive, and (4) ability to learn and to adapt to individual speakers.

Hidden Markov models (HMMs) are used to represent speech in SPHINX. Hidden Markov modeling is a powerful technique capable of robust and succinct modeling of speech. With their efficient maximum likelihood training and recognition algorithms, HMMs have already been successfully applied to more constrained tasks.

Within the framework of hidden Markov modeling, SPHINX uses human knowledge in several ways. Perceptually motivated parameters were incorporated using frame and segment level integration techniques. Also, an optimized set of phones and word pronunciations was derived from human phonetic/lexical knowledge and tuning experiments. The incorporation of knowledge resulted in substantial improvements in recognition accuracy.

It is well known that the same phone in different contexts has different realizations. A good unit of speech must be trainable, yet must also model context-dependent and word-dependent effects. Two novel units, the function-word-dependent phone model and the generalized triphone model, are introduced. These units led to very substantial improvements in performance.

Finally, in order to improve the system given some knowledge of the speaker, two learning algorithms are introduced to modify the system to adapt to an input speaker. The first algorithm is based on speaker cluster selection, and the second involves deleted interpolation of various speaker-independent and speaker-dependent HMM parameters.

SPHINX attained a word accuracy of 96% on the 997-word resource management task. This is much more accurate than any previously reported result on similar tasks. In fact, it is comparable to the best speaker-dependent systems. By using sophisticated modeling techniques to exploit abundant training data, SPHINX has bridged the gap between speaker-independent and speaker-dependent recognition.
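The abstract mentions deleted interpolation of speaker-independent and speaker-dependent HMM parameters without giving details. As a rough illustration only, the sketch below estimates a single interpolation weight by EM on held-out counts and then mixes two discrete output distributions. The function names, the toy distributions, and the single-weight setup are assumptions made for this example; they are not taken from the dissertation and may differ from the procedure SPHINX actually uses.

```python
def estimate_lambda(p_sd, p_si, heldout_counts, iterations=50):
    """EM estimate of lam in lam * p_sd + (1 - lam) * p_si.

    p_sd, p_si: discrete distributions over the same symbols
                (hypothetical speaker-dependent / speaker-independent estimates).
    heldout_counts: symbol counts from data withheld ("deleted") from the
                    estimation of p_sd and p_si.
    """
    total = sum(heldout_counts)
    if total == 0:
        raise ValueError("held-out data is empty")
    lam = 0.5  # neutral starting point
    for _ in range(iterations):
        expected_sd = 0.0
        for count, a, b in zip(heldout_counts, p_sd, p_si):
            mix = lam * a + (1.0 - lam) * b
            if count and mix > 0.0:
                # E-step: expected number of held-out observations
                # attributed to the speaker-dependent component.
                expected_sd += count * lam * a / mix
        # M-step: the new weight is the attributed fraction.
        lam = expected_sd / total
    return lam


def interpolate(p_sd, p_si, lam):
    """Linear interpolation of two distributions with weight lam on p_sd."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_sd, p_si)]


if __name__ == "__main__":
    # Toy numbers, invented for illustration.
    p_sd = [0.7, 0.2, 0.1]
    p_si = [0.4, 0.4, 0.2]
    heldout = [70, 20, 10]
    lam = estimate_lambda(p_sd, p_si, heldout)
    print("weight on speaker-dependent model:", lam)
    print("interpolated distribution:", interpolate(p_sd, p_si, lam))
```

The weight is fitted on data that was not used to estimate either distribution, so a sparsely trained speaker-dependent estimate receives less weight when it predicts the held-out data poorly.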
Keywords/Search Tags: SPHINX, Recognition, Speech, Speaker, System, Modeling