
Modeling articulatory dynamics using HMM techniques for automatic speech recognition

Posted on: 1995-04-27    Degree: Ph.D    Type: Dissertation
University: University of Waterloo (Canada)    Candidate: Erler, Kevin J    Full Text: PDF
GTID: 1478390014989430    Subject: Engineering
Abstract/Summary:
State-of-the-art speech recognition is accomplished by using stochastic models (Hidden Markov Models, or HMMs) to represent small, non-overlapping segments of speech, often referred to as "phonemes". In these conventional HMM speech recognizers, the control strategy does not draw on the underlying structure of speech, but rather models the acoustics as a set of disjoint "segmental" units. Such a strategy does not accommodate the acoustic influence that phonemes have on neighboring phonemes, nor does it attach any meaning to the internal states of the model.

In this work, an alternative HMM control strategy is presented which draws on the idea that the production of speech is a process governed by the mechanical motion of a finite set of relatively slow-moving articulators. The Articulatory Feature Model is defined as an HMM in which each internal state represents one point in the (quantized) articulatory space, that is, one possible configuration of the articulatory system. Rather than modeling disjoint acoustic segments, this model represents the acoustic patterns associated with the various articulatory configurations of the speech production system. Instead of a set of small disjoint models, this scheme represents the entire vocabulary with a single, large HMM. Individual vocabulary items are specified as sequences of target articulatory configurations. The context dependency of phonemes is explicitly accommodated by the states representing articulatory configurations visited between articulatory targets. The internal model states now have a potential real-world interpretation due to their correlation with the physical state of the production system, which allows linguistic and physiological knowledge to be incorporated to restrict the model's evolution and improve performance. The development of the quantized articulatory space, the target articulatory feature sequences, and the feature evolution constraints for a large-vocabulary speech recognition system is presented. Recognition results are presented for both small- and large-vocabulary tasks, showing that the articulatory feature scheme is competitive with the traditional phoneme model, offering roughly a 10% decrease in error rate. Analysis of the model's behaviour indicates how model designers can capitalize on the physical interpretation of the internal model states.
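The control strategy described above can be pictured with a small sketch. The Python below is an illustration written for this summary, not code from the dissertation: it builds a toy quantized articulatory space, biases HMM transitions toward small articulator movements (the slow-moving-articulator assumption), runs standard Viterbi decoding over the single shared state space, and checks whether the best path visits a word's target configurations in order, with the intervening states free to model coarticulation. The feature space, probabilities, and lexicon entry are all invented placeholders.

import numpy as np

# Toy quantized articulatory space: each state is a (lip opening, tongue height) pair.
STATES = [(lips, tongue) for lips in range(3) for tongue in range(3)]
N = len(STATES)

def transition_matrix():
    """Transitions favour small articulator movements (slow-moving articulators)."""
    A = np.zeros((N, N))
    for i, (l1, t1) in enumerate(STATES):
        for j, (l2, t2) in enumerate(STATES):
            dist = abs(l1 - l2) + abs(t1 - t2)
            A[i, j] = np.exp(-2.0 * dist)       # penalise large articulatory jumps
        A[i] /= A[i].sum()
    return A

def viterbi(log_obs, log_A):
    """Standard Viterbi over the single shared articulatory state space.
    log_obs[t, s] = log P(acoustic frame t | articulatory state s)."""
    T = log_obs.shape[0]
    delta = np.full((T, N), -np.inf)
    back = np.zeros((T, N), dtype=int)
    delta[0] = log_obs[0]                       # uniform start folded into the first frame
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # rows: previous state, columns: next state
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_obs[t]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return list(reversed(path)), float(delta[-1].max())

def hits_targets_in_order(path, targets):
    """A word is accepted only if its target configurations appear along the path
    in order; states visited between targets model coarticulation for free."""
    it = iter(path)
    return all(any(STATES[s] == tgt for s in it) for tgt in targets)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    log_A = np.log(transition_matrix())
    log_obs = np.log(rng.dirichlet(np.ones(N), size=12))   # fake acoustic scores, 12 frames
    path, score = viterbi(log_obs, log_A)
    word_targets = [(0, 0), (2, 2)]                         # hypothetical lexicon entry
    print("best articulatory path:", [STATES[s] for s in path])
    print("matches word targets:", hits_targets_in_order(path, word_targets), "score:", score)

In a full recognizer the lexicon constraint would be built into the search rather than checked after decoding, but the sketch shows the key contrast with phoneme HMMs: every state is a physically interpretable articulatory configuration shared across the whole vocabulary.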
Keywords/Search Tags: Model, HMM, Speech, Articulatory, Recognition, States