
Decision-tree probability modeling for HMM speech recognition

Posted on: 1995-07-20
Degree: Ph.D
Type: Dissertation
University: Brown University
Candidate: Foote, Jonathan Trumbull
Full Text: PDF
GTID: 1478390014991072
Subject: Engineering
Abstract/Summary:
Hidden Markov models (HMMs) are widely regarded as the most robust technique for speaker-independent, connected-word recognition. The performance of a given HMM depends largely on the fidelity of the underlying acoustic model, which must estimate the probability of an acoustic observation given an HMM state. Conventional acoustic models are either discrete, where the feature space is quantized (typically by nearest-neighbor vector quantization), or continuous, where acoustic probabilities are estimated by Gaussian mixtures or neural-network techniques.

A new type of acoustic model is presented here, in which the underlying feature space is partitioned by a decision tree. Acoustic probabilities may then be estimated either by Baum-Welch methods or by Viterbi training, exactly as in conventional discrete methods. Though the tree is used as a vector quantizer, it has significant advantages over conventional discrete models:

(1) Trees are extremely fast at classification. Not only is this practical for real-time systems, but the feature space may be quantized at much finer resolution, giving a high-resolution non-parametric model of the underlying pdfs with all the computational advantages of a discrete model.

(2) Trees handle both high-dimensional and discrete spaces gracefully. This permits context-dependent acoustic models built by concatenating time-adjacent input vectors, and even a "recurrent tree" model that uses the delayed output of the tree as an input feature.

(3) The relative importance of the individual feature dimensions may be discerned from the tree structure. This allows discrimination between feature types, to find those that best represent the underlying speech information.

(4) Given a decision tree, new probability estimates may be found from Viterbi-labeled data in linear time, rather than by the iterative training other models require. This yields a practical method of speaker adaptation, re-estimating probabilities as new data becomes available (a sketch follows this abstract).

Preliminary results from a tree-based HMM system show recognition performance approaching that of continuous models at a computational cost comparable to discrete models. In addition, near-real-time talker-adaptation experiments show promising results.
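To make the quantize-and-count pipeline concrete, the following is a minimal sketch (not the dissertation's code) of a decision tree used as a vector quantizer for a discrete-HMM acoustic model, with output probabilities re-estimated from Viterbi-labeled frames in a single linear pass. It assumes scikit-learn and NumPy; all function and variable names are illustrative.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def fit_tree_quantizer(features, phone_labels, max_leaf_nodes=1024):
        # Grow a tree that partitions the acoustic feature space; its leaves
        # act as VQ codewords, at far finer resolution than a typical codebook.
        tree = DecisionTreeClassifier(max_leaf_nodes=max_leaf_nodes)
        tree.fit(features, phone_labels)
        return tree

    def estimate_output_probs(tree, features, state_labels, n_states, alpha=1.0):
        # Re-estimate discrete output probabilities b[j, k] = P(codeword k | state j)
        # by counting over Viterbi-labeled frames: one linear pass, so new
        # adaptation data can be folded in without iterative retraining.
        leaves = tree.apply(features)  # leaf index per frame: the VQ code
        # Densify leaf node ids into codeword indices 0..K-1. (A full system
        # would index every leaf of the trained tree, not just those seen here.)
        codewords, codes = np.unique(leaves, return_inverse=True)
        b = np.full((n_states, codewords.size), alpha)  # Laplace smoothing
        np.add.at(b, (np.asarray(state_labels), codes), 1.0)
        return b / b.sum(axis=1, keepdims=True)  # rows sum to 1

Because re-estimation is only counting, speaker adaptation amounts to repeating the second step on newly labeled frames. The "recurrent tree" variant described above would simply append the delayed tree output as one more input feature column.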
Keywords/Search Tags: Model, HMM, Tree, Probability