
Balancing model resolution and generalizability in large-vocabulary continuous speech recognition

Posted on: 2000-06-15
Degree: Ph.D.
Type: Dissertation
University: The Johns Hopkins University
Candidate: Luo, Xiaoqiang
GTID: 1468390014965482
Subject: Engineering
Abstract/Summary:
We study the problem of balancing model resolution and generalizability in large-vocabulary continuous speech recognition (LVCSR). Parameter tying (i.e., using a single set of parameters for different model constructs) is often used in LVCSR systems to balance the two. One consequence of tying, however, is that the differences among the tied constructs are ignored. Parameter tying can alternatively be viewed as reciprocal data sharing, in that a tied construct uses the data associated with all the other constructs in its tied class. To capture the fine differences among tied HMM constructs, we propose non-reciprocal data sharing (NRDS) for estimating HMM parameters. In particular, when the Gaussian parameters of an HMM state are estimated, contributions from other acoustically similar HMM states are weighted, allowing different statistics to govern different states. The data-sharing weights are optimized by cross-validation. It can be shown that the cross-validation objective function is a sum of rational functions and can be optimized efficiently by the growth transform, an iterative algorithm that finds a local maximum for this special constrained optimization problem. Our results on Switchboard (SWBD), a telephone conversational speech corpus well known in the speech community for its difficulty, show that NRDS reduces the word error rate (WER) by 0.9% compared with a state-of-the-art baseline system that uses HMM state tying.

To explore the idea that only a partial contribution from other constructs is used when estimating the parameters of the construct of interest, we also study a scheme of probabilistic classification of HMM states. We show that this formulation can be treated within the Expectation-Maximization (EM) framework and that all the model parameters can be updated using the EM algorithm. When applied to a SWBD test set used in the summer workshop WS97¹, the proposed method achieves a WER of 35.1%, the best result on this test set at the time of writing.

To demonstrate the applicability of non-reciprocal data sharing to language modeling, we propose a data-sharing trigram language model. In this model, when an event is unseen or has a low count in the training text, the model uses a set of "similar" histories to compute its probability instead of simply backing off to bigram or unigram probabilities. Our preliminary study shows that perplexity is reduced significantly for events predicted by the proposed model components.

¹An annual summer workshop has been held since 1993, at which researchers from leading universities and industrial laboratories come together for several weeks to work on aspects of the SWBD task.
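The sketches below illustrate, in order, the three ideas summarized above. All function names, weights, and data layouts are hypothetical illustrations under stated assumptions, not the dissertation's implementation.

First, NRDS estimation of a state's Gaussian: the state's own sufficient statistics are pooled with down-weighted statistics from acoustically similar states. The sharing is non-reciprocal because the weight with which state A borrows from state B need not equal the weight with which B borrows from A.

```python
import numpy as np

def nrds_gaussian(stats, neighbors, weights):
    """Estimate a diagonal Gaussian for one HMM state by non-reciprocally
    sharing data from acoustically similar states.

    stats     : dict state -> (count, sum of frames, sum of squared frames)
    neighbors : the target state first, then its acoustically similar states
    weights   : one sharing weight per neighbor (hypothetical values here;
                cross-validated in the dissertation), weights[0] for own data
    """
    n  = sum(w * stats[s][0] for s, w in zip(neighbors, weights))
    x1 = sum(w * stats[s][1] for s, w in zip(neighbors, weights))
    x2 = sum(w * stats[s][2] for s, w in zip(neighbors, weights))
    mean = x1 / n
    var = x2 / n - mean ** 2
    return mean, var

# Toy usage: sparse state "s1" borrows, with a smaller weight, from "s2";
# "s2" estimating its own Gaussian could use a different weight for "s1".
stats = {"s1": (10.0, np.full(2, 5.0), np.full(2, 4.0)),
         "s2": (50.0, np.full(2, 30.0), np.full(2, 25.0))}
mean, var = nrds_gaussian(stats, ["s1", "s2"], [1.0, 0.3])
```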
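Second, the growth transform used to optimize the sharing weights. The cross-validation objective is a sum of rational functions; the dissertation applies an extension of the classical Baum-Eagon polynomial update to that case. The simpler polynomial update, x_i ∝ x_i ∂P/∂x_i over the probability simplex, conveys the flavor; the numerical gradient below is purely for brevity.

```python
import numpy as np

def growth_transform_step(P, x, eps=1e-6):
    """One Baum-Eagon step for a polynomial P with nonnegative coefficients,
    maximized over the probability simplex. Each step provably increases P.
    Central-difference gradients stand in for the analytic partials.
    """
    grad = np.empty_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        grad[i] = (P(x + d) - P(x - d)) / (2 * eps)
    num = x * grad
    return num / num.sum()            # renormalize: stays on the simplex

# Toy objective P(x) = x0 * x1**2, maximized on the simplex at (1/3, 2/3).
P = lambda x: x[0] * x[1] ** 2
x = np.array([0.5, 0.5])
for _ in range(50):
    x = growth_transform_step(P, x)   # x converges to [1/3, 2/3]
```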
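Third, probabilistic classification of HMM states. The abstract states only that the formulation fits the EM framework; one way to realize it, assumed here purely for illustration, is to give each state a soft membership over a set of shared Gaussian classes and re-estimate memberships and class Gaussians jointly with EM.

```python
import numpy as np

def em_state_classes(frames, n_classes, n_iter=20, seed=0):
    """Assumed sketch: frames of state s follow the 1-D mixture
    sum_c q[s, c] * N(x; mu[c], var[c]); EM updates q, mu, and var."""
    rng = np.random.default_rng(seed)
    states = list(frames)
    q = rng.dirichlet(np.ones(n_classes), size=len(states))  # memberships
    allx = np.concatenate([frames[s] for s in states])
    mu = rng.choice(allx, size=n_classes)                    # class means
    var = np.full(n_classes, allx.var() + 1e-3)              # class variances
    for _ in range(n_iter):
        num0 = np.zeros(n_classes)   # responsibility mass per class
        num1 = np.zeros(n_classes)   # responsibility-weighted sums
        num2 = np.zeros(n_classes)   # ... of squared observations
        for si, s in enumerate(states):
            x = frames[s][:, None]                           # shape (T, 1)
            lik = np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
            r = q[si] * lik
            r /= r.sum(axis=1, keepdims=True)                # E-step: (T, C)
            q[si] = r.mean(axis=0)                           # M-step: memberships
            num0 += r.sum(axis=0)
            num1 += (r * x).sum(axis=0)
            num2 += (r * x ** 2).sum(axis=0)
        mu = num1 / num0                                     # M-step: Gaussians
        var = np.maximum(num2 / num0 - mu ** 2, 1e-6)
    return q, mu, var
```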
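Finally, the data-sharing trigram language model: when a history is sparse, counts are pooled from "similar" histories rather than backing off immediately to the bigram. The similarity lists, weights, and count threshold below are hypothetical stand-ins.

```python
from collections import Counter

def shared_trigram_prob(w, h, tri, bi, similar, min_count=2):
    """Sketch of a data-sharing trigram probability.

    w       : predicted word;  h : history, a (w1, w2) tuple
    tri, bi : Counters of trigram and bigram counts from the training text
    similar : dict history -> [(similar_history, weight), ...]
              (hypothetical lists; how they are chosen is model-specific)
    """
    if bi[h] >= min_count and tri[h + (w,)] > 0:
        return tri[h + (w,)] / bi[h]          # enough data: relative frequency
    # Sparse history: pool weighted counts from similar histories.
    num = tri[h + (w,)] + sum(wt * tri[g + (w,)] for g, wt in similar.get(h, []))
    den = bi[h] + sum(wt * bi[g] for g, wt in similar.get(h, []))
    if den == 0:
        return 0.0  # a full model would back off to bigram/unigram here
    return num / den

# Toy usage: ("we", "suggest") is sparse, so it borrows counts from the
# similar history ("we", "propose") when predicting the next word.
tri = Counter({("we", "propose", "a"): 3})
bi = Counter({("we", "propose"): 3, ("we", "suggest"): 1})
similar = {("we", "suggest"): [(("we", "propose"), 0.5)]}
p = shared_trigram_prob("a", ("we", "suggest"), tri, bi, similar)  # 0.6
```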
Keywords/Search Tags: Model, Speech, HMM, SWBD