
Acoustic modeling for automatic speech recognition: Deriving discriminative Gaussian networks

Posted on: 2004-05-22 | Degree: Ph.D | Type: Dissertation
University: Stanford University | Candidate: Teunen, Remco | Full Text: PDF
GTID: 1468390011974910 | Subject: Engineering
Abstract/Summary:
Despite the considerable progress made in recent years, automatic speech recognition is far from being a solved problem. In particular, the accuracy of a speech recognizer degrades dramatically when there is a mismatch between the training and real usage conditions.

State-of-the-art speech recognizers use hidden Markov models (HMMs) and Gaussian mixture models (GMMs) with millions of parameters to model speech. The set of all these models is called the acoustic model set of the speech recognizer. The parameters are trained with speech from thousands of different speakers to capture the variabilities of speech. However, the current acoustic model set over-generalizes and is not able to capture certain constraints in speech that are relevant for recognition. For example, the acoustic model set does not take into account that the gender of a speaker cannot change within an utterance. Furthermore, experiments have shown that the acoustic model set is often not able to take advantage of the vastly increasing amount of training data that is now available with commercial applications.

In this work, a novel technique for deriving discriminative Gaussian networks (GNs) from training data is presented. The Gaussian networks can be viewed as HMM/GMM models that have complex HMM structures and simple, single-Gaussian GMMs. The models are iteratively grown in complexity by splitting HMM states into two states. In each iteration, the algorithm splits the states that are expected to give the most significant error rate reduction. The model parameters are discriminatively trained as well, using an improved version of the maximum mutual information (MMI) training algorithm.

Evaluations using the Aurora 2 industry-standard benchmark and a small-vocabulary recognition task show that GN acoustic models are both more accurate and more robust than comparable HMM/GMM acoustic models.
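
For orientation, the MMI training mentioned in the abstract builds on a criterion that is commonly written as below. This is the standard textbook form only, not the improved version developed in the dissertation, which the abstract does not specify. All symbols are assumptions introduced here for illustration: O_r is the observation sequence of training utterance r, w_r its reference transcription, M_w the composite model for word sequence w, P(w) the language model probability, and λ the acoustic model parameters.

```latex
% Standard MMI objective (textbook form; notation is assumed, not taken from the dissertation)
\mathcal{F}_{\mathrm{MMI}}(\lambda)
  = \sum_{r=1}^{R}
    \log \frac{p_{\lambda}(O_r \mid M_{w_r})\, P(w_r)}
              {\sum_{w} p_{\lambda}(O_r \mid M_{w})\, P(w)}
```

Maximizing this quantity raises the likelihood of the reference transcription relative to all competing word sequences, which is what makes the training discriminative rather than purely likelihood-based.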
Keywords/Search Tags: Speech, Acoustic model, Recognition, Gaussian