Acoustic modeling for automatic speech recognition: Deriving discriminative Gaussian networks

Posted on:2004-05-22

Degree:Ph.D

Type:Dissertation

University:Stanford University

Candidate:Teunen, Remco

GTID:1468390011974910

Subject:Engineering

Abstract/Summary:

Despite the considerable progress made in recent years, automatic speech recognition is far from being a solved problem. In particular, the accuracy of a speech recognizer degrades dramatically when there is a mismatch between the training and real usage conditions.

State-of-the-art speech recognizers use hidden Markov models (HMMs) and Gaussian mixture models (GMMs) with millions of parameters to model speech. The set of all these models is called the acoustic model set of the speech recognizer. The parameters are trained with speech from thousands of different speakers to capture the variabilities of speech. However, the current acoustic model set over-generalizes and is not able to capture certain constraints in speech that are relevant for recognition. For example, the acoustic model set does not take into account that the gender of a speaker cannot change within an utterance. Furthermore, experiments have shown that the acoustic model set is often not able to take advantage of the vastly increasing amount of training data that is now available with commercial applications.

In this work, a novel technique for deriving discriminative Gaussian networks (GNs) from training data is presented. The Gaussian networks can be viewed as HMM/GMM models that have complex HMM structures, and simple, single Gaussian GMMs. The models are iteratively grown in complexity by splitting HMM states into two states. For each iteration the algorithm splits the states that are expected to give the most significant error rate reduction. The model parameters are discriminatively trained as well, using an improved version of the maximum mutual information (MMI) training algorithm.

Evaluations using the Aurora 2 industry standard benchmark, and a small vocabulary recognition task, show that GN acoustic models are both more accurate and more robust than comparable HMM/GMM acoustic models.
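
The state-splitting loop described in the abstract can be illustrated with a minimal sketch. This is not the dissertation's algorithm. It assumes each state is a single diagonal-covariance Gaussian, that a split perturbs the mean by a fraction of the standard deviation (a common heuristic, used here as an assumption), and that the expected error-rate reduction comes from a hypothetical, caller-supplied `gain_fn`. HMM transitions and MMI retraining are left out. The names `split_state`, `grow_once`, `gain_fn`, `eps`, and `n_split` are all illustrative.

```python
import math


def split_state(mean, var, eps=0.2):
    """Split one single-Gaussian state into two states.

    Assumption: diagonal covariance; the two new means are shifted by
    -eps and +eps standard deviations, and both keep the original variance.
    """
    offsets = [eps * math.sqrt(v) for v in var]
    low = [m - o for m, o in zip(mean, offsets)]
    high = [m + o for m, o in zip(mean, offsets)]
    return [(low, list(var)), (high, list(var))]


def grow_once(states, gain_fn, n_split=1):
    """One growth iteration: split the n_split states with the largest gain.

    gain_fn is a hypothetical scorer standing in for the expected
    error-rate reduction; the abstract does not say how it is computed.
    """
    gains = [gain_fn(mean, var) for mean, var in states]
    ranked = sorted(range(len(states)), key=lambda i: gains[i], reverse=True)
    chosen = set(ranked[:n_split])
    grown = []
    for i, (mean, var) in enumerate(states):
        if i in chosen:
            grown.extend(split_state(mean, var))
        else:
            grown.append((mean, var))
    return grown


# Toy usage: two 2-dimensional states, scored by total variance
# (a placeholder gain, not a discriminative criterion).
states = [([0.0, 0.0], [1.0, 1.0]), ([1.0, 1.0], [4.0, 4.0])]
grown = grow_once(states, gain_fn=lambda mean, var: sum(var))
```

In this toy run the second state has the larger placeholder gain, so it is replaced by two states and the first is kept unchanged. A discriminative criterion would replace `gain_fn`, and the parameters would be re-estimated after every iteration.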

Keywords/Search Tags:

Speech, Acoustic model, Recognition, Gaussian

Related items

1	Research And Implementation Of Algorithms For Increasing Speech Recognition Ratio
2	Research On Acoustic Model Compress For Speech Recognition
3	Researching Of The Mongolian Acoustic Model Based On Speech Recognition
4	Research Of The GMM-HMM Based Acoustic Models
5	Acoustic Modeling For Continuous Speech Recognition
6	Acoustic Model Of Chinese Speech Recognition Based On DNN
7	A Study On The Extraction Of Speech Depth In Tibetan Language And Its Speech Recognition
8	Acoustic modeling and feature selection for speech recognition
9	Research On Acoustic Modeling For Spontaneous Spoken Speech Recognition
10	Research On Phone Feature Recognition Based On Deep Learning