Optimal generative and discriminative acoustic model training for speech recognition

Posted on:2010-12-02

Degree:Ph.D

Type:Thesis

University:Ryerson University (Canada)

Candidate:Joshi, Neil

GTID:2448390002478442

Subject:Engineering

Abstract/Summary:

The focus of this dissertation is to derive and demonstrate effective stochastic models for the speech recognition problem. Acoustic modeling for speech recognition typically involves representing the speech process within stochastic models. Modeling this high-frequency time series effectively is a fundamental problem.

The thesis proposes two such models, each developed to optimize the devised objective function. The first is an acoustic model formulated for the speech-in-noise problem. The second is a discriminatively trained model consisting of optimal discriminant ML estimators.

The first model is a combination of recognizers that, through a simple system fusion, combines multiple speech processes at the decision level. It is a stochastic modeling method devised to combine a parameterized spectral speech process based on missing data (MD) theory with a cepstral-based speech process, using a coupled hidden variable topology. Using a fused coupled hidden Markov model (HMM) topology, an optimal acoustic model is proposed that is inherently more robust than single-process models under noisy conditions. The theoretical capability of this model is tested under both stationary and non-stationary noise conditions. Under these test conditions, the fused model achieves higher recognition accuracy than single-process models.

The second model is formulated with a methodology that segments the acoustic space appropriately for discriminatively trained models that optimize the devised objective function. This acoustic space is modeled with discriminant ML estimators formed with optimal decision boundaries using the large-margin support vector machine (SVM) learning method. These discriminatively trained models maximize the entropy of the observation space and are thereby able to model the speech process without loss. This is demonstrated experimentally with frame-level classification error rates of approximately 3% or less.

This dissertation devises an objective function that relates the true speech distribution to its estimate. It is shown that, by optimizing this function, the speech process time series can be modeled without loss of information.
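
As a rough illustration of what combining two recognizers at the decision level can look like, the Python sketch below fuses per-class log-likelihoods from a spectral stream and a cepstral stream with a fixed weight, then computes a frame-level error rate. This is not the fused coupled HMM described in the abstract; it is a much simpler weighted log-linear combination, and every function name, the `weight` parameter, and the toy scores are assumptions made purely for illustration.

```python
def fuse_log_likelihoods(spectral_ll, cepstral_ll, weight=0.5):
    """Weighted log-linear fusion of two streams' per-class log-likelihoods.

    `weight` (hypothetical) is the share given to the spectral stream;
    the cepstral stream receives the remaining 1 - weight.
    """
    if spectral_ll.keys() != cepstral_ll.keys():
        raise ValueError("both streams must score the same set of classes")
    return {
        label: weight * spectral_ll[label] + (1.0 - weight) * cepstral_ll[label]
        for label in spectral_ll
    }


def decide(fused_ll):
    """Pick the class with the highest fused log-likelihood."""
    return max(fused_ll, key=fused_ll.get)


def frame_error_rate(predicted, reference):
    """Fraction of frames whose predicted label differs from the reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label sequences must be non-empty and equal in length")
    errors = sum(p != r for p, r in zip(predicted, reference))
    return errors / len(reference)


if __name__ == "__main__":
    # Toy scores for a single frame (made-up numbers, not thesis data).
    spectral = {"a": -1.0, "b": -2.0}
    cepstral = {"a": -3.0, "b": -1.5}
    fused = fuse_log_likelihoods(spectral, cepstral, weight=0.5)
    print(decide(fused))
    print(frame_error_rate(["a", "b", "b"], ["a", "b", "a"]))
```

In a sketch like this, the weight would normally be tuned on held-out data, for example shifting trust toward whichever stream degrades less under the noise condition of interest.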

Keywords/Search Tags:

Speech, Model, Recognition, Optimal, Function

Related items

1	Research Of Speech Recognition Based On Kaldi
2	Research On Speech Recognition Based On Compound Two-way Cyclic Network Under Specific Working Conditions
3	Discriminative Training For Continuous Speech Recognition
4	Research On Anti-Noise Of Speech Recognition Based On Continuous Hidden Markov Model
5	A Study On The Extraction Of Speech Depth In Tibetan Language And Its Speech Recognition
6	Research And Implementation Of Robust Speech Recognition System
7	Emotional Speech Conversion And Recognition Based On The Three-dimensional PAD Model
8	Research On Auditory Characteristics And Robust Speech Recognition Algorithms
9	Research On The Application Of Speech Recognition Based On Radial Basis Function Neural Networks
10	Research And Implementation Of Speech Recognition Technology Based On Neural Network