
Discriminative training for speaker adaptation and minimum Bayes risk estimation in large vocabulary speech recognition

Posted on: 2006-11-12
Degree: Ph.D
Type: Thesis
University: The Johns Hopkins University
Candidate: Doumpiotis, Vlasios
GTID: 2458390008451002
Subject: Engineering
Abstract/Summary:
Stochastic acoustic models are an important component of Automatic Speech Recognition (ASR) systems. The model parameters in Hidden Markov Model (HMM) based speech recognition are normally estimated using Maximum Likelihood Estimation (MLE). If certain conditions hold, including model correctness, MLE can be shown to be optimal. However, when estimating the parameters of HMM-based speech recognizers, the true data source is not an HMM, and therefore other training objective functions, in particular those involving discriminative training, are of interest. These discriminative training techniques attempt to optimize an information-theoretic criterion that is related to the performance of the recognizer.

Our focus in the first part of this work is to develop procedures for estimating the Gaussian model parameters and the linear transforms (used for Speaker Adaptive Training) under the Maximum Mutual Information Estimation (MMIE) criterion. Integrating these discriminative linear transforms into MMI estimation of the HMM parameters leads to discriminative speaker adaptive training (DSAT) procedures. Experimental results show that MMIE/DSAT training can yield significant increases in recognition accuracy compared to our best MLE-trained models. However, MMIE/DSAT training optimizes performance with respect to the Sentence Error Rate metric, which is rarely used in evaluating these systems.

The second part of this thesis investigates how ASR systems can be trained using a task-specific evaluation criterion, such as the overall risk (Minimum Bayes Risk) over the training data. Minimum Bayes Risk (MBR) training is computationally expensive when applied to large vocabulary continuous speech recognition. A framework for efficient MBR training is developed based on techniques used in MBR decoding.
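For readers unfamiliar with the criterion, the MMIE objective has the following standard form (a sketch in generic notation; the thesis's exact formulation may differ):

\[
\mathcal{F}_{\mathrm{MMIE}}(\lambda) \;=\; \sum_{r=1}^{R} \log \frac{p_{\lambda}(O_r \mid W_r)\, P(W_r)}{\sum_{W} p_{\lambda}(O_r \mid W)\, P(W)},
\]

where \(O_r\) is the \(r\)-th training utterance, \(W_r\) its reference transcription, \(\lambda\) the HMM parameters, and the denominator sums over competing word sequences (in practice approximated by a lattice). Maximizing this quantity raises the posterior probability of the correct transcription relative to its competitors, rather than just the likelihood of the data.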
In particular, lattice segmentation techniques are used to derive iterative estimation procedures that minimize empirical risk under general loss functions such as the Levenshtein distance. Experimental results on one small and two large vocabulary speech recognition tasks show that lattice segmentation and estimation techniques based on empirical risk minimization can be integrated with discriminative training to yield improved performance.
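The Levenshtein distance mentioned above is the word-level edit distance underlying the Word Error Rate metric. A minimal sketch of how it is computed by dynamic programming (the function name and token-level usage are illustrative, not taken from the thesis):

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences: the minimum number of
    substitutions, insertions, and deletions turning ref into hyp."""
    m, n = len(ref), len(hyp)
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]

# Word-level loss between a reference and a hypothesis transcript:
print(levenshtein("the cat sat".split(), "the cat sat down".split()))  # → 1
```

In MBR training this loss is evaluated between the reference and the hypotheses encoded in a lattice, so efficient lattice-based approximations replace the naive pairwise computation shown here.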
Keywords/Search Tags: Training, Speech recognition, Estimation, Risk, Large vocabulary, ASR, Speaker, Systems