Discriminative Training For Large Vocabulary Continuous Speech Recognition

Posted on: 2011-12-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Liu
GTID: 1118360305466636
Subject: Signal and Information Processing
Abstract/Summary:
In the past few decades, discriminative training (DT) has been a very active research area in automatic speech recognition (ASR). Discriminative training of acoustic models has become one of the most important training methods for state-of-the-art speech recognition systems, especially for large vocabulary continuous speech recognition (LVCSR) systems. This thesis focuses on discriminative training of acoustic models and its application to LVCSR tasks. It also covers another important module in speech recognition, the confidence measure (CM).

Firstly, this thesis proposes a novel optimization algorithm called constrained line search (CLS) for discriminative training of Gaussian mixture CDHMMs in speech recognition. The CLS method is formulated under a general framework for optimizing any discriminative objective function, including MMI, MCE, and MPE/MWE. In this method, discriminative training of HMMs is first cast as a constrained optimization problem, in which the Kullback-Leibler divergence (KLD) between models is explicitly imposed as a constraint during optimization. Based on the idea of line search, we show that a simple update formula for the HMM parameters can be found by constraining the KLD between the HMMs of two successive iterations in a quadratic form. The proposed CLS method can be applied to optimize all model parameters in Gaussian mixture CDHMMs, including means, covariances, and mixture weights.

Secondly, based on a theoretical analysis of the original trust region (TR) based optimization method that we proposed previously, this thesis proposes a new method to construct an auxiliary function for the discriminative training of HMMs in speech recognition. In the original trust region method, MMI-based discriminative training is treated as a standard trust region problem in optimization theory, and the global optimum of this problem can be obtained efficiently. However, optimizing the auxiliary function does not guarantee an increase in the original objective function. The proposed new auxiliary function still serves as a first-order approximation of the original objective function but, more importantly, it is also a lower bound of the original objective function. Because of this lower-bound property, the optimal point found is theoretically guaranteed to increase the original discriminative objective function. Furthermore, the TR method can also be applied to find the globally optimal point of the new auxiliary function. The proposed bounded trust region method has been investigated on several LVCSR tasks, and experimental results show that the bounded TR method based on the new auxiliary function outperforms both the conventional EBW method and the original TR method based on the old auxiliary function.

Thirdly, this thesis investigates several practical problems in LVCSR systems, e.g., computing capability and efficiency in discriminative training of HMMs, and the generalization problem in LVCSR systems. We propose a novel procedure for discriminative training in LVCSR systems that combines word graphs generated by a WFST-based decoder with computation tools from HTK. Under this new procedure, the efficiency of discriminative training is significantly improved and better recognition performance is also achieved.

Lastly, this thesis investigates appropriate confidence measures (CMs) for Mandarin command word recognition, in both the so-called target region and the non-target region. Here the target region refers to the part of the speech recognized as the command word, while the non-target region refers to the part recognized as silence. The results show that exploiting extra information in the non-target region can effectively complement traditional CMs, which usually focus on the target region. Furthermore, when the non-target region is analyzed in a more principled way, with the Bayesian information criterion (BIC) employed to locate a more precise boundary in the non-target region, further improvement is achieved.
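
The abstract does not give the CLS update formula itself. The following is only a minimal sketch of the general idea of a gradient step bounded by a quadratic KLD constraint, under several simplifying assumptions that are not taken from the thesis: only Gaussian means are updated, the variances are diagonal and held fixed (so the KLD between the old and new Gaussians reduces to half the variance-weighted squared distance between the means), and the objective is replaced by its first-order approximation. The function names, the parameter `rho`, and the numbers in the example are hypothetical.

```python
import math


def kld_shared_variance(mu_new, mu_old, var):
    """KLD between diagonal Gaussians that share the same variances.

    With equal covariances the KLD is the quadratic form
    0.5 * sum((mu_new - mu_old)^2 / var).
    """
    return 0.5 * sum((a - b) ** 2 / v for a, b, v in zip(mu_new, mu_old, var))


def constrained_mean_step(mu_old, grad, var, rho):
    """Maximise grad . (mu - mu_old) subject to KLD(mu, mu_old) <= rho**2.

    Assumption: a linearised objective, so the maximiser lies on the
    constraint boundary along the variance-scaled gradient direction.
    """
    norm = math.sqrt(sum(v * g * g for g, v in zip(grad, var)))
    if norm == 0.0:
        # Zero gradient: no direction improves the linearised objective.
        return list(mu_old)
    scale = rho * math.sqrt(2.0) / norm
    return [m + scale * v * g for m, g, v in zip(mu_old, grad, var)]


# Hypothetical example: four mean components with made-up gradients.
mu_old = [0.0, 1.0, 2.0, -1.0]
grad = [0.5, -0.2, 0.1, 0.3]
var = [1.0, 2.0, 0.5, 1.5]
mu_new = constrained_mean_step(mu_old, grad, var, rho=0.1)
```

In this sketch the step always lands exactly on the constraint boundary, so `rho` alone controls how far the model may move from the previous iteration, whatever the magnitude of the gradient.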
Keywords/Search Tags:Discriminative Training, Acoustic Model, LVCSR, Constrained Line Search, Bounded Trust Region, Confidence Measure