
Research On Discriminative Training In Speech Recognition

Posted on: 2010-04-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y H Wu
Full Text: PDF
GTID: 1118360308962200
Subject: Signal and Information Processing

Abstract/Summary:
Acoustic model training is an important component of speech recognition and has received considerable attention in recent research. Maximum Likelihood Estimation (MLE) is the traditional method for training acoustic models in speech recognition, but it does not consider the discriminative relations between models, so some models tend to be confused with one another. Discriminative training criteria have been proposed to increase the separation between models. This thesis focuses on discriminative training within our large vocabulary continuous speech recognition (LVCSR) system. The research and its innovations are as follows:

1. An in-depth study of acoustic model training methods

Two classes of training methods are used in this thesis. For generative training, the MLE algorithm was implemented. For discriminative training, the Maximum Mutual Information Estimation (MMIE) and Minimum Phone Error (MPE) algorithms were studied, with the main focus on MPE training. Training platforms for these methods were built on the HMM Tool Kit (HTK). (The MMIE and MPE objective functions are summarized in the formula block after part 2.)

2. A new method to enhance the discriminability of the generative model

For HMM-based acoustic model training, the standard estimation method is the Baum-Welch algorithm based on MLE. This method does not consider the discriminative relations between acoustic models, so some models can be confused with each other; it does, however, have several advantages, such as a straightforward Expectation Maximization (EM) treatment of missing data. We therefore propose a new model combination method that enhances the discriminability of the generative model: it extracts the discriminative parameters from the generative model and produces a new model by multi-model combination. The combination weight is determined by the ratio of the inter-class variance to the intra-class variance; the higher the ratio, the greater the weight, and the more discriminative the model. Experiments demonstrate that the combined model outperforms the purely generative model on a large-scale database.
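For reference, the two discriminative criteria of part 1 are conventionally written as follows. These are the standard formulations from the literature; the notation here (O_r for the r-th training utterance, s_r for its reference transcription, kappa for the acoustic scale, A(s, s_r) for the raw phone accuracy of hypothesis s) is supplied for exposition and is not quoted from the thesis.

```latex
% MMIE: scaled log posterior of the reference transcription
\mathcal{F}_{\mathrm{MMIE}}(\lambda)
  = \sum_{r=1}^{R} \log
    \frac{p_\lambda(\mathcal{O}_r \mid s_r)^{\kappa}\, P(s_r)}
         {\sum_{s} p_\lambda(\mathcal{O}_r \mid s)^{\kappa}\, P(s)}

% MPE: expected phone accuracy over competing hypotheses
\mathcal{F}_{\mathrm{MPE}}(\lambda)
  = \sum_{r=1}^{R}
    \frac{\sum_{s} p_\lambda(\mathcal{O}_r \mid s)^{\kappa}\, P(s)\, A(s, s_r)}
         {\sum_{s'} p_\lambda(\mathcal{O}_r \mid s')^{\kappa}\, P(s')}
```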
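The weighting rule of part 2 can be sketched as follows. This is a minimal illustration assuming per-class means, variances, and occupancy counts are available for one model dimension; the function names and the particular mapping from variance ratio to combination weight are our assumptions, since the abstract only states that a higher ratio yields a greater weight.

```python
import numpy as np

def variance_ratio(class_means, class_vars, class_counts):
    """Ratio of inter-class variance to average intra-class variance
    for one model dimension; a higher value means the dimension
    separates the classes better (i.e. is more discriminative)."""
    counts = np.asarray(class_counts, dtype=float)
    means = np.asarray(class_means, dtype=float)
    global_mean = np.average(means, weights=counts)
    inter = np.average((means - global_mean) ** 2, weights=counts)
    intra = np.average(np.asarray(class_vars, dtype=float), weights=counts)
    return inter / max(intra, 1e-10)

def combination_weight(ratio, scale=1.0):
    """Map the variance ratio to a weight in (0, 1); the higher the
    ratio, the larger the share given to the discriminative
    parameters.  This particular mapping is an assumption."""
    return ratio / (ratio + scale)

def combine_means(mu_generative, mu_discriminative, weight):
    """Interpolate the generative (MLE) and discriminatively derived
    mean parameters with the weight computed above."""
    return weight * mu_discriminative + (1.0 - weight) * mu_generative
```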
3. A multi-model combination method for discriminative training

Discriminative training pays more attention to the competing models and achieves better performance over the model space as a whole, but MLE has its own merit: it characterizes each individual model more accurately. Over the whole space discriminative training surpasses MLE, yet for individual models it may give poorer results. The two approaches therefore have complementary strengths, and combining them should yield better results.

This thesis proposes several methods for combining the generative and discriminative models. Because the discriminative model has the better discriminative properties, we first propose a confusion-clustering algorithm that runs before the parameter calculation. It prevents the current model from being influenced by distant models, which would otherwise make the variance an inaccurate measure of its discriminative ability; parameters calculated on a confusion cluster are more representative than those calculated on the whole space. We then combine the generative model with the discriminative model, and also combine discriminative models with each other.

We also propose a new model combination method based on model confusion. For single-mixture models, it weights the MPE and MLE models according to their mutual confusion (a sketch of such an interpolation follows after part 4); for multi-mixture models, a model selection method is proposed. Experiments demonstrate that these model combination methods achieve better performance on a large-scale database.

4. A method for dynamic Gaussian mixture splitting

Mixtures of Gaussian densities are widely used as HMM output distributions in speech recognition and are often seen as the current best way of approximating the "true" underlying distributions. However, using Gaussian mixtures efficiently is not a solved problem, because of two conflicting requirements. On the one hand, an acoustic model built from a set of Gaussian components needs sufficient descriptive capability: the number of components must be large enough both to cover the relevant feature space and to model the fine structure of the underlying distribution. On the other hand, the number of components must be small enough that the training data suffice to robustly estimate both the mixture weights and the parameters of the Gaussian densities.

This thesis proposes several discriminative measures for dynamic Gaussian mixture splitting based on MPE training. The aim is a flexible mechanism that increases the number of mixture components only for selected states, so that the acoustic model as a whole is strengthened and a more efficient model can be built and estimated. The splitting measure is computed from the accumulated phone-accuracy statistics and from the statistics of both the numerator and denominator parts of the lattices. These measures add little extra computation and are easy to integrate into the acoustic model training process. Experiments showed that this method achieves better performance with fewer mixture components than full mixture splitting.
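A minimal sketch of the single-mixture combination described in part 3, assuming a scalar confusion score is available for each model. The names `alpha_from_confusion` and `interpolate_gaussian`, and the linear mapping from confusion to interpolation weight, are illustrative assumptions rather than the thesis's formulas.

```python
import numpy as np

def alpha_from_confusion(confusion: float, max_confusion: float) -> float:
    """Map a model's confusion score to an MPE share in [0, 1]: the
    more confusable the model, the more the discriminative (MPE)
    estimate is trusted.  The linear map is an assumption."""
    return min(confusion / max(max_confusion, 1e-10), 1.0)

def interpolate_gaussian(mle_mean, mle_var, mpe_mean, mpe_var, alpha):
    """Convex combination of the MLE and MPE Gaussian parameters of a
    single-mixture model."""
    mean = alpha * np.asarray(mpe_mean) + (1.0 - alpha) * np.asarray(mle_mean)
    var = alpha * np.asarray(mpe_var) + (1.0 - alpha) * np.asarray(mle_var)
    return mean, var
```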
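Finally, a sketch of the selective splitting in part 4. The thesis derives its measures from phone-accuracy and numerator/denominator lattice statistics accumulated during MPE training; the concrete score below (minimum of numerator and denominator occupancy) and the `StateStats` container are assumptions for illustration, while the mean-perturbation split itself is the standard heuristic.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class StateStats:
    name: str
    num_occ: float   # occupancy accumulated from numerator lattices
    den_occ: float   # occupancy accumulated from denominator lattices

def splitting_score(s: StateStats) -> float:
    """One plausible discriminative measure: a state with high
    numerator AND high denominator occupancy is both well trained and
    heavily involved in confusions, so it benefits most from extra
    components.  This scoring rule is an assumption."""
    return min(s.num_occ, s.den_occ)

def select_states(states, budget):
    """Split only the `budget` highest-scoring states, rather than
    splitting every state as full mixture splitting would."""
    return sorted(states, key=splitting_score, reverse=True)[:budget]

def split_gaussian(mean, var, epsilon=0.2):
    """Duplicate one Gaussian and perturb the two copies' means in
    opposite directions along the standard deviation (the standard
    mixture-splitting heuristic)."""
    offset = epsilon * np.sqrt(np.asarray(var))
    return (np.asarray(mean) + offset, var), (np.asarray(mean) - offset, var)
```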
Keywords/Search Tags: continuous speech recognition, discriminative training, minimum phone error, maximum likelihood estimation, dynamic mixture splitting