
A Study Of Active Learning For Acoustic Modeling In Speech Recognition

Posted on: 2012-02-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Chen
GTID: 1118330371460291
Subject: Signal and Information Processing

Abstract/Summary:
Acoustic models for speech recognition are conventionally trained in a supervised way. With the rapid development of modern media and the rise of the Internet, it has become easy to collect large amounts of speech data, but these raw samples still require time-consuming and costly manual annotation. To train high-performance acoustic models with significantly reduced transcription effort, active learning (AL) is adopted: the most informative samples are actively selected for annotation and acoustic modeling. This has become a popular research topic. This dissertation focuses on active learning for acoustic modeling; its main contributions and innovations are as follows:

1. A KLD-based method for initial set selection
The initial set strongly influences the convergence rate of active learning, yet it is usually selected at random. We propose a method based on the Kullback-Leibler divergence (KLD): the distributions of the total set and of several candidate initial sets are first modeled by Gaussian mixture models (GMMs), the KLD between each candidate's distribution and that of the total set is then computed, and the candidate with the lowest KLD is selected as the initial set. Experiments show that the initial set selected in this way achieves a satisfactory convergence rate.

2. Sample evaluation algorithms based on different confidence measures
We first propose a sample evaluation algorithm based on multi-level confusion networks. Word posterior probabilities derived from word confusion networks have proven effective for active learning, but word structure in Chinese is flexible, so the word posterior probability may be inaccurate because of boundary ambiguity when the confusion network is generated. We therefore propose a unified framework that generates confusion networks at multiple levels, and the posterior probabilities obtained from each level are used to evaluate the unlabeled samples. We further propose a sample evaluation algorithm based on the combination of several predictors. Samples are usually selected with a single predictor such as the posterior probability, which cannot evaluate them comprehensively. In this algorithm, the predictors of the unlabeled samples are first built, the training set for a support vector machine (SVM) is then labeled with a recognition-result evaluation method based on mixed words, and the posterior probability output by the trained SVM is finally used for sample evaluation. Experiments show that both algorithms are effective.

3. A sample confidence measure based on latent topic similarity
Since recognition performance depends on the ability to resolve ambiguity and correct errors, confidence measures extracted from high-level information sources are important. We propose a sample confidence measure based on latent topic similarity: the topic distributions of each word and of its context in the recognition result are first obtained with a latent Dirichlet allocation (LDA) model, and the word confidence measure is then derived from the similarity between these two distributions. Experiments show that the proposed measure is complementary to confidence measures derived from decoding information.
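To make the latent-topic-similarity idea in contribution 3 concrete, the following is a minimal sketch, not the dissertation's implementation. It assumes a pre-trained LDA topic-word matrix `phi` (topics x vocabulary), approximates a word's topic distribution by normalizing its column of `phi`, forms the context distribution by averaging the topic distributions of the surrounding recognized words, and uses cosine similarity as the similarity function; the function names and the choice of cosine similarity are illustrative assumptions.

```python
import numpy as np

def word_topic_dist(word_id, phi):
    """Approximate p(topic | word) by normalizing the word's column of the
    LDA topic-word matrix phi (shape: n_topics x vocab_size)."""
    col = phi[:, word_id].astype(float)
    return col / (col.sum() + 1e-12)

def context_topic_dist(context_ids, phi):
    """Topic distribution of the recognized context: here simply the average
    of the topic distributions of the surrounding words."""
    return np.mean([word_topic_dist(i, phi) for i in context_ids], axis=0)

def topic_similarity_confidence(word_id, context_ids, phi):
    """Word confidence from latent topic similarity: cosine similarity
    (an assumed choice) between the word's and the context's topic
    distributions; low similarity suggests a likely recognition error."""
    w = word_topic_dist(word_id, phi)
    c = context_topic_dist(context_ids, phi)
    return float(np.dot(w, c) / (np.linalg.norm(w) * np.linalg.norm(c) + 1e-12))
```

In an active-learning loop, such a score would typically be combined with decoder-based confidence measures (for example confusion-network posteriors) when ranking unlabeled utterances.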
4. A selective training algorithm for acoustic modeling
Both confidence-based active learning and semi-supervised learning ignore the confidences of the individual words, characters, or syllables of the samples selected in each iteration. We propose a selective training algorithm that selects units (words, characters, or syllables) of the unlabeled samples according to a confidence measure, uses only those units for acoustic modeling, and applies this algorithm to semi-supervised learning. Initial experiments show that combining semi-supervised learning with selective training is effective when the selection ratio is small.
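The following is a minimal sketch of the selective-training idea, under assumptions not stated in the abstract: each decoded unit (word, character, or syllable) carries a confidence score, all units from the unlabeled pool are ranked, and only the top fraction given by the selection ratio is kept for the next acoustic-model training pass. The `RecognizedUnit` structure and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class RecognizedUnit:
    label: str          # hypothesized word, character, or syllable
    start_frame: int    # unit boundaries from the decoder alignment
    end_frame: int
    confidence: float   # confidence measure attached to this unit

def select_units(hypotheses: Dict[str, List[RecognizedUnit]],
                 selection_ratio: float = 0.1) -> Dict[str, List[RecognizedUnit]]:
    """Pool every hypothesized unit from the unlabeled utterances, rank them
    by confidence, and keep only the top `selection_ratio` fraction; the
    surviving units (and their frame spans) are the portions of the data
    used in the next semi-supervised acoustic-model training pass."""
    pooled: List[Tuple[str, RecognizedUnit]] = [
        (utt_id, unit) for utt_id, units in hypotheses.items() for unit in units
    ]
    pooled.sort(key=lambda item: item[1].confidence, reverse=True)
    n_keep = max(1, int(len(pooled) * selection_ratio))

    selected: Dict[str, List[RecognizedUnit]] = {}
    for utt_id, unit in pooled[:n_keep]:
        selected.setdefault(utt_id, []).append(unit)
    return selected
```

The small default ratio mirrors the abstract's observation that the combination with semi-supervised learning works best when only a small fraction of units is trusted.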
Keywords/Search Tags: active learning, acoustic model, confidence measure, sample evaluation