Ensemble acoustic modeling in Automatic Speech Recognition

Posted on:2012-11-14

Degree:Ph.D

Type:Dissertation

University:University of Missouri - Columbia

Candidate:Chen, Xin

GTID:1458390008995601

Subject:Computer Science

Abstract/Summary:

Combining multiple acoustic models to improve overall acoustic model quality is a young and promising direction in Automatic Speech Recognition (ASR). Previous work on acoustic modeling of speech signals, such as Random Forests (RFs) of Phonetic Decision Trees (PDTs), has produced significant improvements in recognition accuracy. In this dissertation, several new approaches that use data sampling to construct an Ensemble of Acoustic Models (EAM) for speech recognition are proposed. A straightforward method of data sampling is Cross Validation (CV) data partitioning. To improve inter-model diversity within an EAM for speaker-independent speech recognition, we propose Speaker Clustering (SC) based data sampling and develop two algorithms: Likelihood-based Speaker Clustering (LSC) and speaker-model Distance-based Speaker Clustering (DSC). To improve base model quality as well as inter-model diversity, we further investigate the effects on the proposed ensemble acoustic models of several techniques that have been successful in single-model training for speech recognition, including Cross Validation Expectation Maximization (CVEM), Discriminative Training (DT), and Multi-Layer Perceptron (MLP) features. We also propose using an ensemble of Multiple models with Different Mixture Sizes (MDMS) to improve EAM quality. We have evaluated the proposed methods on the TIMIT speaker-independent phoneme recognition task as well as on a telemedicine automatic captioning task of speaker-dependent continuous speech recognition. The proposed EAMs have led to significant improvements in recognition accuracy over conventional Hidden Markov Model (HMM) baseline systems, and integrating ensemble acoustic models with CVEM, DT, and MLP has also significantly improved the accuracy of the corresponding CVEM, DT, and MLP based single-model systems.
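
The CV data partitioning mentioned above can be illustrated with a minimal sketch. It assumes (the abstract does not say) that each base model in the ensemble is trained on all folds except one, and that folds are formed round-robin over the utterance list. The function name `cv_partitions` and its parameters are hypothetical and not taken from the dissertation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cv_partitions(utterances: Sequence[T], k: int) -> List[List[T]]:
    """Return k training subsets; subset i leaves out fold i.

    Assumption: folds are formed round-robin, so fold i holds the
    items at positions i, i + k, i + 2k, ...
    """
    if k < 2 or k > len(utterances):
        raise ValueError("k must be between 2 and len(utterances)")
    folds = [list(utterances[i::k]) for i in range(k)]
    subsets = []
    for held_out in range(k):
        # Concatenate every fold except the held-out one.
        subset = [u for j, fold in enumerate(folds) if j != held_out for u in fold]
        subsets.append(subset)
    return subsets


if __name__ == "__main__":
    # Each of the 3 subsets would be used to train one base acoustic model.
    for i, subset in enumerate(cv_partitions(list(range(6)), 3)):
        print(i, subset)
```

Under this assumption every utterance appears in k - 1 of the k training subsets, so the base models share most of their data but each sees a different slice of it.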
We further investigate the largely unstudied factor of inter-model diversity and propose several methods to measure it explicitly. We demonstrate a positive relation between enlarging inter-model diversity and increasing EAM quality.
HMM-based EAMs built by data sampling are generally very large, especially when a large number of models or full covariance matrices are used for the Gaussian densities. The acoustic model therefore needs to be compacted to a size that is reasonable for practical applications while maintaining reasonable performance. Toward this goal, we investigate several distance measures and clustering algorithms. The distance measures include Entropy, Kullback-Leibler (KL), Bhattacharyya, and Chernoff distances and their weighted versions. For clustering algorithms, besides conventional greedy agglomerative clustering, we propose N-Best distance Refinement (NBR), K-step LookAhead (KLA), and Breadth-First Search (BFS). Experiments on the TIMIT task have shown that, in comparison with the original EAM, the models compacted with these clustering methods maintain accuracy while the model size is substantially reduced. Experiments in compacting an EAM on a Pashto ASR task have shown that the proposed clustering methods lead to better model quality than the conventional clustering methods.
Unlike the implicit PDT-based state tying that has been used in most ASR systems as well as in the recent RF-based PDTs, explicit PDT (EPDT) state tying, which allows Phoneme data Sharing (PS), is considered for its potential to capture pronunciation variations. The ensemble approach of combining multiple acoustic models is applied to the EPDT, where a combination of explicit PDT and implicit PDT models is investigated to reduce phone confusions.
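
As an illustration of the model-compaction idea in the abstract, the following is a minimal sketch of two of the named distance measures (KL and Bhattacharyya) for diagonal-covariance Gaussians, combined with conventional greedy agglomerative clustering. It is not the dissertation's implementation: the weighted distance variants and the NBR, KLA, and BFS algorithms are not reproduced, the moment-matching merge rule is an assumption, and all function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, List, Tuple

# A weighted diagonal Gaussian: (weight, means, variances).
Gaussian = Tuple[float, List[float], List[float]]


def kl_divergence(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) for diagonal-covariance Gaussians (weights ignored)."""
    _, mp, vp = p
    _, mq, vq = q
    return 0.5 * sum(
        math.log(b / a) + (a + (x - y) ** 2) / b - 1.0
        for x, a, y, b in zip(mp, vp, mq, vq)
    )


def bhattacharyya(p: Gaussian, q: Gaussian) -> float:
    """Bhattacharyya distance for diagonal-covariance Gaussians."""
    _, mp, vp = p
    _, mq, vq = q
    total = 0.0
    for x, a, y, b in zip(mp, vp, mq, vq):
        avg = 0.5 * (a + b)
        total += 0.125 * (x - y) ** 2 / avg + 0.5 * math.log(avg / math.sqrt(a * b))
    return total


def merge(p: Gaussian, q: Gaussian) -> Gaussian:
    """Merge two Gaussians by moment matching (an assumed merge rule)."""
    wp, mp, vp = p
    wq, mq, vq = q
    w = wp + wq
    means = [(wp * x + wq * y) / w for x, y in zip(mp, mq)]
    variances = [
        (wp * (a + x * x) + wq * (b + y * y)) / w - m * m
        for x, a, y, b, m in zip(mp, vp, mq, vq, means)
    ]
    return (w, means, variances)


def greedy_cluster(
    gaussians: List[Gaussian],
    target: int,
    distance: Callable[[Gaussian, Gaussian], float] = bhattacharyya,
) -> List[Gaussian]:
    """Repeatedly merge the closest pair until `target` Gaussians remain."""
    if target < 1:
        raise ValueError("target must be at least 1")
    pool = list(gaussians)
    while len(pool) > target:
        i, j = min(
            combinations(range(len(pool)), 2),
            key=lambda ij: distance(pool[ij[0]], pool[ij[1]]),
        )
        merged = merge(pool[i], pool[j])
        pool = [g for n, g in enumerate(pool) if n not in (i, j)] + [merged]
    return pool
```

In this sketch each greedy step searches all remaining pairs, so the cost grows quickly with the number of Gaussians; it is meant only to make the distance-plus-merge loop concrete.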

Keywords/Search Tags:

Model, Acoustic, Speech, Automatic, PDT, EAM, Quality, Clustering methods

Related items

1	Research On The Comparison And Parallelization Of Discriminative Training Of Acoustic Model
2	Acoustic modeling for automatic speech recognition: Deriving discriminative Gaussian networks
3	Toward more effective acoustic model clustering by more efficient use of data in speech recognition
4	Deep Neural Network Acoustic Modeling For Efficient Speech Synthesis
5	Acoustic Model Of Speech Recognition Based On Lightweight Neural Network And Its Application In Robot
6	Research On Acoustic Modeling In Low Resource Speech Recognition Based On Transfer Learning
7	Clustering wide-contexts and HMM topologies for spontaneous speech recognition
8	Statistical model-based objective measures of speech quality
9	Research On Methods Of Improving Speech Communication Quality Based On Generative Adversarial Network
10	Researching Of The Mongolian Acoustic Model Based On Speech Recognition