
Towards High Performance And Parsimonious Acoustic Modeling In Speech Recognition

Posted on: 2007-02-16    Degree: Doctor    Type: Dissertation
Country: China    Candidate: X B Li    Full Text: PDF
GTID: 1118360185451413    Subject: Signal and Information Processing
Abstract/Summary:
Current state-of-the-art continuous-density HMM-based large-vocabulary speech recognition systems deliver fairly good recognition performance in benign environments, but usually at the price of large memory footprints and high computational complexity. In this thesis we explore ways to obtain a parsimonious acoustic model while maintaining the same performance as a complex model. Three directions are explored: 1) training algorithms; 2) dimensionality reduction; 3) model compression. Novel and efficient algorithms are proposed for each.

In model training, an N-best-based minimum classification error (MCE) training algorithm is developed. Experimental results show that a high-performance, parsimonious model can be obtained. The MCE framework is then extended to optimize subspace distribution clustering HMMs (SDCHMMs). Experiments show that the performance degradation resulting from converting CDHMMs to SDCHMMs can be recovered, with a 15-80% reduction in word error rate.

In dimensionality reduction, the feature-reduction transformation and the model parameters are jointly optimized under the MCE criterion. A new feature-extraction scheme, which uses linear discriminant analysis (LDA) for feature decorrelation and dimensionality reduction, is proposed and developed into a discriminative feature-extraction framework. A 14-dimensional feature vector gives almost the same performance as the 39-dimensional MFCC features.

In model compression, we find that different states contribute non-uniformly to recognition. Likelihood, Kullback-Leibler divergence (KLD), and state divergence are used to allocate Gaussian components to HMM states; the state-divergence-based approach takes the discriminability of states into account. A greedy search is proposed to optimize the Gaussian component allocation. Compared with Bayesian information criterion (BIC)-based determination, the proposed approaches show improved performance.

We also study feature-level model compression. Optimal clustering and non-uniform allocation of Gaussian kernels in each scalar feature dimension are proposed. Symmetric KLD is adopted to cluster the Gaussian kernels, and KLD-based and likelihood-based non-uniform allocation schemes are developed using a greedy search. Non-uniform allocation gives better performance than uniform allocation, especially at larger compression ratios, and likelihood-based allocation also outperforms the KLD-based one. With almost negligible degradation in recognition performance, the original HMMs can be compressed to 15% of their original size, requiring only about 1% of the original multiplication/division operations. For the isolated-word recognition task tested, the multiplication/division operations can be further reduced to 0.05%.
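As a rough illustration of the feature-level compression idea, the sketch below shows how scalar (univariate) Gaussian kernels might be assigned to a small codebook of centroid kernels using the symmetric KL divergence. This is a minimal sketch of the general technique only, not the thesis's exact clustering or allocation procedure; the function names and the toy data are hypothetical.

```python
import numpy as np

def kl_gauss_1d(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two univariate Gaussians."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def sym_kl_gauss_1d(mu_p, var_p, mu_q, var_q):
    """Symmetric KL divergence: KL(p||q) + KL(q||p)."""
    return kl_gauss_1d(mu_p, var_p, mu_q, var_q) + kl_gauss_1d(mu_q, var_q, mu_p, var_p)

def cluster_kernels(kernels, centroids):
    """Assign each 1-D Gaussian kernel (mean, variance) to the closest
    centroid kernel under symmetric KLD; returns a cluster index per kernel."""
    assign = []
    for mu, var in kernels:
        dists = [sym_kl_gauss_1d(mu, var, cm, cv) for cm, cv in centroids]
        assign.append(int(np.argmin(dists)))
    return np.array(assign)

# Toy usage: six scalar Gaussian kernels quantized against two centroid kernels.
kernels = [(0.0, 1.0), (0.1, 0.9), (-0.2, 1.1), (3.0, 0.5), (2.8, 0.6), (3.2, 0.4)]
centroids = [(0.0, 1.0), (3.0, 0.5)]
print(cluster_kernels(kernels, centroids))  # -> [0 0 0 1 1 1]
```

In a full codebook-style compression, each HMM mixture component would then store only the indices of its quantized scalar kernels, which is what drives the memory and multiplication/division savings described above.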
Keywords/Search Tags: Parsimonious