
Research On Subspace Based Acoustic Modeling And Adaptation Techniques

Posted on: 2014-03-20    Degree: Doctor    Type: Dissertation
Country: China    Candidate: W L Zhang    Full Text: PDF
GTID: 1108330482979103    Subject: Signal and Information Processing
Abstract/Summary:
The speech signal lies in a high-dimensional space. Due to different phone contexts, speakers and speaking environments, it contains a great deal of variability. How to obtain an accurate acoustic model from limited training data, and how to adapt it to match the test data, are two active topics in continuous speech recognition. Subspace methods find the intrinsic low-dimensional manifold of high-dimensional data, yielding low-complexity and robust models. This dissertation focuses on developing better acoustic modeling and adaptation methods using various subspace methods. The main content is divided into three parts.

Part I. Background and the state of the art. The basic principles of speech recognition using the hidden Markov model with Gaussian mixture model observations (HMM-GMM) are presented. The state of the art and recent advances in acoustic modeling and adaptation are summarized, and the pros and cons of each method are discussed in detail.

Part II. Acoustic modeling using subspace methods. In this part, a low-dimensional manifold of the acoustic features is established for building a better acoustic model. The main contributions are as follows.

1. A new acoustic modeling method using a mixture of factor analyzers (MFA) is presented. With a set of locally linear factor models, the MFA can approximate the speech manifold in the feature space (a generic form of the MFA density is sketched after this part). Using compressive sensing and Bayesian principles, each context-dependent state model can be derived. By sharing a common manifold, the MFA-based acoustic model contains fewer parameters and is therefore more robust. Speech recognition experiments on the RM and WSJ corpora show that the MFA-based acoustic model outperforms both the traditional HMM-GMM and the SGMM acoustic models.

2. Discriminative training of the MFA-based acoustic model under the boosted maximum mutual information (MMI) criterion is proposed. Update formulas for the various parameters are derived with the help of weak-sense auxiliary functions. With a carefully designed Gaussian prior, MAP estimates of the parameters can also be obtained. Experimental results show that discriminative training further improves the performance of the MFA-based acoustic model.
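For reference, a mixture of factor analyzers models the observation density with a set of local linear subspaces; a standard textbook form is given below (generic MFA notation, not necessarily the exact parameterization used in the dissertation):

```latex
p(\mathbf{x}) \;=\; \sum_{k=1}^{K} \pi_k\,
  \mathcal{N}\!\left(\mathbf{x};\; \boldsymbol{\mu}_k,\;
  \boldsymbol{\Lambda}_k \boldsymbol{\Lambda}_k^{\top} + \boldsymbol{\Psi}\right),
\qquad
\mathbf{x} = \boldsymbol{\mu}_k + \boldsymbol{\Lambda}_k \mathbf{z} + \boldsymbol{\epsilon},
\quad \mathbf{z} \sim \mathcal{N}(\mathbf{0},\mathbf{I}),
\quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0},\boldsymbol{\Psi})
```

Here each factor loading matrix Λ_k spans one local linear subspace of the speech manifold and Ψ is a diagonal noise covariance; the dissertation derives its context-dependent state models from such a shared set of local subspaces.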
Part III. Speaker adaptation using subspace methods. Two subspace-based speaker adaptation methods are derived, exploiting the inter-speaker and inter-phone correlations of the model parameters respectively. Regularization techniques and Bayesian methods are applied to further improve performance. The main contributions are as follows.

1. Using the inter-speaker correlation information, a compressive-sensing speaker adaptation method is introduced based on speaker subspace modeling. A redundant speaker dictionary is constructed at training time; during adaptation, a sparse representation of the unknown speaker's model over this dictionary is obtained. Two algorithms, matching pursuit and L1 regularization, are adapted to find the best representation (a minimal matching-pursuit sketch follows this part). The new method combines the advantages of the conventional eigenvoice (EV) and reference speaker weighting (RSW) methods. Experimental results show that when the adaptation data is limited, the compressive-sensing method outperforms the conventional methods.

2. Using the inter-phone correlation information, a new speaker adaptation method is put forward based on phone variation subspace modeling. The parameter variations of a speaker-dependent model are assumed to lie in a phone variation subspace. A set of speaker-independent coordinate vectors is obtained at training time; during adaptation, a basis matrix of the phone variation subspace is estimated for each unknown speaker. ML estimates of both the coordinate vectors and the basis matrix are derived, and a new speaker adaptive training method is presented. Experimental results show that the new method outperforms various existing methods when the adaptation data is sufficient, but it suffers from over-fitting when the adaptation data is insufficient.

3. Various regularization techniques are investigated to further improve the robustness of the phone-variation-subspace method. Six regularization terms, namely the L1 norm, the L2 norm, the elastic net, the nuclear norm, the group sparse constraint and the sparse-group LASSO, are discussed, and their effects on the estimation of the basis matrix are compared. An effective algorithm for solving all of these regularized optimization problems is given. Experimental results show that regularization improves robustness, with the sparse-group LASSO performing best.

4. Combining the speaker subspace and the phone variation subspace, a hierarchical Bayesian adaptation method is proposed. After the phone variation subspace is obtained, a speaker subspace of the speaker-dependent basis matrices is introduced. Using probabilistic PCA, a hierarchical prior on the model parameters is constructed, from which a hierarchical Bayesian adaptation method is derived. MAP estimation of each parameter layer is presented and an efficient online adaptation scheme is given. With a set of simplified Bayesian estimation formulas, the advantages of all the previous methods are combined in a consistent way. The new method handles varying amounts of adaptation data automatically and efficiently. Experimental results show that the hierarchical Bayesian adaptation method achieves good performance under all testing conditions.
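As a concrete illustration of the sparse speaker representation in contribution 1 of Part III, the following is a minimal sketch, not the dissertation's implementation: an unknown speaker's target vector is approximated by a sparse combination of the columns of a redundant reference-speaker dictionary using orthogonal matching pursuit. All variable names, dimensions and the stopping rule are illustrative assumptions.

```python
# Minimal sketch: sparse combination of reference speakers via orthogonal matching
# pursuit (toy data; dictionary construction and stopping rule are assumptions).
import numpy as np

def sparse_speaker_weights(D, s, n_nonzero=5):
    """Approximate the target vector s as D @ w with at most n_nonzero
    active reference speakers (columns of the dictionary D)."""
    residual = s.copy()
    support = []
    w = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Pick the dictionary column most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Re-fit the weights of all selected columns by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], s, rcond=None)
        residual = s - D[:, support] @ coef
    w[support] = coef
    return w

# Toy usage: 40 reference speakers, 200-dimensional target vectors.
rng = np.random.default_rng(0)
D = rng.standard_normal((200, 40))
D /= np.linalg.norm(D, axis=0)              # unit-norm dictionary columns
true_w = np.zeros(40)
true_w[[3, 17, 25]] = [0.7, -0.4, 0.2]
s = D @ true_w + 0.01 * rng.standard_normal(200)
w_hat = sparse_speaker_weights(D, s, n_nonzero=3)
print(np.nonzero(w_hat)[0])                 # indices of selected reference speakers
```

An L1-regularized (LASSO) solver could replace the greedy column selection here, corresponding to the second algorithm mentioned in contribution 1.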
Keywords/Search Tags: Continuous Speech Recognition, Acoustic Model, Speaker Adaptation, Subspace Method, Mixture of Factor Analyzers, Regularization Method, Bayesian Estimation