Research On Acoustic Modeling For Spontaneous Spoken Speech Recognition

Posted on: 2015-03-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y H Qi
GTID: 1228330422493323
Subject: Information and Communication Engineering
Abstract/Summary:
Acoustic modeling is one of the key problems in speech recognition. The accuracy of the acoustic model has a direct impact on recognition performance, and how to build an accurate acoustic model has attracted increasing attention in recent research. This thesis aims to improve the accuracy of the acoustic model and the performance of continuous speech recognition systems. It mainly investigates two groups of problems: the estimation of triphone model parameters before state clustering during acoustic model training, and acoustic model adaptation.

First, to improve the accuracy of decision-tree-based state tying in Mandarin continuous speech recognition, the refinement of triphone acoustic models before state clustering is studied. The construction of the decision tree depends on the accuracy of the triphone acoustic models it uses. Many triphones occur only sparsely in the training data, so training triphone models before state clustering suffers from data sparsity. The Maximum a Posteriori (MAP) criterion is therefore used to estimate the parameters of these triphone models. MAP estimation, in turn, depends critically on the accuracy of the initial model parameters. The similarity between toned triphones that differ in tone is higher than that between toned triphones that share the same center phone. We therefore initialize each toned triphone model from the corresponding parameters of the toneless triphone model, which improves the accuracy of the initial parameters of the toned triphone model. These methods give improved performance.

Second, discriminative maximum a posteriori adaptation is investigated. Minimum phone error MAP (MPE-MAP) carries out discriminative adaptation by incorporating prior information into the estimation of model parameters. The accuracy of the hyperparameters of the prior distribution has a strong influence on recognition performance. We propose MPE-MAP with an MMI-MAP prior (MPE-MMI-MAP) and MPE-MAP with an H-criterion MAP (H-MAP) prior (MPE-H-MAP). MMI-MAP and H-MAP are used to estimate the center of the prior distribution for MPE-MMI-MAP and MPE-H-MAP, respectively. The more accurate hyperparameters refine the model parameters and yield better performance.

Third, discriminative linear transform adaptation is studied. I-smoothing is useful in discriminative linear transform adaptation, and a similar effect is achieved by adding a log prior distribution over the parameter set to the objective function. A smoothing method based on a prior over the mean parameters is proposed for the discriminative linear transform. If the ML estimate is used to define the hyperparameter of the prior mean distribution, the method is the same as I-smoothing. When adapting a hidden Markov model (HMM) set, however, the ML statistics accumulated from the data may not be robust, because there is not enough data to estimate the ML Gaussian parameters. We propose using the MAP estimate to define the hyperparameter, which improves the performance of the discriminative linear transform when the amount of adaptation data is limited. Moreover, we design a new objective function for estimating the linear transformation matrix, which leads to the discriminative MAP linear regression algorithm. Experimental results demonstrate that performance improves when adaptation data is limited, and that the method performs the same as the discriminative linear transform when a large amount of data is available.

Finally, the linear projection (LP) method is investigated. The LP function is an extension of linear regression (LR) in which multiple sets of initial acoustic models are used for the transformation. An LP algorithm based on a transformation matrix is proposed. This method takes speaker-adapted (SA) models as the initial models and uses a transformation matrix to describe the information of a specific speaker. A maximum likelihood method is used to select the models that carry important information as the initial models. This reduces the number of parameters and results in a fast adaptation method.
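The role of the prior in MAP estimation for sparse triphones can be illustrated with the standard textbook MAP update for a Gaussian mean. This is a minimal sketch, not the formulation used in the thesis: the function name `map_mean`, the prior weight `tau`, and the toy numbers are all assumptions introduced for illustration. The update interpolates between a prior mean and the occupancy-weighted data mean, so a state with few frames stays close to its prior.

```python
def map_mean(prior_mean, tau, frames, occupancies):
    """Textbook MAP estimate of a Gaussian mean (hypothetical helper).

    prior_mean:  prior mean vector (list of floats)
    tau:         prior weight; larger values trust the prior more
    frames:      list of observation vectors
    occupancies: per-frame occupation probabilities for this Gaussian
    Assumes tau + sum(occupancies) > 0.
    """
    dim = len(prior_mean)
    total_occ = sum(occupancies)
    # Occupancy-weighted sum of the observations, per dimension.
    weighted_sum = [
        sum(g * x[d] for g, x in zip(occupancies, frames)) for d in range(dim)
    ]
    # Interpolate between the prior mean and the data statistics.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + total_occ)
        for d in range(dim)
    ]


# Toy example: two frames of adaptation data and a prior weight of 2.
sparse_estimate = map_mean([0.0], 2.0, [[4.0], [4.0]], [1.0, 1.0])
```

With no data the estimate equals the prior mean, and with `tau = 0` it reduces to the ML estimate, which is why the accuracy of the prior (the initial parameters) matters most when data is sparse.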
Keywords/Search Tags: continuous speech recognition, acoustic model, speaker adaptation, discriminative training, discriminative linear transform