Discriminative Methodologies For Tone Problem Solving In Mandarin Speech Recognition

Posted on: 2009-07-18 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: H Huang
GTID: 1118360275954640 | Subject: Circuits and Systems
Abstract/Summary:
Chinese is a tonal language, and tones are of fundamental importance to Mandarin speech recognition. Tones can be as important as phonemes when contextual information is limited or missing. The use of tone information to improve Mandarin speech recognition has been widely studied in recent research, and significant improvements have been achieved on speech recognition tasks of various scales in both clean and noisy environments. In recent years, discriminative machine learning has been one of the most active directions in pattern recognition, and especially in automatic speech recognition research. Several model parameter estimation and feature extraction methods based on discriminative principles have been shown to be successful in both classification and continuous speech recognition tasks. This dissertation aims to solve tone problems that are unique to Mandarin speech recognition, and hence to improve the performance of large vocabulary speech recognition systems, by taking advantage of recently proposed discriminative training criteria, models and methods. A systematic overview of the discriminative training criteria, the models, and the discriminative techniques derived from them is provided. Several discriminative approaches to tone problem solving in Mandarin speech recognition are proposed, which can be summarized as follows.

Traditional tone modeling based on hidden Markov models (HMMs) is first investigated from a new, discriminative training perspective. To improve tone recognition accuracy, discriminative training in both the model space and the feature space is proposed. In the model space, the model parameters are trained using an objective function termed minimum tone error, which is a smooth approximation of tone recognition accuracy. In the feature space, based on the fact that Mandarin tones are greatly influenced by the tones in their context, a tonal feature extraction method for HMM-based tone modeling is introduced. The method uses linear transforms to project the F0 (fundamental frequency) features of neighboring syllables as compensations, and adds them to the original F0 features of the current syllable. The transforms are discriminatively trained according to the same objective function. Experiments show that the new tonal features achieve a significant improvement in tone recognition compared with a baseline that uses maximum likelihood trained HMMs on normal F0 features. Overall discriminative training on the new features brings further improvement. It is also found that the proposed feature extraction method (DTFE) brings additional improvements on top of the traditional F0 normalization technique.

Conditional random fields (CRFs) are among the most successfully applied mathematical models in natural language processing. Tone modeling using an extension of CRFs, hidden conditional random fields (HCRFs), is explored. To better capture the F0 contour, a generalized dynamic feature is introduced. Experimental results on tone recognition show that the HCRF-based tone model outperforms both the maximum likelihood trained and the discriminatively trained HMM tone models when using the same model structure and observations. The generalized dynamic features yield a consistent gain over the normal dynamic features.

It has been pointed out that a key advantage of CRFs and HCRFs is their great flexibility to include a wide variety of arbitrary, non-independent features of the input. In Mandarin speech recognition, unlike the spectral features, no F0 is observed in unvoiced regions. The discontinuity between voiced and unvoiced segments has traditionally made tone modeling difficult. HCRFs are therefore better suited to handling this phenomenon. A preliminary evaluation of HCRFs for embedded tone modeling in Mandarin speech recognition is presented. Experimental results on tonal syllable classification tasks show that HCRFs using discontinuous F0 features perform better than HCRFs using smoothed F0 features.

Large margin methods have attracted a great deal of research attention in the field of machine learning. The observation that the margin in classification, rather than the raw training error, is what matters has become a key tool in recent years for dealing with discriminative classifiers. We build a tone classifier based on segmental features using Gaussian mixture models (GMMs). A discriminative objective function termed the large margin criterion is adopted to train the Gaussian mixture parameters. A novel model parameter update equation using a weak-sense auxiliary function is formulated to obtain an efficient iterative training approach for the Gaussian parameters. The linear discriminant analysis (LDA) feature reduction algorithm is applied to extract the critical segmental features of the tones. Experimental results on tone recognition tasks show that the margin-based discriminative criterion is better than the objective function based on empirical risk. The proposed extended Baum-Welch (EBW)-like update algorithm achieves comparable performance in only a few iterations. The GMMs trained on LDA-derived features are better than the previously proposed overlapped di-tone Gaussian mixture models.

For integrating explicitly trained tone models into lattice-based rescoring, a discriminative framework for tone model integration is proposed. The method uses model-dependent weights to scale the probabilities from the various models: the HMMs based on spectral features and the tone models based on F0-related tonal features. The weights are discriminatively trained using the minimum phone error (MPE) criterion, and an update equation for the model weights based on the EBW algorithm is derived. Various schemes of model weight combination, such as tonal syllable dependent, final model dependent, model combination dependent and word dependent, are evaluated, and a smoothing technique is introduced to make training robust to overfitting. The proposed method is evaluated on speech recognition tasks with tonal syllable output and with character output. Experimental results show that the proposed method obtains a significant relative error reduction over a global weight on both tasks, owing to better interpolation of the given models.
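
The model-dependent weighting used in lattice rescoring can be illustrated with a minimal sketch. It is not the dissertation's implementation: the arc layout, the names `arc_score`, `path_score` and `best_path`, the unit labels and all numeric values are hypothetical, the weights are set by hand rather than trained with MPE, and the combination is assumed to be a weighted sum of log-probabilities.

```python
# Minimal sketch of model-dependent score weighting for rescoring.
# Assumption: each lattice arc carries a log-probability from the spectral
# HMM and one from the tone model, and the two are combined log-linearly.
# All names and numbers are hypothetical, chosen only for illustration.

def arc_score(arc, unit_weights, global_weights):
    """Combine the two model scores of one arc.

    unit_weights maps a unit label (e.g. a tonal syllable) to a pair
    (spectral_weight, tone_weight); units without an entry fall back
    to the single global pair.
    """
    w_spec, w_tone = unit_weights.get(arc["unit"], global_weights)
    return w_spec * arc["spectral_logp"] + w_tone * arc["tone_logp"]


def path_score(path, unit_weights, global_weights):
    """Score a path as the sum of its weighted arc scores."""
    return sum(arc_score(a, unit_weights, global_weights) for a in path)


def best_path(paths, unit_weights, global_weights=(1.0, 0.1)):
    """Return the highest-scoring path among the candidates."""
    return max(paths, key=lambda p: path_score(p, unit_weights, global_weights))


# Two competing single-arc hypotheses that differ only in tone.
path_a = [{"unit": "ma1", "spectral_logp": -10.0, "tone_logp": -4.0}]
path_b = [{"unit": "ma3", "spectral_logp": -10.5, "tone_logp": -1.0}]

# A single global weight pair applies to every unit.
choice_global = best_path([path_a, path_b], unit_weights={})

# Unit-dependent weights let the tone model count more for "ma1".
choice_unit = best_path([path_a, path_b], unit_weights={"ma1": (1.0, 0.5)})

print(choice_global[0]["unit"], choice_unit[0]["unit"])
```

With these made-up numbers the global weight pair prefers the first hypothesis, while the unit-dependent weights prefer the second, which shows how a finer-grained weighting scheme can change the rescoring decision.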
Keywords: Mandarin speech recognition, tone modeling, discriminative training, discriminative feature extraction, hidden conditional random fields, large margin, minimum phone error