
Research On Automatic Language Identification Technology Over Telephone Channel

Posted on: 2006-03-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Qu
Full Text: PDF
GTID: 1118360182960422
Subject: Military Intelligence
Abstract/Summary:
Language identification is an important branch of speech recognition technology and has broad application prospects. A language identification system consists of three parts: feature extraction, modeling, and the decision rule. Using the OGI multi-lingual telephone speech corpus, this dissertation studies speaker-independent language identification techniques, proposes new ideas and applications in feature extraction, modeling, and front-end and back-end processing, and develops a language identification system.

In modeling, this dissertation mainly studies modeling methods based on statistical learning theory and puts forward the Gaussian Mixture Bigram Model-Universal Background Bigram Model (GMBM-UBBM), an extension of the Gaussian Mixture Model-Universal Background Model (GMM-UBM). Language identification systems based on both GMM-UBM and GMBM-UBBM are implemented. The new GMBM-UBBM model not only retains the ability to discriminate among languages but also gains the bigram contextual information of GMBM. By combining the two models, the problem that GMM-UBM treats adjacent feature vectors as statistically independent is solved. The new GMBM-UBBM thus preserves the advantages of both models and overcomes their individual shortcomings.

As far as the training criterion is concerned, the dissertation studies discriminative training schemes in modeling and analyzes the two main discriminative training criteria in current use, Maximum Mutual Information (MMI) and Minimum Classification Error (MCE). Language identification systems based on the two criteria are proposed and constructed. Both algorithms improve the discriminating ability among models by incorporating inter-class information. The Generalized Probabilistic Descent (GPD) algorithm, a typical choice for this kind of training, is used in the implementation.
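To make the MCE criterion mentioned above more concrete, the following is a minimal sketch of the commonly used smoothed MCE loss for a single utterance. It is not taken from the dissertation: the function name `mce_loss`, the parameters `eta`, `gamma`, and `theta`, and the use of per-language average GMM log-likelihoods as discriminant scores are all assumptions made for illustration. GPD-style training would then adjust the model parameters by small gradient steps on this loss; that step is not shown.

```python
import math


def mce_loss(scores, true_idx, eta=1.0, gamma=1.0, theta=0.0):
    """Smoothed MCE loss for one utterance (illustrative sketch).

    scores   : discriminant values g_j(x), one per language, e.g. average
               GMM log-likelihoods (an assumed input, not from the source)
    true_idx : index of the correct language
    eta      : sharpness of the soft maximum over competing languages
    gamma    : slope of the sigmoid; theta: its offset
    """
    competitors = [g for j, g in enumerate(scores) if j != true_idx]
    # Soft maximum over competitors, computed stably by factoring out the max.
    m = max(competitors)
    soft_max = m + math.log(
        sum(math.exp(eta * (g - m)) for g in competitors) / len(competitors)
    ) / eta
    # Misclassification measure: positive when competitors outscore the truth.
    d = -scores[true_idx] + soft_max
    # The sigmoid turns the measure into a differentiable error count in (0, 1).
    return 1.0 / (1.0 + math.exp(-gamma * d + theta))
```

With these assumptions, an utterance whose correct language has the highest score gets a loss below 0.5, and a misclassified utterance gets a loss above 0.5, which is what makes the loss a smooth stand-in for the error rate.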
This dissertation implements both discriminative training algorithms, and it is the first to apply discriminative training of GMMs to language identification. After a detailed theoretical analysis and a description of the implementation process, a large number of experimental results show that discriminative training of GMMs is very effective in improving language identification accuracy.

In feature extraction, a new algorithm based on discriminative training of GMMs, called discriminative feature extraction (DFE), is applied to language identification. Classical feature extraction is designed independently of the classifier, whereas this method introduces a discriminative training mechanism into the feature extraction process. The auditory-based filters (AF) used in feature extraction are designed discriminatively, in a data-driven way, so as to minimize the final recognition errors. The experimental results show that discriminative feature extraction is superior to classical MFCC.

At the post-processing end, multi-classifier fusion is increasingly adopted as a way to improve the accuracy of an LID system. This dissertation studies fusion at the decision level from two aspects. First, four fusion styles are investigated: equal linear weighting (ELW), log equal linear weighting (LELW), universal linear weighting (ULW), and multi-classifier competing (MC). Second, linear optimal combinations are studied. Building on the fusion criteria CFM, MSE, and CE, a new MCE fusion criterion is proposed. The results show that the proposed MCE criterion improves the accuracy rate compared with ELW and ULW and gives results similar to those of the other three criteria.

At the front end, an idea from physics, super-paramagnetic clustering (SPC), is introduced into speaker clustering, which is an important technique in a language identification system.
The super-paramagnetic clustering algorithm formulates data clustering as the problem of measuring the equilibrium properties of an inhomogeneous Potts model. In a certain temperature range the system exhibits a super-paramagnetic phase, and the categories can be identified from the correlations between data points. Experimental results show that the super-paramagnetic clustering algorithm achieves an excellent coarse classification of speakers. More importantly, the SPC algorithm does not impose a partition on the data when no natural classes are present in it, which is another of its advantages.
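Returning to the decision-level fusion discussed earlier in the abstract, the two equal-weight rules (ELW and LELW) can be sketched as follows. The abstract does not define them precisely, so this sketch assumes that ELW averages each language's scores across classifiers and that LELW averages the logarithms of those scores. The function names, the `floor` parameter, and the example scores are hypothetical.

```python
import math


def fuse_elw(score_rows):
    """Equal linear weighting (assumed form): mean score per language.

    score_rows holds one list of per-language scores per classifier.
    """
    n = len(score_rows)
    return [sum(col) / n for col in zip(*score_rows)]


def fuse_lelw(score_rows, floor=1e-12):
    """Log equal linear weighting (assumed form): mean log score per language.

    Scores are floored before the logarithm so that a zero score is allowed.
    """
    n = len(score_rows)
    return [
        sum(math.log(max(p, floor)) for p in col) / n
        for col in zip(*score_rows)
    ]


def decide(fused):
    """Return the index of the language with the highest fused score."""
    return max(range(len(fused)), key=fused.__getitem__)


# Two hypothetical classifiers scoring three languages.
classifier_a = [0.80, 0.15, 0.05]
classifier_b = [0.05, 0.40, 0.55]
elw_choice = decide(fuse_elw([classifier_a, classifier_b]))
lelw_choice = decide(fuse_lelw([classifier_a, classifier_b]))
```

Under these assumptions the two rules can disagree: in the example, ELW selects language 0 because one classifier is very confident in it, while LELW selects language 1 because averaging in the log domain penalizes any language that some classifier scores very low.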
Keywords/Search Tags: Language identification, Gaussian mixture bigram model-universal background bigram model (GMBM-UBBM), minimum classification error criterion (MCE), maximum mutual information criterion (MMI), discriminative feature extraction (DFE), decision-level fusion