Acoustic Modeling Approach To Language Identification

Posted on: 2012-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xu
Full Text: PDF
GTID: 2178330338491943
Subject: Signal and Information Processing
Abstract/Summary:
With globalization and the rapid development of networks, communication without spoken-language barriers is becoming increasingly important, and demand for language identification (LID) techniques is growing accordingly. LID is also widely applied in the military, national security, and various information industries, and its practical value and prospects have recently begun to attract more attention.

For the acoustic modeling of language identification, there are two typical state-of-the-art approaches: generative modeling based on the Gaussian Mixture Model (GMM) and discriminative modeling based on the Support Vector Machine (SVM). Both methods can use the same low-level acoustic spectral features to train the language models, and both offer good robustness and high efficiency. The two methods are also highly complementary. In this context, the thesis focuses on acoustic modeling for language identification. It presents our work and innovations in front-end and back-end processing, language model training, open-set identification, and algorithm optimization, with the aim of building an efficient, high-performance language identification system.

First, we address front-end processing techniques. To extract the most representative language information from the original speech, distortions that can degrade language identification performance must first be removed, such as the speaker's gender, age, and health status, channel effects, and environmental noise. In this thesis, Vocal Tract Length Normalization (VTLN), feature-domain Latent Factor Analysis (fLFA), and the removal of noise and non-speech signals are proposed to alleviate the distortions caused by language-unrelated signals.

Second, this thesis provides a new model training method to improve the performance of language identification systems. We first built a language identification system using discriminative training based on the Maximum Mutual Information (MMI) criterion, and then proposed a new modeling approach called "Refined Modeling" to exploit more detailed information in the feature space and improve the performance of the LID system. Unlike traditional Maximum Likelihood Estimation (MLE), which adjusts the parameters of each language model so that it reflects the probability distribution of the training data, MMI training pays more attention to the classification margin between different languages. In addition, in practical applications it is difficult to acquire label information such as channel type, dialect, and gender. Our "Refined Modeling", however, can model the distribution of the original data more accurately and generalize better without those labels.

Third, this thesis presents our work on back-end processing and algorithm optimization. The back-end processing work included system fusion, decision making, and open-set detection. LDA and GMM models were implemented and greatly improved the performance of the LID system. For algorithm optimization, the computational cost of the LID systems was significantly reduced by applying TopN strategies and OpenMP multi-threaded programming. Finally, we integrated all of our language identification techniques to build a demonstration system based on MFC and Google Earth.
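
The difference between the MLE and MMI criteria mentioned above can be written out as follows. This is the standard textbook form of the two objectives, not necessarily the exact formulation used in the thesis. The notation is assumed here for illustration: $X_r$ is the feature sequence of the $r$-th training utterance, $l_r$ its language label, $\lambda_l$ the acoustic model of language $l$, and $P(l)$ a language prior.

```latex
% MLE: each language model is fit to its own training data only.
\hat{\lambda}_{\mathrm{MLE}}
  = \arg\max_{\lambda} \sum_{r=1}^{R} \log p\left(X_r \mid \lambda_{l_r}\right)

% MMI: the likelihood of the correct language is maximized relative to
% the total likelihood over all L competing languages.
\hat{\lambda}_{\mathrm{MMI}}
  = \arg\max_{\lambda} \sum_{r=1}^{R}
    \log \frac{p\left(X_r \mid \lambda_{l_r}\right) P(l_r)}
              {\sum_{l=1}^{L} p\left(X_r \mid \lambda_{l}\right) P(l)}
```

The denominator of the MMI objective sums over every language model, so raising the score of the correct language while lowering the scores of the competing ones both increase the objective. This is the sense in which MMI training attends to the separation between languages rather than only to the fit of each model to its own data.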
Keywords/Search Tags: language identification, Gaussian Mixture Model, Support Vector Machine, MMI, Refined Modeling, VTLN, fLFA