Research On Automatic Language Identification And Its Application

Posted on:2017-01-01

Degree:Master

Type:Thesis

Country:China

Candidate:C Cai

Full Text:PDF

GTID:2308330485488104

Subject:Electronic and communication engineering

Abstract/Summary:

With the rapid development of the Chinese economy, communication between speakers of different languages is becoming more and more frequent. Breaking down language barriers is therefore urgent, and automatic language identification (LID) technology is becoming increasingly important. LID also has diverse applications in information security, the military, and other security-related fields.

Language identification technology for Chinese minority languages has developed more slowly than for other languages, and no language recognition system yet covers most Chinese minority languages. In view of this situation, we attempt to establish a language identification system covering Chinese, Bai, Tibetan, Miao, Naxi, Uygur, Yi, and Zhuang. To this end, the research work was carried out from the following aspects.

We first studied feature extraction techniques for LID. How the original speech signal is preprocessed and which features are chosen form the foundation of a language recognition system, and they directly determine the performance ceiling of a LID system. We summarized a variety of features that discriminate well among minority languages and introduced their principles and extraction processes. We also introduced pre-emphasis and cepstral mean subtraction into the LID system.

We then studied the Gaussian mixture model (GMM) based language identification method. We first briefly introduced the principle of the Gaussian mixture model and the estimation of its parameters, and then established a GMM-based language identification system. Experiments on a minority-language database show that this system has a detection cost of 0.2214.
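
The two front-end steps named in the abstract, pre-emphasis and cepstral mean subtraction, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the list-based data layout, and the pre-emphasis coefficient of 0.97 (a commonly used value) are assumptions made for illustration only.

```python
def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample is passed through.

    alpha=0.97 is an assumed, commonly used default, not a value from the thesis.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean taken over all frames of one utterance.

    `frames` is a list of equal-length cepstral vectors (hypothetical layout).
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]
```

Pre-emphasis boosts the high-frequency part of the speech signal before feature extraction, while cepstral mean subtraction removes a constant offset per coefficient, which reduces the effect of a fixed channel on the features.
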
We further studied the universal background model (UBM) and established a LID system based on it. The UBM-based system has a detection cost of 0.2143.

Next, a support vector machine (SVM) based language identification method was presented. A brief introduction to SVMs was given, and a LID system based on them was established. Experiments on the same database show that this system achieves a 75% recognition rate. However, its detection cost is 0.2514, higher than the 0.2214 of the GMM-based system.

Finally, we compared the effect of different features on the GMM-based, GMM-UBM-based, and SVM-based systems. These experimental results will be useful for subsequent research.
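
How a GMM-based system turns features into a language decision can be sketched as follows. This is a simplified illustration under stated assumptions, not the system described in the thesis: each model is a hypothetical list of `(weight, mean, variance)` tuples with diagonal covariances, the toy models and frames are invented, and training (parameter estimation, or adapting language models from a UBM) is omitted entirely.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def gmm_frame_log_likelihood(x, gmm):
    """log sum_k w_k N(x; mean_k, var_k), computed with the log-sum-exp trick."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def average_log_likelihood(frames, gmm):
    """Mean per-frame log-likelihood of an utterance under one GMM."""
    return sum(gmm_frame_log_likelihood(x, gmm) for x in frames) / len(frames)


def identify_language(frames, language_gmms, ubm=None):
    """Return (best_language, scores).

    Without a UBM the scores are average log-likelihoods; with a UBM they are
    log-likelihood ratios against the background model.
    """
    baseline = average_log_likelihood(frames, ubm) if ubm is not None else 0.0
    scores = {
        lang: average_log_likelihood(frames, gmm) - baseline
        for lang, gmm in language_gmms.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Invented one-dimensional toy models, for illustration only.
toy_models = {
    "lang_a": [(0.5, [0.0], [1.0]), (0.5, [1.0], [1.0])],
    "lang_b": [(1.0, [5.0], [1.0])],
}
toy_frames = [[4.8], [5.1], [5.3]]
best_language, language_scores = identify_language(toy_frames, toy_models)
```

Subtracting the UBM score shifts every language score by the same amount, so it does not change which language ranks first. It matters when scores are compared against a threshold, as in detection-style evaluation, because it normalizes scores across utterances.
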

Keywords/Search Tags:

Language identification, minority languages, Gaussian mixture model, support vector machine

Related items

1	Support Vector Machine Based Language Recognition
2	Research On Language Identification
3	The Design And Implementation Of Automatic Language Recognition System
4	Acoustic Modeling Approach To Language Identification
5	Research On Robust Processing Technologies In GSV-SVM Based Language Identification System
6	Research On Language Identification Based On Acoustic And Phonology
7	Research And Implementation On Classification Algorithm Of Language Recognition System Based On Anchor Model
8	Video Human Detection Research Based On Gaussian Mixture Model
9	Research On Minority Language Recognition WEKA Platform And Multi-classifier
10	Research On Support Vector Machine For Speaker Recognition