
Research On Automatic Language Identification And Its Application

Posted on: 2017-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: C Cai
GTID: 2308330485488104
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of the Chinese economy, communication between speakers of different languages has become more and more frequent. It is therefore urgent to break down language barriers, and automatic language identification (LID) technology is becoming more and more important. LID also has diverse applications in information security, the military, and other security-related fields. The development of LID technology for Chinese minority languages has been slower than for other languages, and no language recognition system yet covers most of them. In view of this situation, we attempt to establish a language identification system covering Chinese, Bai, Tibetan, Miao, Naxi, Uygur, Yi, and Zhuang. To do this, the research was carried out in the following stages.

We first studied feature extraction techniques for LID. How the original speech signal is preprocessed and which features are chosen form the basis of a language recognition system and directly determine its performance ceiling. We summarized a variety of features that discriminate well between minority languages and introduced their principles and extraction procedures. We also introduced pre-emphasis and cepstral mean subtraction into the LID system.

We then studied the Gaussian mixture model (GMM) based language identification method. We first gave a short introduction to the principle of the GMM and the estimation of its parameters, and then established a GMM-based LID system. Experiments on a minority-language database show that the system has a detection cost of 0.2214.
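The preprocessing and GMM scoring steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the pre-emphasis coefficient of 0.97, the diagonal covariances, the `(weight, mean, variance)` model layout, and all function names are assumptions made for illustration, and model training (for example by EM) is omitted.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].
    alpha = 0.97 is a commonly used value, assumed here."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]

def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean taken over all frames of one utterance."""
    n_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n_frames for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood under a GMM.
    gmm is a list of (weight, mean, var) tuples (hypothetical layout)."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        top = max(comps)
        # log-sum-exp over mixture components, for numerical stability
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total / len(frames)

def identify(frames, models):
    """Return the language whose GMM scores the utterance highest, plus all scores.
    models maps a language name to its GMM."""
    scores = {lang: gmm_avg_log_likelihood(frames, g) for lang, g in models.items()}
    return max(scores, key=scores.get), scores
```

In use, each language would have one GMM trained on its own feature frames, and `identify` would be called on the mean-normalized frames of a test utterance.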
We further studied the universal background model (UBM) and established a LID system based on it. The UBM-based system has a detection cost of 0.2143.

Third, we presented a support vector machine (SVM) based language identification method. We gave a brief introduction to SVMs and established a LID system based on them. Experiments on the same database show that the system achieves a 75% recognition rate. However, its detection cost is 0.2514, higher than that of the GMM-based system.

Finally, we compared the effect of different features on the GMM-based, GMM-UBM-based, and SVM-based systems. These experimental results will be useful for subsequent research.
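The abstract reports both a recognition rate and a detection cost but does not define the latter. As a rough illustration only, the sketch below assumes an average pairwise cost in the style of language recognition evaluations, with equal miss and false-alarm costs, a target prior of 0.5, and hard closed-set decisions. The function names and these parameter values are assumptions and may not match the metric used in the thesis.

```python
from collections import defaultdict

def recognition_rate(true_labels, predicted_labels):
    """Fraction of utterances assigned to the correct language."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

def average_detection_cost(true_labels, predicted_labels,
                           p_target=0.5, c_miss=1.0, c_fa=1.0):
    """Average cost over target languages (assumed definition).
    For each target: c_miss * p_target * P_miss
                     + c_fa * p_nontarget * sum of P_fa over the other languages.
    Requires at least two languages in true_labels."""
    languages = sorted(set(true_labels))
    n = len(languages)
    counts = defaultdict(int)      # utterances per true language
    confusion = defaultdict(int)   # (true, predicted) -> count
    for t, p in zip(true_labels, predicted_labels):
        counts[t] += 1
        confusion[(t, p)] += 1
    p_nontarget = (1.0 - p_target) / (n - 1)
    cost = 0.0
    for target in languages:
        # miss: a target-language utterance not labeled as the target
        p_miss = 1.0 - confusion[(target, target)] / counts[target]
        # false alarm: an utterance of another language labeled as the target
        fa_sum = sum(confusion[(other, target)] / counts[other]
                     for other in languages if other != target)
        cost += c_miss * p_target * p_miss + c_fa * p_nontarget * fa_sum
    return cost / n
```

Under this assumed definition the cost weights misses and false alarms per language rather than per utterance, so it need not move in step with the overall recognition rate when the test set is unbalanced across languages.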
Keywords/Search Tags: Language identification, minority languages, Gaussian mixture model, support vector machine