Telephone Voice-based Minority Language Recognition Research

Posted on:2012-05-24

Degree:Master

Type:Thesis

Country:China

Candidate:L B Zuo

GTID:2218330338955721

Subject:Detection Technology and Automation

Abstract/Summary:

According to statistics, there are 5651 languages in the world. As communication across languages becomes more and more important, enabling computers to identify different languages has become an urgent need. Language identification is the process of determining the language of a spoken utterance. In essence, it is an aspect of speech recognition. It has been widely used in multilingual information services and in the security field.

Current language identification systems fall into three types: phonotactic systems, acoustic systems, and systems that combine the two. Acoustic systems do not require a manually labeled corpus and are highly portable, so they have been widely used.

This thesis focuses on text-independent language identification. It uses a Gaussian mixture model with a universal background model (GMM-UBM) to build the identification system and explores methods to improve the recognition rate. The main work is as follows:

(1) We design a telephone speech corpus for minority language identification, consisting of spontaneous utterances in 9 minority languages and Mandarin. In each language, the utterances were produced by 25 male and 25 female speakers over real telephone lines. The recordings were given some preliminary collation before use.

(2) We build a minority language identification system based on the GMM-UBM model and design two identification experiments, one using MFCC (Mel-frequency cepstral coefficient) feature parameters and the other using SDC (shifted delta cepstra) feature parameters. In the experiments, a new double-threshold method for voice activity detection is used to remove noise and extract useful voice frames. We then extract the MFCC and SDC feature parameters and train the UBM and the GMM models of 6 languages.

(3) Utterances of different durations, and utterances containing Chinese loan words, are selected from six minority languages for testing. We analyze the identification rate for each language and compare the results across test durations and feature parameters, and then explain the identification errors in terms of phonetics. We also analyze the impact of Chinese loan words on the results.

Experimental results show that the proposed GMM-UBM language identification system has good extensibility and applicability; that the double-threshold voice activity detection method effectively removes noise and extracts useful voice frames; that experiments using SDC feature parameters perform better than those using MFCC feature parameters; and that minority language identification performance declines significantly when Chinese loan words are present.
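
As an illustration of the SDC feature parameters named in the abstract, the following is a minimal sketch of the standard shifted delta cepstra computation, not the thesis's own implementation. The abstract does not state the SDC configuration, so the defaults here (d=1, P=3, k=7, often written with N=7 as 7-1-3-7 in the literature) are an assumption, as are the function name `sdc` and the choice to clamp out-of-range frame indices to the first or last frame.

```python
def sdc(cepstra, d=1, p=3, k=7):
    """Shifted delta cepstra for a sequence of cepstral frames.

    cepstra: list of T frames, each a list of N cepstral coefficients.
    Returns a list of T frames, each holding N * k values: for block
    i = 0..k-1 the delta c(t + i*p + d) - c(t + i*p - d).
    The defaults d=1, p=3, k=7 are assumed, not taken from the thesis.
    """
    num_frames = len(cepstra)
    if num_frames == 0:
        return []

    def frame(t):
        # Assumed edge handling: clamp indices to the first/last frame.
        return cepstra[min(max(t, 0), num_frames - 1)]

    features = []
    for t in range(num_frames):
        stacked = []
        for i in range(k):
            ahead = frame(t + i * p + d)
            behind = frame(t + i * p - d)
            # One N-dimensional delta block per shift i.
            stacked.extend(a - b for a, b in zip(ahead, behind))
        features.append(stacked)
    return features


# Toy input: one coefficient per frame, equal to the frame index.
toy = [[float(t)] for t in range(10)]
toy_sdc = sdc(toy, d=1, p=3, k=2)
# toy_sdc[3] == [2.0, 2.0]: both delta blocks span two frames of a unit ramp.
```

Each output frame stacks k delta blocks taken P frames apart, so it summarizes a longer stretch of speech than a single MFCC frame with ordinary deltas. This is the usual motivation for using SDC in language identification.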

Keywords/Search Tags:

Language Identification, Minority Language, Voice Activity Detection, GMM-UBM, Chinese Loan Words

Related items

1	Study Of Application Of A Language Model Combining Statistics And Rules In Chinese Input Method
2	Research On Automatic Language Identification And Its Application
3	Based On The Characteristics Of Cv Syllable Minority Language Recognition Research
4	Study On The Audience Demand Of Xinjiang Minority Language Radio
5	A Study On Chinese Ministry Script Publishing Since The Reform And Opening
6	Research On Minority Language Recognition WEKA Platform And Multi-classifier
7	Short-duration Language Identification Based On Uyghur-chinese Speech
8	Natural Language Processing-A Study Of Vectorization Of Chinese Words And Short Texts
9	Research On Border Minority Language Recognition Based On Multiple Classifier Algorithm
10	A Study On The Identification Of Isolated Words In Yi Language