
Research on Minority Language Recognition Based on the WEKA Platform and Multiple Classifiers

Posted on: 2014-09-04 | Degree: Master | Type: Thesis
Country: China | Candidate: J Zhang | Full Text: PDF
GTID: 2268330425976306 | Subject: Electronics and Communications Engineering
Abstract/Summary:
China's vast territory is home to many ethnic minorities, and their languages are the most important, most convenient and most common means of exchanging information among these groups. As information technology becomes widespread, the requirements for acquiring, handling and processing digital speech grow more demanding. The purpose of Language Identification (LID) is to have a computer analyze a speaker's voice and determine which language the speech belongs to. Based on a minority-language identification database and the WEKA platform, this thesis uses several kinds of feature parameters and multiple classifiers to explore ways of improving the recognition rate. The main work of this thesis includes:
1. Classifier selection experiment: first, SDC acoustic feature parameters and fundamental frequency (F0) feature parameters are extracted from the speech waveform; next, a program converts these parameters into a format supported by the WEKA software; finally, five classifiers, NaiveBayes (NB), LibSVM, MultilayerPerceptron (BP), RBFNetwork (RBF) and J48, are trained and tested for language recognition, and the test results are reported. The experimental results show that the LibSVM and RF algorithms perform better.
2. Research on the influence of the kernel function on language recognition results. Experimental results show that the recognition rate reaches 98.8% when the nu-SVC model of the LibSVM classifier is applied to SDC acoustic parameters.
3. Comparison experiments between random forest (RF) and other classifiers based on fundamental frequency (F0) feature parameters. Experimental results show that on the separate male and female data sets the random forest classifier's recognition rate can reach 100%, but on the mixed male-and-female data set its recognition rate is relatively low.
4. An experimental discussion of how the number of training samples affects language identification results. For both the C-SVC and nu-SVC models, more training data yields a higher recognition rate and benefits the decision, with the nu-SVC model performing best. Although random forest shows the highest recognition rate (100%) among the classifiers, its rate decreases when the speech data are arbitrarily split into a training file and a test file.
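The abstract names SDC features without defining them. Below is a minimal sketch of shifted delta cepstra under the common N-d-P-k parameterization; the 7-1-3-7 defaults are a widely used LID setting and are an assumption, since the thesis configuration is not given here. The input is assumed to be per-frame cepstral vectors (for example MFCCs) computed elsewhere, and the edge handling (clamping to the first/last frame) is likewise an assumption.

```python
from typing import List

Frame = List[float]


def sdc(cepstra: List[Frame], N: int = 7, d: int = 1, P: int = 3, k: int = 7) -> List[Frame]:
    """Shifted delta cepstra with N-d-P-k parameters (7-1-3-7 by default).

    For frame t, block i (0 <= i < k) is c[t + i*P + d] - c[t + i*P - d],
    using only the first N cepstral coefficients. The k blocks are
    concatenated, giving N*k values per frame.
    """
    T = len(cepstra)
    if T == 0:
        return []

    def frame(j: int) -> Frame:
        # Assumption: out-of-range indices are clamped to the first/last frame.
        return cepstra[min(max(j, 0), T - 1)][:N]

    features: List[Frame] = []
    for t in range(T):
        stacked: Frame = []
        for i in range(k):
            centre = t + i * P
            plus, minus = frame(centre + d), frame(centre - d)
            # Delta of block i: difference of frames d steps ahead and behind.
            stacked.extend(a - b for a, b in zip(plus, minus))
        features.append(stacked)
    return features


if __name__ == "__main__":
    # Toy "cepstra": frame t has coefficients [t, 2t].
    toy = [[float(t), 2.0 * t] for t in range(5)]
    for row in sdc(toy, N=2, d=1, P=2, k=2):
        print(row)
```

With the defaults, each frame yields 7 × 7 = 49 values, which is why SDC vectors capture longer-range temporal context than a single delta.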
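Item 1 mentions converting the extracted parameters into a format WEKA supports. WEKA's native text format is ARFF, so the following is a minimal sketch of such a converter, assuming each instance is one fixed-length feature vector (SDC or F0-derived) paired with a language label. Whether the thesis used one instance per frame or per utterance is not stated; the function name `write_arff`, the attribute prefix and the language labels are hypothetical.

```python
import io
from typing import Sequence, TextIO, Tuple


def write_arff(out: TextIO, relation: str,
               rows: Sequence[Tuple[Sequence[float], str]],
               prefix: str = "f") -> None:
    """Write (feature_vector, language_label) pairs as a WEKA ARFF file.

    All vectors must have the same length; labels become a nominal 'class'
    attribute. Labels and the relation name are assumed to contain no
    spaces, commas or braces (otherwise they would need quoting).
    """
    if not rows:
        raise ValueError("no instances to write")
    dim = len(rows[0][0])
    labels = sorted({label for _, label in rows})
    out.write(f"@RELATION {relation}\n\n")
    for j in range(dim):
        out.write(f"@ATTRIBUTE {prefix}{j} NUMERIC\n")
    out.write("@ATTRIBUTE class {" + ",".join(labels) + "}\n\n@DATA\n")
    for feats, label in rows:
        if len(feats) != dim:
            raise ValueError("inconsistent feature dimension")
        # 6 significant digits is an arbitrary precision choice.
        out.write(",".join(f"{x:.6g}" for x in feats) + f",{label}\n")


if __name__ == "__main__":
    buf = io.StringIO()
    # Hypothetical labels; the abstract does not name the languages.
    write_arff(buf, "lid_sdc", [([1.0, 0.5], "lang_a"), ([-2.25, 3.0], "lang_b")])
    print(buf.getvalue())
```

Putting the class attribute last matches WEKA's default of treating the final attribute as the class, so the file can be loaded directly into the classifiers listed above.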
Keywords/Search Tags: Language Identification, Shifted Delta Cepstrum, Classifier, Support Vector Machine, Random Forests