Research On Minority Language Recognition Based On The WEKA Platform And Multiple Classifiers

Posted on:2014-09-04

Degree:Master

Type:Thesis

Country:China

Candidate:J Zhang

Full Text:PDF

GTID:2268330425976306

Subject:Electronics and Communications Engineering

Abstract/Summary:

China's vast territory is home to many ethnic minorities, and spoken language is the most important, most convenient, and most common means of exchanging information among these groups. As information technology spreads, the demands placed on acquiring and processing digital speech grow. Language identification (LID) uses a computer to analyze a speaker's voice and decide which language the speech belongs to. Working from a minority-language identification database and the WEKA platform, this thesis explores ways to improve the recognition rate by combining multiple feature parameters with multiple classifiers. The main work of this thesis is as follows:

1. Classifier selection experiment. Shifted delta cepstra (SDC) acoustic features and fundamental frequency (F0) features are extracted from the speech waveform, and a program converts these parameters into a format supported by WEKA. Five classifiers are then trained and tested for language recognition: NaiveBayes (NB), LIBSVM, MultilayerPerceptron (BP), RBFNetwork (RBF), and J48. The experimental results show that LIBSVM and random forest (RF) perform better.

2. Influence of the kernel function on the recognition result. The experimental results show that the recognition rate reaches 98.8% when the nu-SVC model of the LIBSVM classifier is applied to the SDC acoustic parameters.

3. Comparison of random forest (RF) with other classifiers on F0 features. The experimental results show that on separate male and female data sets the random forest recognition rate reaches up to 100%, but on the mixed male and female data set its recognition rate is relatively low.

4. Influence of the number of training samples on the identification result. For both the C-SVC and the nu-SVC model, more training data gives a higher recognition rate, and nu-SVC performs best. Although random forest shows the highest recognition rate (100%) among the classifiers tested, its rate decreases when the speech data are divided arbitrarily into a training file and a test file.
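The feature-conversion step described in the abstract (SDC features written to a WEKA-readable file) can be sketched roughly as follows. This is a minimal illustration, not the thesis's program: the SDC parameters `d`, `p`, and `k`, the attribute names, the relation name, and the class labels `lang_a` and `lang_b` are all assumptions, since the abstract does not give them. The sketch uses the common SDC definition, in which each output vector stacks `k` delta blocks `c[t + i*p + d] - c[t + i*p - d]`, and writes the result in WEKA's ARFF text format.

```python
from typing import List, Sequence


def sdc(frames: Sequence[Sequence[float]], d: int = 1, p: int = 3,
        k: int = 7) -> List[List[float]]:
    """Shifted delta cepstra: for each frame t, concatenate k delta blocks,
    where block i is frames[t + i*p + d] - frames[t + i*p - d].
    The defaults d=1, p=3, k=7 are an assumed configuration."""
    n_frames = len(frames)
    out: List[List[float]] = []
    # Keep only frames t for which every required index is in range.
    for t in range(d, n_frames - (k - 1) * p - d):
        vec: List[float] = []
        for i in range(k):
            ahead = frames[t + i * p + d]
            behind = frames[t + i * p - d]
            vec.extend(a - b for a, b in zip(ahead, behind))
        out.append(vec)
    return out


def to_arff(relation: str, vectors: Sequence[Sequence[float]],
            labels: Sequence[str], classes: Sequence[str]) -> str:
    """Serialize feature vectors and class labels as ARFF text.
    Attribute names (sdc0, sdc1, ..., language) are hypothetical."""
    dim = len(vectors[0])
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE sdc{j} NUMERIC" for j in range(dim)]
    lines.append("@ATTRIBUTE language {" + ",".join(classes) + "}")
    lines += ["", "@DATA"]
    for vec, label in zip(vectors, labels):
        lines.append(",".join(f"{v:.6g}" for v in vec) + "," + label)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy cepstral frames (2 coefficients each), not real speech data.
    toy_frames = [[float(t), float(2 * t)] for t in range(6)]
    feats = sdc(toy_frames, d=1, p=1, k=2)
    labels = ["lang_a", "lang_b", "lang_a"]  # placeholder class labels
    print(to_arff("lid_demo", feats, labels, ["lang_a", "lang_b"]))
```

A file produced this way can be loaded in the WEKA Explorer and passed to any of the classifiers named in the abstract. Real input would be cepstral coefficients from a front end, which this sketch does not cover.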

Keywords/Search Tags:

Language Identification, Shifted Delta Cepstra, Classifier, Support Vector Machine, Random Forests

Related items

1	Random Forests Expression Recognition Algorithm Based On Sequence Features
2	Research On Korean Spoken Language Identification
3	Support Vector Machine Based Language Recognition
4	Research On Node Localization Of Underwater Wireless Sensor Network Based On Machine Learning
5	National Language Support Vector Machine-based Language Identification
6	Design And Implementation Of Language Identification System For Web Video
7	Research On Classification Method Of Random Support Vector Machine And Its Application
8	Study And Application On Support Vector Machine Classification
9	Research On Identification And Control Of Nonlinear System Based On Support Vector Machine
10	Studies On Question Classification Technology In Chinese Question Answering System