Research On Border Minority Language Recognition Based On Multiple Classifier Algorithm

Posted on: 2016-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: R G Xiao
GTID: 2278330482470523
Subject: Electronics and Communications Engineering
Abstract/Summary:
China's land border is about 22,000 km long. Along most of it, China and its neighbors are separated by natural features such as mountains and rivers, and residents on both sides interact frequently. The border areas are home mainly to ethnic minorities. Language barriers cause many problems for border defense soldiers in their everyday law enforcement and public security work, and can even lead to conflicts between soldiers and local people. With the rapid development of information technology, language identification has made new progress, and automatically identifying the languages of different ethnic groups is no longer out of reach. This paper builds on multi-ethnic language identification databases and the WEKA platform, and studies how combining multiple feature parameters (SDC and F0) with various classifiers can improve the language identification rate. The work covers the following:
1. Collection of minority-language speech: minority-language samples were collected and screened by recording quality, then classified by language and by sentence length to ensure sample quality.
2. Classifier selection: shifted delta cepstral (SDC) acoustic features and fundamental frequency (F0) features were first extracted from the speech waveforms, and a program converted all extracted parameters to ARFF format. Language identification training and testing were then carried out on these features with six classifiers: LibSVM, NaiveBayes (NB), RBFNetwork (RBF), MultilayerPerceptron (BP), J48 and Random Forests (RF). The experimental results show that LibSVM and RF perform best.
3. SVM kernel functions: the kernel functions of the SVM were studied to assess their effect on language identification results. The experiments show that a high identification rate can be achieved by combining SDC acoustic features with the nu-SVC model of the LibSVM classifier.
4. F0 features: using the F0 features, RF was compared with the other classifiers. When the experiments are separated by gender, the RF identification rate reaches 100%, whereas on mixed-gender data the rate is relatively low.
5. Training sample size: the experiments analyze how the amount of training data affects identification results. For both the C-SVC and nu-SVC models, the identification rate increases with training set size, so larger training databases give more reliable decisions. nu-SVC gives the most stable results. Although random forest can reach a 100% identification rate, its rate drops when training data is insufficient.
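Item 2 extracts SDC features, but the abstract does not give the configuration used. The minimal sketch below computes shifted delta cepstra from per-frame cepstral vectors under the common N-d-P-k parameterization. For each frame t and block i = 0..k-1, it takes the delta c(t+iP+d) - c(t+iP-d) and concatenates the k deltas.

Two choices in it are assumptions of this sketch, not the thesis's settings:
- The defaults d=1, P=3, k=7 (with N=7 cepstra, the widely used 7-1-3-7 setup).
- Clamping out-of-range frame indices to the first or last frame.

```python
from typing import List


def sdc(cepstra: List[List[float]], d: int = 1, p: int = 3, k: int = 7) -> List[List[float]]:
    """Shifted delta cepstra for every frame.

    cepstra[t] holds the N cepstral coefficients of frame t.
    Block i of frame t is c[t + i*p + d] - c[t + i*p - d]; the k blocks
    are concatenated, giving k*N values per frame.
    Assumption: indices outside the utterance are clamped to the edge frames.
    """
    n_frames = len(cepstra)
    last = n_frames - 1

    def frame(idx: int) -> List[float]:
        # Clamp the frame index into [0, last] (edge replication)
        return cepstra[min(max(idx, 0), last)]

    features: List[List[float]] = []
    for t in range(n_frames):
        vec: List[float] = []
        for i in range(k):
            base = t + i * p
            ahead, behind = frame(base + d), frame(base - d)
            # Delta of block i, coefficient by coefficient
            vec.extend(a - b for a, b in zip(ahead, behind))
        features.append(vec)
    return features
```

Each output vector has k*N values (49 for 7-1-3-7). The cepstral front end (for example MFCCs) is assumed to run upstream and is not shown.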
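The abstract does not say how F0 was extracted. The sketch below uses one standard option, autocorrelation, which is not necessarily the thesis's method. For a single frame, it searches lags between fs/fmax and fs/fmin for the strongest autocorrelation peak and returns fs/lag. Frames with weak periodicity are treated as unvoiced. The 60 to 400 Hz pitch range and the 0.3 voicing threshold are illustrative assumptions.

```python
import math
from typing import List, Sequence


def estimate_f0(frame: Sequence[float], fs: int, fmin: float = 60.0,
                fmax: float = 400.0, voicing_threshold: float = 0.3) -> float:
    """Autocorrelation F0 estimate in Hz for one frame; 0.0 means unvoiced or silent."""
    n = len(frame)
    # Remove the DC offset so it does not dominate the autocorrelation
    mean = sum(frame) / n if n else 0.0
    x: List[float] = [s - mean for s in frame]

    def autocorr(lag: int) -> float:
        return sum(x[i] * x[i + lag] for i in range(n - lag))

    r0 = autocorr(0)
    if r0 <= 0.0:
        return 0.0  # silent (or empty) frame
    min_lag = max(1, int(fs / fmax))
    max_lag = min(n - 1, int(fs / fmin))
    if min_lag > max_lag:
        return 0.0  # frame too short for the requested pitch range
    # Lag with the strongest autocorrelation inside the pitch range
    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    # Weak normalized peak: treat the frame as unvoiced
    if autocorr(best_lag) / r0 < voicing_threshold:
        return 0.0
    return fs / best_lag
```

Consider a 50 ms frame (400 samples) of a 200 Hz sine sampled at 8 kHz. The strongest peak falls at a lag of 40 samples, so the function returns 200.0.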
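Item 2 converts the extracted parameters to ARFF so that WEKA can read them. The minimal writer below assumes numeric features plus one nominal class attribute holding the language label.

A few details are placeholders or limitations of this sketch:
- The attribute names feat_0, feat_1, ... and the labels lang_a and lang_b are hypothetical.
- The relation name and labels must not contain spaces or commas, because the sketch does not quote them.

```python
from typing import List, Sequence


def to_arff(relation: str, feature_rows: Sequence[Sequence[float]],
            labels: Sequence[str], class_values: Sequence[str]) -> str:
    """Render feature vectors as an ARFF document: numeric attributes + nominal class."""
    if len(feature_rows) != len(labels):
        raise ValueError("one label per feature row is required")
    n_features = len(feature_rows[0]) if feature_rows else 0
    lines: List[str] = [f"@RELATION {relation}", ""]
    # One NUMERIC attribute per feature dimension (hypothetical names)
    for j in range(n_features):
        lines.append(f"@ATTRIBUTE feat_{j} NUMERIC")
    # Nominal class attribute listing every language label
    lines.append("@ATTRIBUTE class {" + ",".join(class_values) + "}")
    lines.append("")
    lines.append("@DATA")
    for row, label in zip(feature_rows, labels):
        if len(row) != n_features:
            raise ValueError("all feature rows must have the same length")
        if label not in class_values:
            raise ValueError(f"unknown class label: {label}")
        lines.append(",".join(repr(float(x)) for x in row) + "," + label)
    return "\n".join(lines) + "\n"
```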
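Item 3 compares SVM kernel functions in LibSVM. For reference, the four kernels LibSVM offers (svm-train -t 0 to 3) are written out below in plain Python. In LibSVM, gamma defaults to 1/num_features, coef0 to 0 and degree to 3.

Both C-SVC (-s 0) and nu-SVC (-s 1) use these kernels. nu-SVC replaces the penalty C with a parameter nu in (0, 1]. This parameter bounds the fraction of margin errors from above and the fraction of support vectors from below. The code only evaluates kernel values; it does not train a model.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def linear(u: Sequence[float], v: Sequence[float]) -> float:
    # -t 0: u'v
    return dot(u, v)


def polynomial(u: Sequence[float], v: Sequence[float],
               gamma: float, coef0: float, degree: int) -> float:
    # -t 1: (gamma*u'v + coef0)^degree
    return (gamma * dot(u, v) + coef0) ** degree


def rbf(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    # -t 2: exp(-gamma*|u-v|^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid(u: Sequence[float], v: Sequence[float],
            gamma: float, coef0: float) -> float:
    # -t 3: tanh(gamma*u'v + coef0)
    return math.tanh(gamma * dot(u, v) + coef0)
```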
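Items 4 and 5 report identification rates, both overall and broken down by speaker gender. The hypothetical helper below computes these rates from (true language, predicted language, group) triples. It only scores predictions. It does not reproduce the thesis's training of separate gender-specific and mixed models.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def identification_rates(
    records: Iterable[Tuple[str, str, str]],
) -> Tuple[float, Dict[str, float]]:
    """Return (overall rate, rate per group) from (true, predicted, group) triples."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for true_lang, predicted_lang, group in records:
        total[group] += 1
        if predicted_lang == true_lang:
            correct[group] += 1
    n = sum(total.values())
    overall = sum(correct.values()) / n if n else 0.0
    # Fraction of correctly identified utterances within each group
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group
```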
Keywords/Search Tags: Language identification, SDC acoustic characteristic parameters, Multiple classifiers, Recognition rate