
Comparison And Improvement Of Classifiers Performance For Unbalanced Data

Posted on: 2018-09-30 | Degree: Master | Type: Thesis
Country: China | Candidate: W L Yu
GTID: 2348330512477221 | Subject: Computer Science and Technology
Abstract/Summary:
Class-imbalanced data exists widely in the real world. In some fields, correctly classifying minority-class samples is more important than classifying majority-class samples. However, most classical classification algorithms assume that the prior class distribution is balanced or that all misclassifications have equal cost. When such algorithms are applied to unbalanced data, the information carried by minority-class samples is often masked by that of majority-class samples, so the error rate on the minority class is much higher than on the majority class. The classification of unbalanced data has therefore attracted growing attention from researchers. Because the numbers of samples in the different classes of an unbalanced data set are severely skewed, traditional classification algorithms cannot handle such data sets directly and give poor classification accuracy. This thesis therefore uses mixed sampling to change the class distribution at the data level and proposes an improved ensemble algorithm based on a hybrid genetic algorithm, which improves overall classification performance and also raises the classification accuracy of the minority class. The main research work and achievements are as follows:
(1) Selection of the base classifier. On the WEKA platform, the classification performance and stability of the C4.5 decision tree, BP neural network, Naive Bayes and SVM are compared and analyzed on balanced and unbalanced data sets.
(2) The impact of selective ensembles on balanced and unbalanced data sets. Using WEKA, the classification accuracy of single classifiers and ensemble classifiers is compared on all data sets to find the combinations of base classifiers that leave the most room for improvement through ensemble learning. The large difference in classification performance between selective and non-selective ensembles on unbalanced data sets is used to verify the feasibility of selective ensembles. The difference in ensemble classification performance between balanced and unbalanced data sets further shows that unbalanced data sets need to be rebalanced at the data level.
(3) A new comprehensive ensemble method for unbalanced data classification. To address the distribution characteristics of unbalanced data, relatively balanced training sets are constructed by combining SMOTE oversampling with Bootstrap undersampling. C4.5 decision-tree base classifiers are then selected by the hybrid genetic algorithm for ensemble learning, improving the classification of the minority class in the unbalanced data set.
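The claim that minority-class information is masked by the majority class can be made concrete with a small worked example. The sketch below is illustrative only: the 95/5 class split and both prediction vectors are hypothetical, label 1 stands for the minority class, and G-mean (the geometric mean of the two per-class recalls) is a common imbalance-aware metric chosen here as an assumption, not necessarily the measure used in this thesis.

```python
import math


def imbalance_metrics(y_true, y_pred, minority=1):
    """Accuracy plus per-class recall and G-mean for a binary task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p == minority)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p != minority)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != minority and p != minority)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != minority and p == minority)
    accuracy = (tp + tn) / len(y_true)
    minority_recall = tp / (tp + fn) if tp + fn else 0.0
    majority_recall = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": accuracy,
        "minority_recall": minority_recall,
        "majority_recall": majority_recall,
        # G-mean drops to 0 whenever either class is never recognised
        "g_mean": math.sqrt(minority_recall * majority_recall),
    }


if __name__ == "__main__":
    y_true = [0] * 95 + [1] * 5       # hypothetical 95/5 class split
    always_majority = [0] * 100       # a classifier that ignores the minority class
    print(imbalance_metrics(y_true, always_majority))
    # {'accuracy': 0.95, 'minority_recall': 0.0, 'majority_recall': 1.0, 'g_mean': 0.0}

    # catches 4 of the 5 minority samples at the cost of 10 false alarms
    catches_minority = [1] * 10 + [0] * 85 + [1] * 4 + [0]
    print(imbalance_metrics(y_true, catches_minority))
```

The first classifier has the higher accuracy (0.95 against 0.89) yet never detects a minority sample, which is why accuracy alone is a poor guide on unbalanced data.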
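The mixed-sampling step (SMOTE up, Bootstrap down) can be sketched as follows, assuming numeric feature vectors, label 1 for the minority class, and a target size per class halfway between the two original class sizes; the thesis abstract does not state its ratio, so that default is an assumption. The SMOTE part picks a random minority sample, finds its k nearest minority neighbours by Euclidean distance, and places a synthetic point at a random position on the segment between them (classic SMOTE instead generates a fixed number of points per minority sample). The majority class is reduced by drawing a bootstrap sample, i.e. sampling with replacement. All function names are hypothetical.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, rng=random):
    """SMOTE-style oversampling: interpolate between a minority sample
    and one of its k nearest minority neighbours."""
    if n_synthetic <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [s for j, s in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda s: euclidean(base, s))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def bootstrap_undersample(majority, n_keep, rng=random):
    """Reduce the majority class by drawing n_keep samples with replacement."""
    return [rng.choice(majority) for _ in range(n_keep)]


def mixed_sample(majority, minority, target_per_class=None, k=5, seed=0):
    """Build a roughly balanced training set: SMOTE up, bootstrap down."""
    rng = random.Random(seed)
    if target_per_class is None:
        # assumption: meet halfway between the two class sizes
        target_per_class = (len(majority) + len(minority)) // 2
    n_new = max(0, target_per_class - len(minority))
    new_minority = [list(s) for s in minority] + smote(minority, n_new, k, rng)
    new_majority = bootstrap_undersample(majority, target_per_class, rng)
    X = new_majority + new_minority
    y = [0] * len(new_majority) + [1] * len(new_minority)
    return X, y


if __name__ == "__main__":
    gen = random.Random(1)
    majority = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(90)]
    minority = [[gen.gauss(3, 0.5), gen.gauss(3, 0.5)] for _ in range(10)]
    X, y = mixed_sample(majority, minority)
    print(len(X), y.count(0), y.count(1))  # 100 50 50
```

Calling `mixed_sample` with different seeds yields different bootstrap draws and synthetic points, which is one way to obtain the several relatively balanced training sets an ensemble of base classifiers needs.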
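Selecting base classifiers with a hybrid genetic algorithm can be sketched as a search over binary masks, where bit i decides whether base classifier i joins the ensemble. This is a sketch under stated assumptions, not the thesis's own algorithm: the base classifiers (C4.5 trees in the thesis) are assumed to be already trained, and only their 0/1 predictions on a held-out validation set are used; fitness is the G-mean of the majority vote, with ties broken toward the minority class; and the "hybrid" part is modelled as a bit-flip hill climb applied to the best individual of each generation, one common way of combining a genetic algorithm with local search. Function names, GA parameters and the example predictions are hypothetical.

```python
import math
import random


def g_mean(y_true, y_pred):
    """Geometric mean of minority (label 1) and majority (label 0) recall."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return math.sqrt((tp / pos) * (tn / neg))


def vote(preds, mask):
    """Majority vote of the selected classifiers; ties go to the minority class."""
    chosen = [p for p, m in zip(preds, mask) if m]
    if not chosen:
        return None
    n = len(chosen)
    return [1 if 2 * sum(col) >= n else 0 for col in zip(*chosen)]


def fitness(mask, preds, y_val):
    y_hat = vote(preds, mask)
    return 0.0 if y_hat is None else g_mean(y_val, y_hat)


def local_search(mask, preds, y_val):
    """Hybrid step: accept single-bit flips while they strictly improve fitness."""
    best, best_fit = list(mask), fitness(mask, preds, y_val)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            cand = best[:]
            cand[i] = 1 - cand[i]
            f = fitness(cand, preds, y_val)
            if f > best_fit:
                best, best_fit, improved = cand, f, True
    return best, best_fit


def tournament(pop, fits, rng):
    a, b = rng.randrange(len(pop)), rng.randrange(len(pop))
    return pop[a] if fits[a] >= fits[b] else pop[b]


def hybrid_ga_select(preds, y_val, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Return (mask, fitness) for the best classifier subset found."""
    rng = random.Random(seed)
    n = len(preds)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        fits = [fitness(m, preds, y_val) for m in pop]
        top = max(range(pop_size), key=lambda i: fits[i])
        refined, refined_fit = local_search(pop[top], preds, y_val)
        if refined_fit > best_fit:
            best, best_fit = refined, refined_fit
        next_pop = [best[:]]  # elitism: carry the best mask forward
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop, fits, rng), tournament(pop, fits, rng)
            # uniform crossover, then bit-flip mutation
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return best, best_fit


if __name__ == "__main__":
    # hypothetical validation labels and 0/1 predictions of five base classifiers
    y_val = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
    preds = [
        [0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
        [0, 1, 0, 0, 0, 0, 0, 1, 0, 1],
        [0, 0, 1, 0, 0, 0, 0, 0, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    ]
    mask, fit = hybrid_ga_select(preds, y_val)
    print("selected:", [i for i, m in enumerate(mask) if m], "G-mean:", round(fit, 3))
```

Because the hill climb only accepts strictly better masks, the returned mask is a local optimum under single-bit flips; it is not guaranteed to be the globally best subset.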
Keywords/Search Tags: Unbalanced Data Sets, Base Classifiers, Mixed Sampling, Selective Ensemble, Hybrid Genetic Algorithm