
Minority Accuracy Improvement Using Cost-sensitive Localized Generalization Error Model

Posted on: 2020-07-12  Degree: Master  Type: Thesis
Country: China  Candidate: Z X Liu  Full Text: PDF
GTID: 2428330590460634  Subject: Computer Science and Technology
Abstract/Summary:
Dataset imbalance can be categorized into between-class imbalance and within-class imbalance. Between-class imbalance means that the numbers of samples in different classes differ greatly. In within-class imbalance, the samples of a class form several small clusters whose sizes differ greatly. Most traditional machine learning methods treat samples from different classes equally and assume that the classes contain similar numbers of samples. Hence, the essential problem of imbalanced pattern classification is that the imbalance of a dataset often significantly degrades the performance of traditional machine learning methods. In imbalanced pattern classification problems, especially those with few samples, between-class imbalance is often accompanied by within-class imbalance. In this case, it is difficult for traditional machine learning methods to generalize well to unseen samples; in two-class classification problems, such methods tend to classify minority samples into the majority class. This thesis proposes a new neural network training method that minimizes a cost-sensitive localized generalization error-based objective function (c-LGEM) to achieve better generalization capability of the classifier while making use of the high efficiency of cost-sensitive methods. The c-LGEM emphasizes minimizing the generalization error of the minority class in a cost-sensitive manner while minimizing only the training error of the majority samples, so as to improve the classification accuracy of minority samples. Moreover, a k-NN model is added to the c-LGEM to locate the samples near the decision boundary and to determine the number of generated unseen samples from the number of majority samples among the k nearest neighbors. Experiments on 10 UCI datasets with 5 methods (including the proposed method) show that c-LGEM yields good G-mean and AUC values. The experiments also report the true positive rate (TPR) to demonstrate the advantage of c-LGEM in improving minority accuracy, and a further discussion of c-LGEM is given.
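The following is a minimal illustrative sketch of the kind of objective the abstract describes, not the thesis' exact formulation. The squared-error loss, the perturbation radius q, the cost weight cost_minority, and the rule "generate as many perturbed samples around a minority point as it has majority neighbors among its k nearest neighbors" are all assumptions introduced here for illustration.

```python
# Hypothetical sketch of a cost-sensitive, localized-generalization-error style
# objective for a binary classifier (minority label = 1).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def c_lgem_objective(predict, X, y, minority_label=1,
                     cost_minority=5.0, q=0.1, k=5, rng=None):
    """predict: callable mapping an (n, d) array to scores in [0, 1]."""
    rng = np.random.default_rng(rng)
    y = np.asarray(y)
    minority = (y == minority_label)

    # Training (empirical) error, computed for every sample.
    scores = predict(X)
    train_err = (scores - y) ** 2
    majority_term = train_err[~minority].mean()

    # k-NN over the training set: locate minority samples near the decision
    # boundary and decide how many perturbed samples to generate around each.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X[minority])
    n_majority_neighbors = (~minority[idx[:, 1:]]).sum(axis=1)  # skip self

    # Local generalization term for the minority class: perturb each minority
    # sample inside a q-neighborhood and measure the change in the output.
    sensitivity = np.zeros(minority.sum())
    for i, (x, m) in enumerate(zip(X[minority], n_majority_neighbors)):
        if m == 0:                      # far from the boundary: skip
            continue
        perturbed = x + rng.uniform(-q, q, size=(m, X.shape[1]))
        sensitivity[i] = ((predict(perturbed) - predict(x[None, :])) ** 2).mean()

    # Majority class: training error only; minority class: training error plus
    # the local sensitivity term, weighted by the misclassification cost.
    minority_term = (train_err[minority] + sensitivity).mean()
    return majority_term + cost_minority * minority_term
```

In use, such an objective would be evaluated with the MLPNN's forward pass as `predict` and minimized with respect to the network weights during training; the design choices above (loss, radius, cost weight, neighborhood rule) are placeholders for whatever the thesis actually specifies.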
Keywords/Search Tags: cost-sensitive, multilayer perceptron neural network (MLPNN), localized generalization error model (L-GEM)