
D-MetaCost: An Efficient Multi-class Cost-sensitive Algorithm

Posted on: 2018-02-28    Degree: Master    Type: Thesis
Country: China    Candidate: S J Deng    Full Text: PDF
GTID: 2428330512494295    Subject: Computer Science and Technology
Abstract/Summary:
Classification has long been an important problem in machine learning and data mining. Traditional classification algorithms are designed to maximize accuracy, that is, to minimize classification error, under the assumption that all classes carry equal misclassification cost. When misclassification costs are unequal, however, cost-sensitive classification becomes essential. Many cost-sensitive learning algorithms exist, such as C4.5cs, which is based on the C4.5 decision tree; the MetaCost approach, which converts a general classification model into a cost-sensitive one; ensemble methods that achieve cost sensitivity by adjusting the initial distribution of the training samples; and decision methods based on minimum expected cost.

MetaCost, proposed by Domingos in 1999, is a typical cost-sensitive algorithm that converts a conventional classifier into a cost-sensitive one. It rests on a "meta-learning" procedure. First, the original training set is randomly resampled and a number of member classifiers are trained, whose outputs are combined to estimate class probabilities for every instance. Each training instance is then relabeled with the class that minimizes its expected classification cost, yielding a new training set. Finally, the cost-sensitive classification model is trained on this new training set. However, if the original training set is imbalanced, random resampling produces imbalanced training subsets and the member classifiers may perform poorly. In addition, the final MetaCost classifier is a single model trained only on the new training set, so its predictions may not be optimal. This paper improves the algorithm with respect to these two shortcomings.

This paper proposes a new cost-sensitive classification algorithm, D-MetaCost, which optimizes MetaCost. In the sampling phase, to handle imbalanced data sets, the training set is partitioned before resampling so that the training subsets are balanced. Specifically, the majority-class samples are divided into several disjoint subsets, and each subset is combined with the minority-class (rare) samples to form a training set. This sampling scheme makes the base classifiers more representative. For the final classification model, ensemble learning is used to combine multiple classifiers, which can significantly improve classification performance. Specifically, the classification accuracy of each member classifier is computed, and the more accurate members are integrated with the newly trained classifier to form the final ensemble model. As a result, both the accuracy and the misclassification cost of the final classifier improve markedly.

The paper treats both theory and experiments. Across 1000 experimental runs, D-MetaCost achieves better classification accuracy and lower classification cost than MetaCost and AdaBoost in most cases, and its predictive performance is clearly improved.
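To make the procedure above concrete, the following Python sketch outlines the D-MetaCost pipeline: balanced partitioning of the majority class, MetaCost-style relabeling by minimum expected cost, and retention of the more accurate member classifiers in the final ensemble. It is a minimal illustration under stated assumptions, not the thesis's implementation; the decision-tree base learner, the 0.7 accuracy threshold, and all function names are illustrative choices.

import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

def balanced_partitions(X, y, majority_label, seed=0):
    """Split the majority class into disjoint chunks and pair each
    chunk with all minority samples, giving balanced subsets."""
    maj = np.where(y == majority_label)[0]
    mino = np.where(y != majority_label)[0]
    rng = np.random.default_rng(seed)
    rng.shuffle(maj)
    n_parts = max(1, len(maj) // max(1, len(mino)))
    for chunk in np.array_split(maj, n_parts):
        idx = np.concatenate([chunk, mino])
        yield X[idx], y[idx]

def d_metacost(X, y, cost, majority_label, base=DecisionTreeClassifier()):
    """cost[i, j] = cost of predicting class j when the true class is i."""
    # 1. Train one member classifier per balanced partition
    #    (D-MetaCost's replacement for MetaCost's random resampling).
    members = [clone(base).fit(Xs, ys)
               for Xs, ys in balanced_partitions(X, y, majority_label)]

    # 2. Average the members' class-probability estimates (simplified:
    #    over all members) and relabel each instance with the class of
    #    minimum expected cost -- the core MetaCost step.
    classes = np.unique(y)
    proba = np.zeros((len(X), len(classes)))
    for m in members:
        p = m.predict_proba(X)
        for k, c in enumerate(m.classes_):
            proba[:, np.searchsorted(classes, c)] += p[:, k]
    proba /= len(members)
    expected_cost = proba @ cost          # [n, j] = sum_i P(i|x) * cost[i, j]
    y_new = classes[np.argmin(expected_cost, axis=1)]

    # 3. Train a new classifier on the relabeled data, then keep only
    #    the members whose accuracy clears a threshold (0.7 here is an
    #    assumption) to vote alongside it -- the D-MetaCost ensemble step.
    final = clone(base).fit(X, y_new)
    good = [m for m in members if accuracy_score(y, m.predict(X)) >= 0.7]
    return [final] + good

def predict(ensemble, X):
    """Majority vote over the retained classifiers
    (assumes non-negative integer class labels)."""
    votes = np.stack([m.predict(X) for m in ensemble]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

Under this sketch, the balanced partitions address the first shortcoming (imbalanced resampling) and the filtered vote addresses the second (a single final model), mirroring the two improvements the abstract describes.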
Keywords/Search Tags: Cost-sensitive, Resampling, Ensemble Learning, D-MetaCost