
Research On Cost-Sensitive Machine Learning Based On Dynamic Cost

Posted on: 2011-08-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X L Chen
Full Text: PDF
GTID: 1118330362955307
Subject: Computer application technology
Abstract/Summary:
Traditional machine learning methods aim for the highest accuracy and the lowest error rate, implicitly assuming that all misclassifications carry equal cost. In many real applications, especially disease diagnosis, this assumption does not hold. Cost-sensitive learning takes asymmetric misclassification costs into account, aims to minimize the total classification cost, and focuses on the accuracy of the classes of interest.

The misclassification cost used in popular cost-sensitive learning methods is a stationary (fixed) cost. Such a cost tends to be tied to a particular dataset or application domain, which degrades classifier performance on skewed datasets and generalizes poorly to other domains. To overcome these limitations, a dynamic cost scheme is proposed: it incorporates expert experience and knowledge, automatically searches for a suitable misclassification cost for each sub-dataset, and builds a cost-sensitive classifier from the result.

To obtain the best compromise between the minority and majority classes, an optimal misclassification cost function is defined. Using the geometric-average formula, a set of metric indices (the geometric averages of precision, recall, Kappa, and F-measure) are redefined, aiming to increase minority-class accuracy while sacrificing as little of the classifier's overall performance as possible.
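The geometric-average idea can be sketched as follows. This is a minimal illustration, not the dissertation's exact definitions: the function name and the specific pairings (precision with recall, and recall with specificity, the classic G-mean) are assumptions for the example.

```python
from math import sqrt

def geometric_mean_metrics(tp, fp, fn, tn):
    """Combine per-class rates with a geometric average so that a collapse
    on the minority (positive) class drags the score toward zero."""
    recall = tp / (tp + fn) if tp + fn else 0.0        # minority-class recall
    precision = tp / (tp + fp) if tp + fp else 0.0     # minority-class precision
    specificity = tn / (tn + fp) if tn + fp else 0.0   # majority-class recall
    g_pr = sqrt(precision * recall)      # geometric average of precision and recall
    g_rec = sqrt(recall * specificity)   # classic G-mean over both classes
    return g_pr, g_rec
```

Unlike an arithmetic mean, a geometric average cannot be kept high by the majority class alone: if minority recall is near zero, both scores are near zero, which is exactly the property the redefined indices rely on.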
Through these indices, we can evaluate whether a classifier achieves the optimal compromise between recall and precision.

Based on the proposed dynamic misclassification cost mechanism, three independent classifiers are proposed: (1) an adaptive dynamic cost optimization decision tree (ADODT), which uses a gradient-ascent algorithm to search for the most suitable misclassification cost for each sub-dataset and builds a cost-sensitive classifier on a decision tree; (2) a cost-sensitive classifier based on a standard genetic algorithm (CSC-SGA), which differs from ADODT only in using a GA as its search method, with the optimal misclassification cost function as the fitness function; and (3) an adaptive dynamic cost-sensitive support vector machine (ADC-SVM), which uses a GA to search for the optimal misclassification cost and an SVM as the base learner. The resulting classifiers perform well on skewed datasets.

Combining bagging and a GA, a new cost-sensitive ensemble classification algorithm is proposed, named the adaptive dynamic-cost optimization ensemble (ADOE). ADOE uses bagging to form sub-datasets and a GA to find the optimal misclassification cost. It trains cost-sensitive component classifiers that relabel every instance in the original dataset by voting, then builds a cost-insensitive classifier on the relabeled dataset. ADOE proves particularly well suited to extremely imbalanced datasets and is highly stable.

Extensive experiments show that the proposed algorithms outperform comparable classification algorithms. In particular, ADOE is efficient, powerful, and stable on datasets with extremely skewed class distributions.
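The core dynamic-cost loop can be illustrated with a deliberately tiny stand-in: a 1-D cost-sensitive decision stump whose false-negative cost is chosen from a candidate grid by maximizing the G-mean. The grid search here is only a sketch of the role played by the dissertation's gradient-ascent and GA searches; all function names, the stump model, and the candidate grid are assumptions for the example.

```python
from math import sqrt

def train_stump(xs, ys, fn_cost):
    """Fit a 1-D stump (predict 1 if x >= t) minimizing total cost,
    with cost(false negative) = fn_cost and cost(false positive) = 1."""
    best_t, best_cost = None, float("inf")
    for t in sorted(set(xs)) + [max(xs) + 1]:
        cost = sum(fn_cost if (y == 1 and x < t) else
                   1 if (y == 0 and x >= t) else 0
                   for x, y in zip(xs, ys))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

def g_mean(xs, ys, t):
    """G-mean of minority recall and majority specificity at threshold t."""
    tp = sum(1 for x, y in zip(xs, ys) if y == 1 and x >= t)
    fn = sum(1 for x, y in zip(xs, ys) if y == 1 and x < t)
    tn = sum(1 for x, y in zip(xs, ys) if y == 0 and x < t)
    fp = sum(1 for x, y in zip(xs, ys) if y == 0 and x >= t)
    rec = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sqrt(rec * spec)

def search_cost(xs, ys, candidates=(1, 2, 5, 10, 20)):
    """Stand-in for the gradient-ascent / GA search: pick the FN cost
    whose trained stump scores highest on the G-mean criterion."""
    return max(candidates, key=lambda c: g_mean(xs, ys, train_stump(xs, ys, c)))
```

On an imbalanced sample, raising the false-negative cost pushes the learned threshold toward recovering the minority class, which is the effect the dynamic-cost mechanism automates per sub-dataset.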
Keywords/Search Tags: Cost-sensitive machine learning, Dynamic misclassification cost, Genetic algorithm, Support Vector Machine, Ensemble learning