
The Cost-sensitive Support Vector Machine Supervised Learning

Posted on: 2008-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: G Li
GTID: 2208360215953953
Subject: Education Technology
Abstract:
In recent years, as machine learning has been applied to real-world problems, the cost of misclassification has emerged as an important issue in supervised learning. To handle such problems better, cost-sensitive learning has become a major research topic in the international machine learning community. The Support Vector Machine (SVM) is a powerful learning algorithm grounded in statistical learning theory. However, like decision trees, artificial neural networks (ANNs), and other traditional algorithms, the standard SVM is not cost-sensitive. Designing cost-sensitive SVMs and improving their performance has therefore become increasingly important.

This thesis studies how to implement cost-sensitive SVMs and how to improve their performance. The main contributions are as follows:

1. Statistical learning theory is reviewed to explain why the SVM is more effective than other algorithms. A training algorithm for the standard SVM, the SMO (Sequential Minimal Optimization) algorithm, is then implemented.

2. Three cost-sensitive SVM algorithms, based respectively on random over-sampling, SMOTE, and under-sampling, are designed by reshaping the distribution of the classes in the input space. Two ways of combining these techniques, a hard ensemble and a soft ensemble, are also designed; the soft ensemble combines its members differently from the ensemble methods used for ANNs. Experimental results on the data sets studied suggest that the under-sampling-based cost-sensitive SVM performs well overall, but that on severely imbalanced data sets it is ineffective, whereas the soft ensemble is more effective. Several further, more detailed findings are also reported. These results provide a basis for applying this kind of cost-sensitive SVM.

3. CSSVM, a cost-sensitive learning algorithm proposed by Lin et al., is studied from both theoretical and empirical perspectives. Experimental results show that although CSSVM can reduce the total misclassification cost, its learning performance depends on the values of its model parameters.
To select the best parameters automatically, this thesis designs and implements a parameter selection method based on a genetic algorithm (GA), combining the GA with the cost-sensitive SVM.
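Cost-sensitive learning judges a classifier by its total misclassification cost rather than by its error rate. A minimal sketch of that criterion, assuming binary labels in {-1, +1} with +1 as the costly (typically minority) class, and with `cost_fn` and `cost_fp` as hypothetical per-error costs (the abstract does not state the cost settings used in the experiments):

```python
from typing import Sequence


def total_misclassification_cost(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    cost_fn: float,
    cost_fp: float,
) -> float:
    """Sum of per-example costs; correct predictions cost nothing.

    Labels are assumed to be +1 (positive/minority) and -1 (negative).
    cost_fn: hypothetical cost of predicting -1 for a true +1 (false negative).
    cost_fp: hypothetical cost of predicting +1 for a true -1 (false positive).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == -1:
            total += cost_fn
        elif t == -1 and p == 1:
            total += cost_fp
    return total


# One false negative (cost 5) and one false positive (cost 1).
print(total_misclassification_cost([1, 1, -1, -1], [-1, 1, 1, -1], cost_fn=5.0, cost_fp=1.0))  # prints 6.0
```

A classifier that always predicts -1 can have a low error rate on imbalanced data while still paying `cost_fn` for every positive it misses, which is why the error rate alone is a poor objective in the cost-sensitive setting.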
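Item 1 names SMO as the training algorithm for the standard SVM. The following is a minimal sketch of the simplified SMO variant (the second multiplier is chosen at random rather than by Platt's full selection heuristics) for a linear kernel, extended under one stated assumption: each example's Lagrange multiplier gets a class-dependent upper bound, `C_pos` for positives and `C_neg` for negatives. Class-dependent penalties are a common way to make a soft-margin SVM cost-sensitive; `smo_train`, `C_pos`, and `C_neg` are hypothetical names, and this is not presented as the thesis's implementation or as the exact CSSVM formulation of Lin et al.

```python
import numpy as np


def smo_train(X, y, C_pos=1.0, C_neg=1.0, tol=1e-3, max_passes=10, max_iter=10_000, seed=0):
    """Simplified SMO for a linear SVM with class-dependent box constraints.

    X: (n, d) array; y: (n,) array of +1/-1 labels (n >= 2).
    Each alpha_i is constrained to 0 <= alpha_i <= C_i, where C_i = C_pos for
    positives and C_neg for negatives (assumed cost-weighted soft margin).
    Returns (w, b) for the decision function f(x) = w @ x + b.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    C = np.where(y > 0, C_pos, C_neg)  # per-example upper bounds
    K = X @ X.T  # linear kernel matrix
    alpha = np.zeros(n)
    b = 0.0
    rng = np.random.default_rng(seed)

    def error(k):
        # E_k = f(x_k) - y_k, with f written in dual form
        return (alpha * y) @ K[:, k] + b - y[k]

    passes, it = 0, 0
    while passes < max_passes and it < max_iter:
        it += 1
        changed = 0
        for i in range(n):
            E_i = error(i)
            # Only optimise alpha_i if it violates the KKT conditions by more than tol
            if (y[i] * E_i < -tol and alpha[i] < C[i]) or (y[i] * E_i > tol and alpha[i] > 0):
                j = int(rng.integers(n - 1))
                if j >= i:
                    j += 1  # random j != i
                E_j = error(j)
                a_i_old, a_j_old = alpha[i], alpha[j]
                # Feasible segment for alpha_j that keeps sum(y * alpha) fixed
                if y[i] != y[j]:
                    L = max(0.0, a_j_old - a_i_old)
                    H = min(C[j], C[i] + a_j_old - a_i_old)
                else:
                    L = max(0.0, a_i_old + a_j_old - C[i])
                    H = min(C[j], a_i_old + a_j_old)
                if L >= H:
                    continue
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if eta >= 0:
                    continue
                # Unconstrained optimum for alpha_j along the segment, clipped to [L, H]
                new_aj = np.clip(a_j_old - y[j] * (E_i - E_j) / eta, L, H)
                if abs(new_aj - a_j_old) < 1e-8:
                    continue
                alpha[j] = new_aj
                alpha[i] = a_i_old + y[i] * y[j] * (a_j_old - alpha[j])
                # Recompute the threshold b from whichever multiplier is unbounded
                b1 = b - E_i - y[i] * (alpha[i] - a_i_old) * K[i, i] - y[j] * (alpha[j] - a_j_old) * K[i, j]
                b2 = b - E_j - y[i] * (alpha[i] - a_i_old) * K[i, j] - y[j] * (alpha[j] - a_j_old) * K[j, j]
                if 0 < alpha[i] < C[i]:
                    b = b1
                elif 0 < alpha[j] < C[j]:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    w = (alpha * y) @ X  # primal weights (valid only for the linear kernel)
    return w, b


# Small separable example; positive errors are made ten times as expensive.
X = np.array([[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)
w, b = smo_train(X, y, C_pos=10.0, C_neg=1.0)
print(w, b, np.sign(X @ w + b))
```

Making `C_pos` larger than `C_neg` penalizes slack on positive examples more heavily, which is the intended cost-sensitive effect; with a nonlinear kernel, `K` would be built from kernel evaluations and predictions would use the dual form instead of `w`.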
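The three sampling strategies in item 2 rebalance the training set before an ordinary SVM is trained on it. A minimal NumPy sketch, assuming binary labels with the minority class labeled 1, a majority class that is strictly larger, and resampling until both classes have the same size; the function names are hypothetical, and the SMOTE routine implements only the basic scheme (synthetic points placed on segments between a minority example and one of its k nearest minority neighbours), not any particular variant the thesis may use.

```python
import numpy as np


def random_oversample(X, y, rng, minority=1):
    """Duplicate randomly chosen minority examples until both classes are equal in size."""
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    extra = rng.choice(min_idx, size=len(maj_idx) - len(min_idx), replace=True)
    keep = np.concatenate([maj_idx, min_idx, extra])
    return X[keep], y[keep]


def random_undersample(X, y, rng, minority=1):
    """Keep every minority example and an equally sized random subset of the majority class."""
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    kept_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
    keep = np.concatenate([kept_maj, min_idx])
    return X[keep], y[keep]


def smote(X, y, rng, minority=1, k=5):
    """Append synthetic minority examples interpolated towards random minority neighbours."""
    X_min = X[y == minority]
    n_new = int(np.sum(y != minority)) - len(X_min)  # assumes the majority class is larger
    k = min(k, len(X_min) - 1)
    # Pairwise distances among minority examples; a point is not its own neighbour
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(len(X_min), size=n_new)
    nb = neighbours[base, rng.integers(k, size=n_new)]
    gap = rng.random((n_new, 1))  # uniform position along each segment
    synthetic = X_min[base] + gap * (X_min[nb] - X_min[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_new, minority, dtype=y.dtype)])
    return X_out, y_out


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.array([1] * 4 + [-1] * 16)  # 4 minority vs 16 majority examples
for name, fn in [("over", random_oversample), ("under", random_undersample), ("smote", smote)]:
    Xr, yr = fn(X, y, rng)
    print(name, Xr.shape, int((yr == 1).sum()), int((yr == -1).sum()))
```

Under-sampling discards majority examples, so on a severely imbalanced set it leaves very little training data, which is consistent with the abstract's observation that it becomes ineffective there; over-sampling and SMOTE keep all data but enlarge the training set.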
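Item 2 does not spell out the combination rules, so the sketch below adopts one generic reading as an assumption: the hard ensemble takes a majority vote of the labels predicted by the member SVMs (for example, SVMs trained on the over-sampled, SMOTE, and under-sampled data), and the soft ensemble sums their real-valued decision values, optionally weighted, before thresholding. The thesis's own soft-ensemble rule, which the abstract says differs from the usual ANN approach, may well be different; `hard_ensemble`, `soft_ensemble`, and `weights` are hypothetical names.

```python
import numpy as np


def hard_ensemble(decision_values):
    """Majority vote over the members' predicted labels (+1/-1).

    decision_values: (m, n) array, one row of SVM decision values per member.
    A decision value of exactly 0 counts as an abstention; ties go to +1 here.
    """
    votes = np.sign(np.asarray(decision_values, dtype=float)).sum(axis=0)
    return np.where(votes >= 0, 1, -1)


def soft_ensemble(decision_values, weights=None):
    """Combine real-valued decision values before thresholding.

    weights: optional per-member weights (hypothetical, e.g. tuned on validation data).
    """
    decision_values = np.asarray(decision_values, dtype=float)
    if weights is None:
        weights = np.ones(decision_values.shape[0])
    combined = np.asarray(weights, dtype=float) @ decision_values
    return np.where(combined >= 0, 1, -1)


# Three members, two test examples; the third member is far more confident.
dv = np.array([[0.2, -0.5], [0.1, -0.4], [-3.0, 2.0]])
print(hard_ensemble(dv).tolist())  # prints [1, -1]
print(soft_ensemble(dv).tolist())  # prints [-1, 1]
```

The example shows how the two rules can disagree: one confident member outweighs two weak ones under the soft rule but is outvoted under the hard rule. Because SVMs trained on differently resampled data need not produce decision values on a comparable scale, some normalization or weighting may be needed before summing them.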
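The GA-based parameter selection at the end of item 3 can be sketched as a small real-coded genetic algorithm. Everything specific here is an assumption for illustration: each chromosome holds log10(C) and log10(gamma) (gamma being the width parameter of an RBF kernel, which the abstract does not name), the operators are tournament selection, blend crossover, Gaussian mutation, and elitist replacement, and `toy_cost` is a hypothetical stand-in for what would in practice be a validation estimate of the total misclassification cost of a cost-sensitive SVM trained with those parameters.

```python
import numpy as np


def ga_select(fitness, bounds, pop_size=30, generations=40, mutation_scale=0.05, seed=0):
    """Minimise fitness(params) with a simple real-coded genetic algorithm.

    bounds: list of (low, high) per gene, e.g. ranges for log10(C) and log10(gamma).
    Returns the best parameter vector found and its fitness.
    """
    rng = np.random.default_rng(seed)
    low, high = np.array(bounds, dtype=float).T
    pop = rng.uniform(low, high, size=(pop_size, len(bounds)))
    scores = np.array([fitness(p) for p in pop])
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            # Tournament selection: the better of two random individuals becomes a parent
            i1, i2 = rng.integers(pop_size, size=2)
            p1 = pop[i1] if scores[i1] < scores[i2] else pop[i2]
            i1, i2 = rng.integers(pop_size, size=2)
            p2 = pop[i1] if scores[i1] < scores[i2] else pop[i2]
            # Blend crossover, then Gaussian mutation, clipped to the search box
            mix = rng.random(len(bounds))
            child = mix * p1 + (1 - mix) * p2
            child += rng.normal(0.0, mutation_scale * (high - low))
            children.append(np.clip(child, low, high))
        children = np.array(children)
        child_scores = np.array([fitness(c) for c in children])
        # Elitism: keep the best pop_size individuals among parents and children
        all_pop = np.vstack([pop, children])
        all_scores = np.concatenate([scores, child_scores])
        keep = np.argsort(all_scores)[:pop_size]
        pop, scores = all_pop[keep], all_scores[keep]
    best = int(np.argmin(scores))
    return pop[best], scores[best]


def toy_cost(params):
    # Hypothetical stand-in for a validation misclassification cost, minimal at (1, -2)
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best, cost = ga_select(toy_cost, bounds=[(-3, 5), (-5, 2)])
print("C =", 10 ** best[0], "gamma =", 10 ** best[1], "cost =", cost)
```

Searching over logarithms is a common choice because useful values of C and gamma span several orders of magnitude. The population size, number of generations, and mutation scale would need to be balanced against the expense of each fitness evaluation, since every evaluation means training at least one SVM.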
Keywords: Support Vector Machine, supervised learning, cost-sensitive learning, sampling, parameter selection, genetic algorithm