The W-SVM Model For Imbalanced Datasets

Posted on: 2013-10-04 | Degree: Master | Type: Thesis
Country: China | Candidate: C X Diao | Full Text: PDF
GTID: 2248330377960529 | Subject: Information management and information systems
Abstract/Summary:
The support vector machine (SVM) was proposed as a new machine learning algorithm by Vapnik in 1995 and has attracted close attention ever since. Classification with SVM is one of the most important tasks in data mining. When the sample is balanced, SVM classifies remarkably well. In practice, however, most datasets are imbalanced. Adapting SVM to imbalanced sample learning requires transforming the sample, improving the algorithm, converting the problem, or adopting other methods.

This paper is based on establishing a financial early-warning model for the SME board. Because of the extreme imbalance between the numbers of venture and normal enterprises, SVM performs remarkably poorly in distinguishing the venture enterprises. To resolve this kind of binary classification on imbalanced datasets, this paper focuses on working out a new improved algorithm.

The study consists of three major parts. Firstly, the normal vector of the hyperplane is equated with the objective weights of the attributes. The value ranges of the subjective weights are used to constrain the objective weights, which yields an inequality in the normal vector variable. This inequality is added as one of the constraints in the quadratic programming formulation of SVM, and the standard form of the quadratic program for the dual problem is then derived. The resulting model is called W-SVM. The experimental results demonstrate that W-SVM is more effective than SVM in class-imbalanced learning. Secondly, W-SVM is combined with an existing method called C-SVM to form CW-SVM. The corresponding experiments show that the optimized solution set has more reference value than the single optimized solution found through C-SVM. Thirdly, so that W-SVM can recognize more rare cases and the workload of searching the optimized solution set is reduced, the paper proposes a hypothesis of finding the optimized solution in one particular direction.

The hypothesis is easier to understand than the idea of integrating subjective and objective weights. Analysis of the experimental results shows that this hypothesis, used as the method by which CW-SVM searches for the optimized solution, is reasonable and feasible.
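
The first part of the study can be pictured with a small worked formulation. The abstract does not give the thesis's exact equations, so the following is only a sketch under stated assumptions: it takes the standard soft-margin SVM primal and adds a box constraint on each component of the normal vector w. The bounds \ell_j and u_j are hypothetical symbols standing in for the value range of the subjective weight of attribute j; C, \xi_i, n and d are the usual penalty parameter, slack variables, sample count and attribute count.

```latex
% Sketch only: soft-margin SVM primal with an assumed box constraint on w.
% \ell_j and u_j are hypothetical bounds taken from the subjective weight ranges.
\[
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad & y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n, \\
& \ell_j \le w_j \le u_j,\quad j = 1,\dots,d.
\end{aligned}
\]
```

Under this reading, the extra inequalities are linear in w, so the problem remains a convex quadratic program and a dual in standard quadratic programming form can still be derived, as the abstract states for W-SVM.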
Keywords/Search Tags: Imbalanced dataset, support vector machine, W-SVM, CW-SVM, rare sample, financial early-warning