Research On Directly Optimizing F-measure Algorithm Based On Cost-sensitive SVM

Posted on:2017-02-02

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhou

Full Text:PDF

GTID:2308330485964009

Subject:Software engineering

Abstract/Summary:

PDF Full Text Request

With the development of society and the progress of science, our lives have been gradually changed by the research of data mining and machine learning. Meanwhile, classification technology as an important part of those researches, have attracted many scholars focuses. Different algorithms have been proposed to construct different classifiers. For binary classification, when the sample distribution is imbalanced, people often prefer F-measure as the metric, which is defined as the harmonic mean of the precision and recall, and can evaluate the performance of a classifier more accurately. Due to its wide applications in unbalanced classification, How to design an effective F-measure based learner becomes a hotspot in recent years. Directly optimizing of this measure is often challenging as the resulting optimization problem is non-convex. Therefore, various approximation methods have been proposed, which mainly fall into two paradigms:the cost-sensitive approach and the direct optimization approach. Different from the existing work, in this dissertation, we take SVM as the tool and propose a novel method, which can combine two approaches above, and produce an effective and efficient classifier oriented to F-measure.The main research works in this dissertation can be summarized as followings:(1) First of all, we begin our work with the introduction of SVM for binary classification and the imbalanced metric F-measure. Then we analyze the current researches on the cost-sensitive approach and the direct optimization approach. Based on it, we propose to design a novel F-measure based learner by combining those two approaches together.(2) Secondly, we present an explicit transformation from the optimization of F-measure to cost-sensitive SVM. For the problem that the new objective function is non-smooth, and the traditional smooth optimization techniques cannot be applied. We propose to use the bundle method, which can solve the problem in O(1/ε) time, and does not depend on the number of training examples. Empirical evaluations on the imbalanced data sets demonstrate that the approach we proposed can create more accurate model than the existing F-measure based approaches.(3) Lastly, since the bundle method solves the inner optimization by converting it into the dual form, which can only guarantee the dual objective increase monotonically, and does not guarantee the primal objective decrease monotonically. It may slow down the practical convergence rate. To fill the gap, we present an additional line search algorithm which can alleviate the fluctuations during the iterations, and guarantees the primal function monotonically decrease. Empirical evaluations on the large-scale imbalanced data sets show that when compared with currently existing F-measure based classifiers, the learner we proposed is very efficient, and can greatly reduce the training time, while obtaining better (or comparable) accuracy of the model.

Keywords/Search Tags:

F-measure, Support Vector Machine, Imbalanced Binary Classification, Cost-sensitive, Bundle Method, Line Search

PDF Full Text Request

Related items

1	Support Vector Machine Based Classification Algorithms Research For Imbalanced Data
2	Research On Ensemble Method Of Structured Support Vector Machine For Imbalanced Data
3	Investigation Of Parameters Optimization And Solution Method For Cost-sensitive Support Vector Machine And Its Application
4	Classification Methods For Class-imbalanced Datasets Of Unequal Misclassification Costs And Their Applications
5	Designing Feature Selection And Classincation Methods For Classificationmethods For Imbalanced Learning And Cost-sensitive Learning Problems
6	Cost-sensitive Hierarchical Classification Method For Imbalanced Data
7	Research On Support Vector Machine Classification Method For Imbalanced Datasets
8	Research On Classification Algorithm For Imbalanced Data Sets Based On Support Vector Machines
9	Research On Imbalanced Data Classification Methods Based On Ensemble Learning
10	Research On Fast Algorithms For Cost Sensitive Support Vector Machine