Font Size: a A A

Research On Directly Optimizing F-measure Algorithm Based On Cost-sensitive SVM

Posted on:2017-02-02Degree:MasterType:Thesis
Country:ChinaCandidate:Y ZhouFull Text:PDF
GTID:2308330485964009Subject:Software engineering
Abstract/Summary:PDF Full Text Request
With the development of society and the progress of science, our lives have been gradually changed by the research of data mining and machine learning. Meanwhile, classification technology as an important part of those researches, have attracted many scholars focuses. Different algorithms have been proposed to construct different classifiers. For binary classification, when the sample distribution is imbalanced, people often prefer F-measure as the metric, which is defined as the harmonic mean of the precision and recall, and can evaluate the performance of a classifier more accurately. Due to its wide applications in unbalanced classification, How to design an effective F-measure based learner becomes a hotspot in recent years. Directly optimizing of this measure is often challenging as the resulting optimization problem is non-convex. Therefore, various approximation methods have been proposed, which mainly fall into two paradigms:the cost-sensitive approach and the direct optimization approach. Different from the existing work, in this dissertation, we take SVM as the tool and propose a novel method, which can combine two approaches above, and produce an effective and efficient classifier oriented to F-measure.The main research works in this dissertation can be summarized as followings:(1) First of all, we begin our work with the introduction of SVM for binary classification and the imbalanced metric F-measure. Then we analyze the current researches on the cost-sensitive approach and the direct optimization approach. Based on it, we propose to design a novel F-measure based learner by combining those two approaches together.(2) Secondly, we present an explicit transformation from the optimization of F-measure to cost-sensitive SVM. For the problem that the new objective function is non-smooth, and the traditional smooth optimization techniques cannot be applied. We propose to use the bundle method, which can solve the problem in O(1/ε) time, and does not depend on the number of training examples. Empirical evaluations on the imbalanced data sets demonstrate that the approach we proposed can create more accurate model than the existing F-measure based approaches.(3) Lastly, since the bundle method solves the inner optimization by converting it into the dual form, which can only guarantee the dual objective increase monotonically, and does not guarantee the primal objective decrease monotonically. It may slow down the practical convergence rate. To fill the gap, we present an additional line search algorithm which can alleviate the fluctuations during the iterations, and guarantees the primal function monotonically decrease. Empirical evaluations on the large-scale imbalanced data sets show that when compared with currently existing F-measure based classifiers, the learner we proposed is very efficient, and can greatly reduce the training time, while obtaining better (or comparable) accuracy of the model.
Keywords/Search Tags:F-measure, Support Vector Machine, Imbalanced Binary Classification, Cost-sensitive, Bundle Method, Line Search
PDF Full Text Request
Related items