Research On Direct Optimization Of The Imbalanced Metric Algorithms Based On Structural SVM

Posted on: 2017-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: K Yang
GTID: 2308330485464023
Subject: Computer application technology
Abstract/Summary:
In recent years, the rapid development of computer technology has led scientific and social fields to accumulate large amounts of data, which provide important support for a wide range of applications. Extracting useful information from these data has become an urgent need, and data mining and machine learning have emerged to meet it. Classification, a basic data-mining task, has attracted the attention of many researchers, and binary classification in particular has become a research focus because of its broad applications. In real-world settings, however, many binary classification problems are imbalanced, so existing binary classification algorithms are difficult to apply directly. Many new binary classification algorithms have therefore been proposed for imbalanced data sets in recent years. Their basic ideas fall mainly into two groups: data-oriented methods and algorithm-oriented methods. Because the latter require no data preprocessing, they have become the focus of current research. Building on this work, this thesis adopts the SVM as its tool and focuses on imbalanced binary classification algorithms based on the structural SVM. We first introduce SVM-based binary classification algorithms and the background of imbalanced applications. We then analyze current research on imbalanced algorithms and propose improved SVM algorithms for the imbalanced setting that directly optimize an imbalanced performance measure.
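To illustrate why an imbalanced measure, rather than accuracy, is the natural optimization target, the following minimal Python sketch computes accuracy together with the measures this thesis targets (AM, QM and GTP/PR) from predicted labels. The abstract does not define these measures, so the formulas here are assumptions: AM is taken as the arithmetic mean of the true-positive rate (TPR) and true-negative rate (TNR), QM as 1 - sqrt(((1 - TPR)^2 + (1 - TNR)^2) / 2), and GTP/PR as the geometric mean of TPR and precision. The function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for labels in {+1, -1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    return tp, fn, tn, fp


def imbalanced_scores(y_true, y_pred):
    """Accuracy plus AM, QM and a GTP/PR-style measure (assumed definitions)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    # Convention (assumption): precision is 0 when nothing is predicted positive.
    prec = tp / (tp + fp) if tp + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "AM": (tpr + tnr) / 2,
        "QM": 1 - math.sqrt(((1 - tpr) ** 2 + (1 - tnr) ** 2) / 2),
        "GTP/PR": math.sqrt(tpr * prec),
    }


# 5 positives, 95 negatives, and a trivial classifier that always predicts -1.
y_true = [1] * 5 + [-1] * 95
y_pred = [-1] * 100
print(imbalanced_scores(y_true, y_pred))
```

For this trivial classifier, accuracy is 0.95 while AM is 0.5, QM is about 0.29 and GTP/PR is 0, so a learner tuned for accuracy can look good while ignoring the minority class entirely.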
Unlike existing improved SVM algorithms, this thesis treats imbalanced binary classification as the problem of learning a classification list and solves it with the structural SVM. The main contributions are as follows.
(1) For imbalanced measures such as AM and QM, we use the n-slack structural SVM as the framework and define objective functions oriented to AM and QM. Because the new objectives are non-smooth and difficult to optimize, we apply the cutting-plane algorithm, which not only solves the problem but also bounds the number of iterations by O(1/ε²), where ε is the permissible error. The main internal computational challenge is the sub-procedure that finds the most violated constraint for the AM and QM losses, so we design an efficient algorithm for each; both run in only O(m log m) time. Experiments on imbalanced data sets show that the proposed algorithms outperform not only traditional SVM algorithms but also other improved data-oriented algorithms.
(2) For another evaluation criterion, GTP/PR, this thesis proposes a direct optimization algorithm based on the 1-slack structural SVM. The algorithm first defines an objective function oriented to GTP/PR, which is tighter than the one based on F1. Because this objective is also non-smooth and difficult to optimize, a cutting-plane algorithm based on the 1-slack SVM is used to solve it, which reduces the number of iterations to O(1/ε) and makes it more suitable for large-scale applications. Experiments on large imbalanced data sets show that, compared with other imbalanced SVM algorithms, the proposed algorithm is both effective and efficient.
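The abstract states that the most-violated-constraint search for the AM loss runs in O(m log m) but does not describe the procedure. The Python sketch below shows one way such a bound can arise; it is not the thesis's algorithm. It assumes a multivariate formulation in the style of Joachims' SVMperf, with joint feature map Psi(x, y') = sum_i y'_i x_i and scores s_i = w . x_i, so the search becomes argmax over y' of Delta_AM(y, y') + sum_i y'_i s_i, with Delta_AM = 1 - (TPR + TNR) / 2. Because Delta_AM is a sum of one term in the number of correctly labelled positives and one in the number of correctly labelled negatives, each class can be optimized separately after a single sort. The names `am_loss`, `most_violated_am` and `brute_force_am` are hypothetical.

```python
import itertools
from typing import List, Tuple


def am_loss(y: List[int], y_bar: List[int]) -> float:
    """Delta_AM = 1 - (TPR + TNR) / 2; assumes both classes occur in y."""
    P = sum(1 for t in y if t == 1)
    N = len(y) - P
    tp = sum(1 for t, p in zip(y, y_bar) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y, y_bar) if t == -1 and p == -1)
    return 1 - (tp / P + tn / N) / 2


def most_violated_am(scores: List[float], y: List[int]) -> Tuple[List[int], float]:
    """argmax over y' of am_loss(y, y') + sum_i y'_i * scores[i], in O(m log m)."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == -1]
    P, N = len(pos), len(neg)

    # Positives: with a of them predicted +1, the best choice is the a highest scores.
    # Start from a = 0 (all predicted -1); promoting item i adds 2*s_i - 1/(2P).
    pos_sorted = sorted(pos, key=lambda i: scores[i], reverse=True)
    cur = best_pos = -sum(scores[i] for i in pos)
    best_a = 0
    for a, i in enumerate(pos_sorted, start=1):
        cur += 2 * scores[i] - 1 / (2 * P)
        if cur > best_pos:
            best_pos, best_a = cur, a

    # Negatives: with d of them predicted -1, the best choice is the d lowest scores.
    # Start from d = 0 (all predicted +1); demoting item i adds -2*s_i - 1/(2N).
    neg_sorted = sorted(neg, key=lambda i: scores[i])
    cur = best_neg = sum(scores[i] for i in neg)
    best_d = 0
    for d, i in enumerate(neg_sorted, start=1):
        cur += -2 * scores[i] - 1 / (2 * N)
        if cur > best_neg:
            best_neg, best_d = cur, d

    # Rebuild the maximizing labelling from the chosen counts a and d.
    y_bar = [0] * len(y)
    for rank, i in enumerate(pos_sorted):
        y_bar[i] = 1 if rank < best_a else -1
    for rank, i in enumerate(neg_sorted):
        y_bar[i] = -1 if rank < best_d else 1
    # The constant 1 in Delta_AM is added back here.
    return y_bar, 1 + best_pos + best_neg


def brute_force_am(scores: List[float], y: List[int]) -> float:
    """Exhaustive O(2^m) reference, only for checking small cases."""
    return max(
        am_loss(y, list(yb)) + sum(p * s for p, s in zip(yb, scores))
        for yb in itertools.product([1, -1], repeat=len(y))
    )
```

The two sorts dominate the cost, and the rest is linear. `brute_force_am` is exponential and is included only to check the fast routine on tiny inputs. The same decoupling does not apply directly to QM, whose loss is not a sum of per-class terms, so the QM search needs a different argument.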
Keywords: Structural support vector machine, Imbalanced data set, Data-oriented, Algorithm-oriented, AM, QM, GTP/PR