
A Research On Imbalanced Learning Based On Semi-supervised SVM

Posted on: 2015-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: W Cheng
GTID: 2308330464466769
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of network systems, information security, and related fields, large-scale data are growing explosively. Although existing machine learning methods are widely used, analyzing and learning from imbalanced data remains one of the current challenges. Imbalanced learning aims to improve classification performance on imbalanced data. Because imbalanced data sets have complex distribution characteristics, new principles, classification algorithms, and classification tools are needed to learn from them. When labeled samples are scarce, semi-supervised learning tries to improve performance by also exploiting unlabeled samples, which makes it another active research direction. This thesis analyzes classification algorithms for imbalanced data and proposes new strategies that combine them with semi-supervised SVM algorithms.

1. We analyze why SVM performance drops significantly when a classifier is trained on highly imbalanced data sets, and propose a novel imbalanced learning method that combines repetitive undersampling with granular SVM. The trained models are evaluated with G-means, one of the most popular evaluation criteria in imbalanced learning, and the best model is used to classify the test data. Experiments on different imbalanced data sets demonstrate the method's effectiveness.

2. When labeled samples are scarce, the semi-supervised SVM algorithm is not well suited to imbalanced data classification. To address this, the proposed ensemble granular semi-supervised SVM constructs different classifiers by introducing an "information granule" strategy, and a clustering evaluation index is introduced to determine the confidence of unlabeled samples. The algorithm effectively solves the problems described above.

3. By comparing the basic sampling methods used in imbalanced learning, we propose three new imbalanced learning algorithms based on different sampling methods. Focusing on the distribution characteristics of imbalanced data sets, we rebalance the data sets with three popular sampling methods. Experiments on different imbalanced data sets show that all three algorithms perform well.

Our work is supported by the National Natural Science Foundation of China (No. 61173092), the New Century Excellent Talents Support Program (No. 66ZY110), and the Shaanxi Province Science and Technology Research and Development Projects (No. 2013KJXX-64).
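The G-means criterion and the repetitive-undersampling idea in item 1 can be illustrated with a short Python sketch. This is not the thesis implementation. The granular SVM is approximated here by the GSVM-RU idea of repeatedly extracting majority-class (negative) support vectors as granules. A plain Pegasos-style linear SVM written in NumPy stands in for a real SVM solver. The names `gsvm_ru` and `train_linear_svm`, the `max_granules` limit, the "combine all granules" aggregation, and the synthetic Gaussian data are all hypothetical. Labels are assumed to be +1 for the minority class and -1 for the majority class. G-means is computed as sqrt(sensitivity × specificity).

```python
import numpy as np


def g_means(y_true, y_pred):
    """G-means = sqrt(sensitivity * specificity); labels are +1 (minority) / -1 (majority)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos, neg = y_true == 1, y_true == -1
    sensitivity = np.mean(y_pred[pos] == 1) if pos.any() else 0.0
    specificity = np.mean(y_pred[neg] == -1) if neg.any() else 0.0
    return float(np.sqrt(sensitivity * specificity))


def add_bias(X):
    # Append a constant feature so the bias is learned as an ordinary weight.
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos subgradient descent on the L2-regularized hinge loss (stand-in SVM solver)."""
    rng = np.random.default_rng(seed)
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * (Xb[i] @ w) < 1  # margin test uses w before the update
            w *= 1.0 - eta * lam
            if violates:
                w += eta * y[i] * Xb[i]
    return w


def decision(w, X):
    return add_bias(X) @ w


def predict(w, X):
    return np.where(decision(w, X) >= 0, 1, -1)


def gsvm_ru(X, y, X_val, y_val, max_granules=5):
    """Hypothetical GSVM-RU-style loop: extract negative support vectors as granules,
    retrain on positives + granules, keep the model with the best validation G-means."""
    pos_X, remaining_neg = X[y == 1], X[y == -1]
    granules, best_score, best_w = [], -1.0, None
    for k in range(max_granules):
        if len(remaining_neg) == 0:
            break
        Xk = np.vstack([pos_X, remaining_neg])
        yk = np.concatenate([np.ones(len(pos_X)), -np.ones(len(remaining_neg))])
        w = train_linear_svm(Xk, yk, seed=k)
        # Negatives on or inside the margin approximate the negative support vectors.
        on_margin = -decision(w, remaining_neg) <= 1.0
        if not on_margin.any():
            break
        granules.append(remaining_neg[on_margin])
        remaining_neg = remaining_neg[~on_margin]
        # Assumed "combine" aggregation: all positives plus every granule so far.
        neg_X = np.vstack(granules)
        Xc = np.vstack([pos_X, neg_X])
        yc = np.concatenate([np.ones(len(pos_X)), -np.ones(len(neg_X))])
        wc = train_linear_svm(Xc, yc, seed=k)
        score = g_means(y_val, predict(wc, X_val))
        if score > best_score:
            best_score, best_w = score, wc
    return best_w, best_score, len(granules)


rng = np.random.default_rng(42)


def make_split(n_pos, n_neg):
    # Synthetic 1:10 imbalanced data: two overlapping Gaussian blobs.
    X = np.vstack([rng.normal(2.5, 1.0, size=(n_pos, 2)), rng.normal(0.0, 1.0, size=(n_neg, 2))])
    y = np.concatenate([np.ones(n_pos), -np.ones(n_neg)])
    return X, y


X_tr, y_tr = make_split(40, 400)
X_val, y_val = make_split(20, 200)
w, score, n_granules = gsvm_ru(X_tr, y_tr, X_val, y_val)
print(f"granules extracted: {n_granules}, best validation G-means: {score:.3f}")
```

Model selection uses validation G-means rather than accuracy. On a 1:10 split, a model that labels everything as the majority class reaches about 91% accuracy but scores 0 on G-means.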
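For item 2, the abstract does not name the clustering evaluation index used to score unlabeled samples. The sketch below assumes the silhouette coefficient for illustration. Each unlabeled sample first receives a pseudo-label from the current classifier. Its silhouette with respect to the pseudo-label groups then serves as its confidence, and only samples above a hypothetical `threshold` would join the labeled set in the next self-training round. The function names are illustrative, and the ensemble and information-granule construction are omitted.

```python
import numpy as np


def silhouette_confidence(X, labels):
    """Per-sample silhouette coefficient used as a confidence score (assumed index).

    a(i): mean distance to the other samples with the same pseudo-label
    b(i): mean distance to the samples with the other pseudo-label
    s(i) = (b - a) / max(a, b), in [-1, 1]; singleton groups get 0.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    conf = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False
        other = labels != labels[i]
        if not same.any() or not other.any():
            continue  # silhouette undefined; leave confidence at 0
        a = D[i, same].mean()
        b = D[i, other].mean()
        conf[i] = (b - a) / max(a, b)
    return conf


def select_confident(X_unlabeled, pseudo_labels, threshold=0.5):
    """Keep only unlabeled samples whose silhouette confidence exceeds the threshold."""
    conf = silhouette_confidence(X_unlabeled, pseudo_labels)
    keep = conf > threshold
    return np.asarray(X_unlabeled)[keep], np.asarray(pseudo_labels)[keep], conf


# Toy unlabeled set with pseudo-labels; (5, 0.5) sits midway between the two groups.
X_u = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0], [5.0, 0.5]])
pseudo = np.array([1, 1, -1, -1, 1])
X_keep, y_keep, conf = select_confident(X_u, pseudo)
print(np.round(conf, 3))
print(len(X_keep), "of", len(X_u), "samples kept")
```

The midpoint sample is equidistant from both pseudo-label groups, so its confidence is 0 and it is excluded. The four samples inside compact groups score well above the threshold and are kept.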
Keywords/Search Tags:Imbalanced data classification, Semi-supervised SVM, Granular S3VM, Ensemble learning