A Research On Imbalanced Learning Based On Semi-supervised SVM

Posted on:2015-07-26

Degree:Master

Type:Thesis

Country:China

Candidate:W Cheng

Full Text:PDF

GTID:2308330464466769

Subject:Electronics and Communications Engineering

Abstract/Summary:

With the rapid development of network systems, information security, and other fields, large-scale data is growing explosively. Although existing machine learning methods are widely used, analyzing and learning from imbalanced data remains a challenge. Imbalanced learning aims to improve the performance of algorithms on imbalanced data classification. Because imbalanced data sets have complex distribution characteristics, new principles, classification algorithms, and classification tools are needed to learn from them. When labeled samples are scarce, semi-supervised learning attempts to improve algorithm performance by introducing unlabeled samples, which makes it one of the active directions of current research. This thesis analyzes imbalanced data classification algorithms and proposes new strategies that combine them with the semi-supervised SVM algorithm.

1. This chapter analyzes the significant performance drop of the SVM algorithm when a classifier is trained on highly imbalanced data sets. A novel imbalanced learning method is proposed that combines a repetitive undersampling algorithm with Granular SVM. G-means, one of the most popular evaluation criteria in imbalanced learning, is used to evaluate the trained models, and the best model is used to classify the test data. The effectiveness of the method is demonstrated on different imbalanced data sets.

2. When labeled samples are scarce, the semi-supervised SVM algorithm is not suitable for imbalanced data classification. To solve this problem, the proposed ensemble granular semi-supervised SVM constructs different classifiers by introducing an “information granule” strategy. A clustering evaluation index is also introduced to determine the confidence of unlabeled samples. The algorithm handles the problems mentioned above effectively.

3. After comparing the basic sampling methods for imbalanced learning, we propose three new imbalanced learning algorithms based on different sampling methods. Focusing on the distribution characteristics of imbalanced data sets, we rebalance the data sets using three popular sampling methods. Experiments on different imbalanced data sets show good performance for the three algorithms.

Our work is supported by the National Natural Science Foundation of China (No. 61173092), the New Century Excellent Talents support plan (No. 66ZY110), and the Shaanxi Province science and technology research and development projects (No. 2013KJXX-64).
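The abstract names G-means as the model selection criterion but does not define it. The sketch below assumes the commonly used definition, the geometric mean of the recall on the minority class (sensitivity) and the recall on the majority class (specificity); the thesis itself may use a variant. The function name `g_mean` and the toy labels are illustrative only and do not come from the thesis.

```python
import math

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (assumed definition).

    `positive` marks the minority class; every other label counts as majority.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against an empty class so the function never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# Toy data: 2 minority samples (label 1) and 8 majority samples (label 0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class is 80% accurate
# but never finds a minority sample, so its G-mean is zero.
always_majority = [0] * 10

# A classifier that finds one of the two minority samples at the cost of
# two false alarms: sensitivity 0.5, specificity 0.75.
balanced_guess = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]

print(g_mean(y_true, always_majority))           # 0.0
print(round(g_mean(y_true, balanced_guess), 3))  # 0.612
```

Because the score collapses to zero whenever either class is ignored entirely, ranking candidate models by it favors classifiers that recognize the minority class, which plain accuracy does not.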

Keywords/Search Tags:

Imbalanced data classification, Semi-supervised SVM, Granular S3VM, Ensemble learning

Related items

1	Selection And Classification Of Unbalanced Data Based On Semi - Supervised And Integrated Learning
2	Research On Sentiment Classification Based-upon Imbalanced Data
3	Research And Implementation Of Semi-supervised Machine Learning Algorithms For Classifying The Imbalanced Protocol Flows
4	Research And Application Of Image Classification Algorithm Based On Semi-supervised Learning
5	Imbalanced Data Classification Based On Multi-classifier Ensemble And Semi-supervised Learning
6	Research On Imbalanced Dataset Classification In Semi-supervised Learning
7	Research On Ensemble And Imbalanced Based Supervised/Unsupervised Learning Methods And Application
8	Ensemble Based Semi-supervised Learning For Fault Classification
9	Feature Selection And Semi-supervised Classification For Imbalanced Data
10	Research On Image Classification Algorithm Based On Semi-supervised Learning