An Imbalanced Data Classification Method Based On Active Learning

Posted on:2016-07-21

Degree:Master

Type:Thesis

Country:China

Candidate:Y S Yao

Full Text:PDF

GTID:2308330461956521

Subject:Computer application technology

Abstract/Summary:

Classification is an important subject in data mining and machine learning. As application areas expand, more kinds of data appear in research, and imbalanced data is one of them. Standard classification methods emphasize overall accuracy, so on imbalanced data they sacrifice the minority class, even though the minority class is usually the more important one. Research has shown that active learning with SVM works well on imbalanced data, but at high cost. It is therefore worthwhile to study how to reduce this cost while improving the classifier. In this thesis, SID-SVM, a classification method based on active learning, is proposed for classifying imbalanced data sets. SID-SVM improves the classifier effectively at little iteration cost, lowers the data imbalance ratio, and is robust to both the imbalance ratio and the scale of the data set. It keeps performance high while cutting cost. The main contributions are as follows:

The SID method is presented for choosing the initial training set: it chooses the samples that are nearest to the other class. SID reduces both the iteration cost and the data imbalance ratio. The method is tested on linearly separable data sets and extended to linearly non-separable data sets through SVM kernel functions.

The DC method is proposed for choosing the most informative samples during iteration: 1. randomly choose one sample from those misclassified by the current class boundary (if any exist); 2. choose the sample nearest to the current boundary. The chosen samples are then added to the training set. DC adjusts the class boundary sharply, and it works better on imbalanced data sets.

Two layers of optimization are applied: first, a randomized algorithm cuts the computation of SID while keeping the performance of the classifier; second, the computation is run in parallel on Hadoop. These optimizations further reduce the pre-processing cost and make the method more practical.
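The two selection rules named in the abstract, SID for the initial training set and DC for each iteration, can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names `sid_initial_set` and `dc_select`, the parameter `k`, the use of Euclidean distance to the nearest member of the other class, and the `decision` callable (standing in for a trained SVM's signed decision value) are all assumptions made for illustration. The sketch also assumes the true labels of pool samples are available, as they would be in a simulated active-learning experiment.

```python
import math
import random


def sid_initial_set(majority, minority, k):
    """SID-style initial set (sketch): from each class, keep the k samples
    closest to the other class. "Closest" is assumed here to mean the
    smallest Euclidean distance to any member of the other class."""
    def nearest_to(samples, others):
        ranked = sorted(
            samples,
            key=lambda s: min(math.dist(s, o) for o in others),
        )
        return ranked[:k]

    return nearest_to(majority, minority), nearest_to(minority, majority)


def dc_select(pool, labels, decision, rng=random):
    """DC-style selection (sketch): return indices of pool samples to add.

    pool     -- candidate samples not yet in the training set
    labels   -- their true labels, +1 or -1 (assumed known, e.g. simulation)
    decision -- hypothetical callable giving the current classifier's signed
                score; the class boundary is where the score equals 0
    """
    # Step 1: one random sample among those the current boundary gets wrong.
    misclassified = [
        i for i, (x, y) in enumerate(zip(pool, labels)) if decision(x) * y <= 0
    ]
    chosen = []
    if misclassified:
        chosen.append(rng.choice(misclassified))

    # Step 2: the sample nearest to the current boundary (smallest |score|).
    nearest = min(range(len(pool)), key=lambda i: abs(decision(pool[i])))
    if nearest not in chosen:
        chosen.append(nearest)
    return chosen


if __name__ == "__main__":
    majority = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
    minority = [(6.0, 0.0), (9.0, 0.0)]
    print(sid_initial_set(majority, minority, k=1))

    pool = [(0.0,), (2.0,), (3.2,), (8.0,)]
    labels = [-1, 1, 1, 1]
    print(dc_select(pool, labels, decision=lambda x: x[0] - 3.0))
```

In an active-learning loop, the samples returned by `dc_select` would be moved from the pool into the training set and the SVM retrained before the next iteration. The brute-force distance computation in `sid_initial_set` is quadratic in the number of samples, which is the kind of pre-processing cost the abstract's randomized and parallel optimizations aim to reduce.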

Keywords/Search Tags:

active learning, imbalanced data, SVM, iteration cost

Related items

1	Research On Adaptive Imbalanced Data Classification
2	Study Of Active Learning Algorithms On Imbalanced Data Using Extreme Learning Machine
3	Learning in extreme conditions: Online and active learning with massive, imbalanced and noisy data
4	Research On Online Active Learning For Class-imbalanced Data Stream
5	Designing Feature Selection And Classification Methods For Imbalanced Learning And Cost-sensitive Learning Problems
6	Algorithms And Applications Of Imbalanced Data Classification Based On Semisupervised Learning
7	Imbalanced Data Classification Based On Active Learning
8	Research On Imbalanced Data Classification Algorithms Based On Weight Analysis Of Loss Function
9	Research On Weighted Extreme Learning Machine Algorithm Based On Imbalanced Data Distribution
10	Research On Imbalanced Data Classification Methods Based On Ensemble Learning