Classification Of Imbalanced Sample Based On Stream Data

Posted on:2015-10-11

Degree:Master

Type:Thesis

Country:China

Candidate:D Zhao

Full Text:PDF

GTID:2308330479489719

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of information technology, data now takes more and more forms, and streaming data is one of them. Streaming data differs from traditional data in its massive volume, real-time arrival, and dynamic change. In addition, data in real applications is often imbalanced, for example when detecting financial fraud from credit card transaction records or predicting disease from medical check-up data.

For imbalanced data, the main idea of the SMOTE algorithm is to increase the number of minority class samples: for a minority class sample it finds the nearest minority class points and generates new samples by linear interpolation between them. The REA algorithm uses a sliding window to train classifiers over a period of time in order to classify imbalanced samples in stream data, and finally obtains a classifier. Each of the two algorithms has advantages and disadvantages. SMOTE does not consider the distribution of minority class samples in different areas, and it cannot control the position of the new samples it generates. REA does not handle concept drift and small disjuncts effectively.

This thesis proposes an improved algorithm called CSMOTE_REA, based on the traditional REA and SMOTE algorithms, to solve the problem of classifying imbalanced samples in stream data. The algorithm uses a sampling method with a clustering feature. The method first adds historical data to increase the number of minority class points, and then clusters the minority class points to identify them in different areas. The thesis also proposes a grid-based method for generating samples, so that the generated data is strongly related to the existing minority samples and the minority class becomes more tightly aggregated. In addition, the thesis proposes a method that lets test samples choose their classifications themselves, which improves the predictive capability of the classifier. In experiments comparing it with other algorithms on many data sets, the algorithm shows better performance on the problem of classifying imbalanced samples in stream data.
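
The SMOTE interpolation step described in the abstract can be illustrated with a short sketch. This is not the thesis's CSMOTE_REA method, and it leaves out the clustering, grid, and sliding-window parts entirely. It only shows the basic idea of picking a minority sample, picking one of its nearest minority neighbors, and placing a new point on the line segment between them. The function name `smote_sketch` and its parameters `n_new`, `k`, and `seed` are assumptions made for this illustration.

```python
import numpy as np


def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by linear interpolation.

    Hypothetical helper for illustration only; not the thesis's algorithm.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # a point cannot have more than n - 1 neighbors

    # Pairwise Euclidean distances between minority samples.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # exclude each point from its own neighbors

    # Indices of the k nearest minority neighbors of every minority sample.
    neighbors = np.argsort(dists, axis=1)[:, :k]

    # For each new sample: a random base point and one of its k neighbors.
    base = rng.integers(0, n, size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]

    # Random position in [0, 1) along the segment from base to neighbor.
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[picked] - minority[base])
```

Because every synthetic point lies on a segment between two existing minority points, the sketch also makes the limitation noted in the abstract visible: the position of a new sample is determined only by which neighbor happens to be drawn, not by how the minority class is distributed across different areas.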

Keywords/Search Tags:

streaming data, imbalance data, resampling, ensemble learning

Related items

1	Research And Application Of Ensemble Learning Based On Combined Resampling Methods
2	Research On Ensemble Learning Approaches To Imbalanced Data Sets
3	Research On Imbalanced Data Classification Methods Based On Resampling And Ensemble Learning
4	Research And Application Of Imbalance Data Classification Based On SVM
5	Research Of Ensemble Classification Methods For Class-imbalance And Cost-sensitive Datasets
6	Studying Class Imbalance Characteristics And Classification Methods On Internet Traffic Flows
7	Imbalance Malicious Text Detection Based On Ensemble Learning
8	Online Learning Algorithms For Classification Of Streaming Data
9	Hybrid Ensemble Learning For Imbalanced Data
10	Classification In Imbalanced Data Based On Over-Sampling And Ensemble Learning