
Research On The Classification Of Imbalanced Data Sets Based On R-SMOTE

Posted on: 2016-10-02; Degree: Master; Type: Thesis
Country: China; Candidate: M Yuan; Full Text: PDF
GTID: 2308330479977713; Subject: Computer technology
Abstract/Summary:
An imbalanced data set is one in which the number of samples of one class is far smaller than that of the other classes. Such data sets are pervasive across subject areas. Although the minority class is outnumbered, it has always attracted a great deal of attention, for example in medical diagnosis. Because minority samples are few and are often surrounded by majority samples, traditional classifiers applied to imbalanced data sets achieve high accuracy on the majority class but low accuracy on the minority class. The ability of traditional techniques to classify imbalanced data sets is therefore limited and falls short of the expected classification performance.

SMOTE is a sampling method proposed by Chawla et al. It balances the data set by artificially inserting synthetic examples between neighboring minority-class examples. However, the synthetic examples that SMOTE generates can lie only on the line segments joining original minority-class examples, which strictly limits their distribution range. To remove this limitation, this thesis proposes a new sampling method, R-SMOTE. It generates synthetic examples in the n-dimensional sphere formed by a minority-class example and its nearest minority-class examples, thereby reducing the degree of imbalance of the data set. Finally, a classification model for imbalanced data sets is constructed by training an extreme learning machine classifier on the resulting balanced data set.

Experiments on a variety of real data sets show that R-SMOTE effectively alleviates the difficulty of classifying minority-class examples in imbalanced data sets. Compared with SMOTE and random sampling, R-SMOTE achieves higher accuracy and stability in minority-class classification.
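
The contrast between segment-based and sphere-based generation can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not state how the sphere is constructed, so the sketch assumes a ball whose diameter is the segment between a minority example and its single nearest minority neighbor. All function names are hypothetical, and using one nearest neighbor instead of k neighbors is a simplification.

```python
import math
import random


def nearest_minority_neighbor(x, minority):
    """Return the minority sample closest to x (excluding x itself)."""
    others = [m for m in minority if m is not x]
    return min(others, key=lambda m: math.dist(x, m))


def smote_point(x, neighbor, rng):
    """SMOTE-style step: a random point on the segment from x to neighbor."""
    t = rng.random()
    return [xi + t * (ni - xi) for xi, ni in zip(x, neighbor)]


def sphere_point(x, neighbor, rng):
    """Assumed sphere-based step (not confirmed by the abstract): a uniform
    random point inside the ball whose diameter is the segment x-neighbor."""
    n = len(x)
    center = [(xi + ni) / 2 for xi, ni in zip(x, neighbor)]
    radius = math.dist(x, neighbor) / 2
    # Random direction: normalize a vector of independent Gaussians.
    direction = [rng.gauss(0, 1) for _ in range(n)]
    norm = math.sqrt(sum(d * d for d in direction))
    # Scaling by u**(1/n) makes the density uniform over the ball's volume.
    r = radius * rng.random() ** (1 / n)
    return [c + r * d / norm for c, d in zip(center, direction)]


def oversample(minority, n_new, make_point, seed=0):
    """Generate n_new synthetic minority samples with the given step."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbor = nearest_minority_neighbor(x, minority)
        synthetic.append(make_point(x, neighbor, rng))
    return synthetic
```

Calling `oversample(minority, n_new, smote_point)` keeps every synthetic sample on a segment between two original samples, while `oversample(minority, n_new, sphere_point)` lets samples spread into the surrounding volume, which is the wider distribution range the abstract attributes to R-SMOTE.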
Keywords/Search Tags: Imbalanced data sets, Classification, Over-sampling, Extreme learning machine