Research Of Boosting Classification Algorithm For Imbalanced Data

Posted on:2014-12-18

Degree:Master

Type:Thesis

Country:China

Candidate:L L Wang

Full Text:PDF

GTID:2298330422990430

Subject:Computer Science and Technology

Abstract/Summary:

Many real-world classification applications involve learning from imbalanced data sets. Such data sets are composed predominantly of "normal" examples, called negative samples, with only a small percentage of "abnormal" or "interesting" examples, called positive samples. The positive samples are usually the ones of interest, and misclassifying them tends to carry a high cost. Traditional classification algorithms, however, are designed to maximize overall accuracy over the whole data set, so the classification result may favor the numerous negative samples while performance on the positive samples is very poor.

Current work on imbalanced data classification focuses mainly on resampling techniques at the data level and on improvements at the algorithm level. SMOTE is one of the most typical resampling methods. It is an over-sampling method that creates synthetic examples of the rare (minority) class in feature space in order to balance the data set. Boosting, an ensemble method, has been proposed to deal with imbalanced data at the algorithm level: it pays more attention to the "difficult" samples and combines multiple weak classifiers into a strong classifier.

However, SMOTE does not consider that minority class samples from different regions contribute differently to minority class classification performance, and Boosting treats hard-to-classify samples from the majority class and the minority class equally. To some extent, both limitations hinder improvement of minority class classification performance. This thesis proposes an imbalanced data classification algorithm, DSMOTE-Boost, which combines a region-discriminating over-sampling method called DSMOTE with the Boosting algorithm. The algorithm divides the minority class examples into three groups (boundary examples, safe examples, and isolated points) and applies a different over-sampling strategy to each group. It increases the emphasis on boundary examples and decides whether isolated points are worth keeping by setting a threshold on the imbalance ratio. In addition, the thesis proposes a method that adaptively adjusts the over-sampling rate of boundary examples to avoid blind over-sampling. The algorithm was tested on several UCI data sets, and the results show that DSMOTE-Boost is effective, achieving better classification performance on the minority class.
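
The region-based idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the criteria DSMOTE uses to separate boundary, safe, and isolated examples, nor its imbalance-ratio threshold or adaptive over-sampling rate, so the neighbour-count rule below (similar in spirit to Borderline-SMOTE) and the names `knn_indices`, `partition_minority`, and `smote_sample` are assumptions made for illustration only, not the thesis's method.

```python
import numpy as np

def knn_indices(X, i, k):
    """Indices of the k nearest neighbours of X[i] in X, excluding i itself."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf  # never pick the point as its own neighbour
    return np.argsort(d)[:k]

def partition_minority(X, y, k=5, minority=1):
    """Label each minority sample as 'safe', 'boundary' or 'isolated'.

    Assumed rule (NOT taken from the thesis): count majority-class points
    among the k nearest neighbours. All majority -> isolated; at least
    half majority -> boundary; otherwise safe.
    """
    groups = {}
    for i in np.flatnonzero(y == minority):
        n_major = int(np.sum(y[knn_indices(X, i, k)] != minority))
        if n_major == k:
            groups[int(i)] = "isolated"
        elif 2 * n_major >= k:
            groups[int(i)] = "boundary"
        else:
            groups[int(i)] = "safe"
    return groups

def smote_sample(X_min, i, k, rng):
    """One SMOTE-style synthetic point: interpolate between X_min[i] and
    a randomly chosen one of its k nearest minority-class neighbours."""
    k = min(k, len(X_min) - 1)
    j = rng.choice(knn_indices(X_min, i, k))
    return X_min[i] + rng.random() * (X_min[j] - X_min[i])
```

In a pipeline of the kind the abstract describes, the group labels would decide how many synthetic points each minority sample receives (for example, more for boundary examples) before a Boosting classifier is trained on the augmented data. Those rates are left out here because the abstract does not specify them.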

Keywords/Search Tags:

imbalanced data, resampling, ensemble learning, SMOTE, Boosting

Related items

1	Research On Imbalanced Data Processing Methods For Industrial Big Data
2	Research And Application Of Ensemble Learning Based On Combined Resampling Methods
3	Research On The Application Of Boosting Algorithm Based On Improved SMOTE In Personal Credit Evaluation
4	Research And Application Of Imbalanced Data Classification Algorithms Based On Ensemble Learning
5	Unbalanced Data Classification Based On Resampling And Hybrid Ensemble
6	Research On Imbalanced Data Classification Algorithms Based On Ensemble Learning
7	Research On The Imbalanced Data Learning
8	Research On Imbalanced Data Classification Based On Sampling Method And Ensemble Learning
9	Comprehensive Oversampling And Undersampling Study Of Imbalanced Data Sets
10	Classification In Imbalanced Data Based On Over-Sampling And Ensemble Learning