Research Of Boosting Classification Algorithm For Imbalanced Data

Posted on: 2014-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: L L Wang
Full Text: PDF
GTID: 2298330422990430
Subject: Computer Science and Technology
Abstract/Summary:
Many real-world classification applications involve learning from imbalanced data sets. In general, an imbalanced data set is composed predominantly of "normal" examples, called negative samples, with only a small percentage of "abnormal" or "interesting" examples, called positive samples. Usually we care more about the few positive samples, and misclassifying them tends to incur a high loss. Traditional classification algorithms, however, aim to maximize overall accuracy over the whole data set. The classification result may therefore favor the numerous negative samples, and performance on the positive samples may be very poor.

At present, work on the imbalanced data classification problem focuses mainly on data-level resampling techniques and on algorithm-level improvements. SMOTE is one of the most typical resampling methods. It is an over-sampling method that creates synthetic examples of the rare (minority) class in feature space to bring the data set into balance. Boosting, an ensemble method, has been proposed to deal with the imbalanced data problem at the algorithm level: it pays more attention to "difficult" samples and combines multiple weak classifiers into a strong classifier.

However, SMOTE does not consider that minority class samples from different regions contribute differently to minority class classification performance, and Boosting treats hard-to-classify samples from the majority class and from the minority class equally. To some extent, both limitations hinder the improvement of minority class classification performance. This paper proposes an imbalanced data classification algorithm, DSMOTE-Boost, which combines a region-discriminating over-sampling method called DSMOTE with the Boosting algorithm. The algorithm divides the minority class examples into three groups (boundary examples, safe examples, and isolated points) and applies a different over-sampling strategy to each. It places greater emphasis on boundary examples and affirms the value of isolated points by setting a threshold on the imbalance ratio. Moreover, this paper proposes a method for adaptively adjusting the over-sampling rate of boundary examples to avoid blind over-sampling. The algorithm was tested on several UCI data sets, and the results show that DSMOTE-Boost is effective, achieving better classification performance on the minority class.
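The grouping of minority examples and the SMOTE interpolation step can be illustrated with a minimal sketch. The abstract does not state how DSMOTE assigns examples to groups, so the rule below is an assumption borrowed from the general Borderline-SMOTE idea: count the majority samples among the k nearest neighbours of each minority sample. The function names, the label convention (1 for minority, 0 for majority), and the thresholds are all hypothetical. Only the interpolation in `smote_sample` follows standard SMOTE.

```python
import math
import random


def k_nearest(idx, points, k):
    """Indices of the k points closest to points[idx] (Euclidean), excluding idx."""
    dists = [(math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx]
    dists.sort()
    return [j for _, j in dists[:k]]


def group_minority(points, labels, k=5):
    """Assign each minority sample (label 1) to 'safe', 'boundary' or 'isolated'.

    Assumed rule (not taken from the thesis): let m be the number of majority
    samples (label 0) among the k nearest neighbours.
      m == k      -> isolated
      2 * m >= k  -> boundary
      otherwise   -> safe
    """
    groups = {}
    for i, y in enumerate(labels):
        if y != 1:
            continue
        m = sum(1 for j in k_nearest(i, points, k) if labels[j] == 0)
        if m == k:
            groups[i] = "isolated"
        elif 2 * m >= k:
            groups[i] = "boundary"
        else:
            groups[i] = "safe"
    return groups


def smote_sample(x, neighbour, rng):
    """Standard SMOTE step: a random point on the segment from x to neighbour."""
    u = rng.random()
    return [a + u * (b - a) for a, b in zip(x, neighbour)]


if __name__ == "__main__":
    # Toy data: a minority cluster, a majority cluster, one minority point
    # inside the majority cluster, and one minority point near its edge.
    points = [[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11],
              [10.5, 10.5], [9, 9]]
    labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
    print(group_minority(points, labels, k=3))
    print(smote_sample(points[0], points[3], random.Random(0)))
```

In a region-aware scheme of the kind the abstract describes, the group label would then decide how many synthetic samples each minority example receives, for example more for boundary examples than for safe ones. The actual rates and the imbalance-ratio threshold for isolated points are defined in the thesis itself and are not reproduced here.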
Keywords/Search Tags: imbalanced data, resampling, ensemble learning, SMOTE, boosting