The Application Of Improved AdaBoost Algorithm Based On Cost Sensitive In Imbalanced Data

Posted on: 2019-05-20; Degree: Master; Type: Thesis
Country: China; Candidate: W Sun
GTID: 2428330566493783; Subject: Statistics
Abstract/Summary:
In an era of rapid data growth, data take many forms and new data mining methods appear continually. Commonly used classification algorithms assume by default that datasets are balanced, and they concentrate on overall classification accuracy. The growing number of imbalanced datasets makes mining and analysis harder: for such data, what matters most is the proportion of minority-class samples that are classified correctly, yet traditional classification algorithms recognize the minority class poorly. This thesis therefore proposes an improved AdaBoost algorithm, based on cost-sensitive ideas, for handling imbalanced datasets. Starting from the original AdaBoost algorithm, it selects logistic regression as the base classifier and introduces a cost-sensitive factor. Following the main ideas of cost-sensitive learning, the decision rule of logistic regression is changed from maximizing the conditional probability to minimizing the Bayes risk, and this determines the value of the cost-sensitive factor. The value is then introduced into the weight-update step. Where the original algorithm treats a classification result simply as correct or incorrect, the improved algorithm distinguishes four cases: misclassified with high cost, misclassified with low cost, correctly classified with high cost, and correctly classified with low cost. Assigning cost-sensitive weights to the sample points in this way improves classification accuracy on the minority class. Numerical simulation is used to compare the results of commonly used classification algorithms under different degrees of balance, and an empirical analysis on a default dataset then verifies the effectiveness of the improved method. The results show that the proposed method significantly improves the identification of the minority class.
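
The abstract does not give the thesis's formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. The first function is the standard minimum-Bayes-risk threshold for a binary problem with zero cost for correct decisions. The second is one possible four-case weight update, borrowing the AdaCost-style adjustment factor as a stand-in; it is not the author's update rule. All function and parameter names (`bayes_threshold`, `cost_sensitive_update`, `cost_fn`, `cost_fp`, `cost`) are hypothetical.

```python
import math


def bayes_threshold(cost_fn, cost_fp):
    """Probability threshold that minimizes expected cost (Bayes risk).

    Assumes correct decisions cost nothing. Predicting the minority
    (positive) class is cheaper when p * cost_fn >= (1 - p) * cost_fp,
    which rearranges to p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def cost_sensitive_update(weights, y_true, y_pred, alpha, cost):
    """One boosting weight update that separates four cases.

    Labels and predictions are in {-1, +1}; cost[i] is in [0, 1] and is
    larger for samples that are expensive to get wrong (for example the
    minority class). The adjustment factor below is an assumption taken
    from the AdaCost family, not from the thesis:
      misclassified:        beta = 0.5 * cost + 0.5   (weight grows more when cost is high)
      correctly classified: beta = -0.5 * cost + 0.5  (weight shrinks less when cost is high)
    """
    updated = []
    for w, y, h, c in zip(weights, y_true, y_pred, cost):
        correct = (y == h)
        beta = (-0.5 * c + 0.5) if correct else (0.5 * c + 0.5)
        # y * h is +1 for a correct prediction and -1 for a wrong one.
        updated.append(w * math.exp(-alpha * y * h * beta))
    total = sum(updated)
    # Normalize so the weights form a distribution again.
    return [w / total for w in updated]


# Four samples, one per case: (wrong, high cost), (correct, high cost),
# (wrong, low cost), (correct, low cost).
new_weights = cost_sensitive_update(
    weights=[0.25, 0.25, 0.25, 0.25],
    y_true=[1, 1, -1, -1],
    y_pred=[-1, 1, 1, -1],
    alpha=1.0,
    cost=[0.9, 0.9, 0.1, 0.1],
)
```

With these illustrative numbers, the misclassified high-cost sample ends up with the largest weight and the correctly classified low-cost sample with the smallest, which is the ordering the four-case distinction is meant to produce. A threshold from `bayes_threshold` would replace the usual 0.5 cutoff on the logistic regression output.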
Keywords/Search Tags:Imbalanced data, Cost-sensitive, AdaBoost, Classification