Decision Tree On Imbalanced Data Sets

Posted on: 2018-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: J J Qin
Full Text: PDF
GTID: 2370330590977831
Subject: Statistics
Abstract/Summary:
In classification problems, many classifiers perform poorly on imbalanced data sets. The issue matters because imbalanced data arise in many real-world applications. Among the improved classifiers designed to deal with the problem, combining data-level processing with boosted decision trees has become one of the important methods. This paper addresses three problems concerning decision trees on imbalanced data sets. First, rather than directly adjusting the imbalanced data set to a uniform one, this paper proposes a class distribution adjustment algorithm based on goodness-of-split. Second, because altering the class distribution changes the splits, we give a procedure to measure the influence on a two-class decision tree when the class distribution is adjusted. Third, changing the class distribution of the training data introduces a bias relative to the natural distribution and thus affects the posterior estimation; we derive a simplified form to correct the posterior probability estimate. Based on the conclusions of Chapter 2, this paper also improves the SMOTEBoost and EUSBoost algorithms. For SMOTEBoost, we adaptively adjust the oversampled size before each iteration. For EUSBoost, we correct the posterior estimate after each learning round. Experiments on UCI datasets verify the improvement.
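To illustrate why resampling the training data calls for a posterior correction, the sketch below applies the standard Bayes prior-shift correction. This is a textbook formula and not necessarily the simplified form derived in the thesis, which the abstract does not give. The function name `correct_posterior` and its parameter names are hypothetical, and it assumes a two-class problem in which only the class prior differs between the resampled and the natural data.

```python
def correct_posterior(p_train, prior_train, prior_true):
    """Map a minority-class posterior estimated under the resampled
    class prior back to the natural class prior.

    p_train     -- posterior P(minority | x) from a model trained on resampled data
    prior_train -- minority-class proportion in the resampled training set
    prior_true  -- minority-class proportion in the natural distribution

    Assumption: resampling changes only the class prior, not the
    class-conditional distributions (standard prior-shift setting).
    """
    if not (0.0 < prior_train < 1.0 and 0.0 < prior_true < 1.0):
        raise ValueError("priors must lie strictly between 0 and 1")
    if not 0.0 <= p_train <= 1.0:
        raise ValueError("p_train must be a probability")

    # Reweight each class's posterior mass by the ratio of true to training prior.
    pos = p_train * prior_true / prior_train
    neg = (1.0 - p_train) * (1.0 - prior_true) / (1.0 - prior_train)

    # Renormalize so the two class posteriors sum to one.
    return pos / (pos + neg)


if __name__ == "__main__":
    # A model trained on balanced data (prior 0.5) outputs 0.5 for some x;
    # if the minority class is really 10% of the population, the corrected
    # posterior falls back toward the natural base rate.
    print(correct_posterior(0.5, prior_train=0.5, prior_true=0.1))
```

Under these assumptions, a score from a tree trained on balanced data overstates the minority-class probability, and the correction pulls it back toward the natural base rate while preserving the ranking of examples.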
Keywords/Search Tags:imbalanced data set, decision tree, goodness-of-split, posterior estimation, boosting