Numerical Analysis And Algorithm Improvement Of Imbalanced Data Based On Decision Tree

Posted on: 2019-06-24
Degree: Master
Type: Thesis
Country: China
Candidate: L S Wu
GTID: 2428330545997466
Subject: Probability theory and mathematical statistics
Abstract/Summary:
The class imbalance problem is widespread in data science research. In such problems, the accuracy on the minority class is usually of greater concern. The purpose of this paper is to study how different factors influence the decision tree and to improve its performance on class-imbalanced problems. By combining the decision tree with the K-nearest neighbor (K-NN) method, a new algorithm, LRDT (Leaf Rank Decision Tree), is constructed. The main idea of LRDT is to rank the majority-class leaves of the decision tree according to suitable indicators and to improve minority-class accuracy by handling the worst-performing leaves first. The algorithm alleviates the decision tree's bias toward the majority class, a bias that sacrifices minority-class accuracy in order to maintain overall accuracy on imbalanced data. As a result, the accuracy of the minority class is improved while the overall accuracy is preserved.
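The abstract does not specify the ranking indicator or exactly how poorly performing leaves are handled, so the following is only a minimal pure-Python sketch of the idea under stated assumptions. It grows a small Gini-based tree, ranks the majority-labelled leaves by the share of minority training samples they contain (an assumed indicator), and for the top `n_rank` leaves replaces the leaf's majority vote with a k-NN vote over that leaf's training samples (an assumed handling rule). All names (`fit_lrdt`, `predict_lrdt`, `n_rank`, and so on) are hypothetical, not the thesis's implementation. The g-mean used for evaluation is the geometric mean of minority-class recall and majority-class recall.

```python
import math
from collections import Counter

# Hypothetical sketch of the LRDT idea. Assumptions: the ranking indicator is
# the minority share of a majority-labelled leaf, the top n_rank such leaves
# are "poor", and poor leaves are re-decided by a leaf-local k-NN vote.


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, idx, depth, max_depth, min_size, leaves):
    """Grow a small CART-style tree on rows `idx`; each leaf stores its row ids."""
    labels = [y[i] for i in idx]
    best = None
    if depth < max_depth and len(idx) >= 2 * min_size and len(set(labels)) > 1:
        for f in range(len(X[0])):
            for t in sorted({X[i][f] for i in idx}):
                left = [i for i in idx if X[i][f] <= t]
                right = [i for i in idx if X[i][f] > t]
                if len(left) < min_size or len(right) < min_size:
                    continue
                # Size-weighted Gini impurity of the two children.
                score = (len(left) * gini([y[i] for i in left])
                         + len(right) * gini([y[i] for i in right]))
                if best is None or score < best[0]:
                    best = (score, f, t, left, right)
    if best is None:
        leaves.append(idx)
        return ("leaf", len(leaves) - 1)
    _, f, t, left, right = best
    return ("split", f, t,
            build_tree(X, y, left, depth + 1, max_depth, min_size, leaves),
            build_tree(X, y, right, depth + 1, max_depth, min_size, leaves))


def leaf_of(node, x):
    """Route sample x down the tree and return its leaf id."""
    while node[0] == "split":
        _, f, t, low, high = node
        node = low if x[f] <= t else high
    return node[1]


def fit_lrdt(X, y, minority=1, max_depth=1, min_size=4, n_rank=1):
    leaves = []
    tree = build_tree(X, y, list(range(len(y))), 0, max_depth, min_size, leaves)
    leaf_label = [Counter(y[i] for i in rows).most_common(1)[0][0] for rows in leaves]

    def minority_share(k):
        return sum(y[i] == minority for i in leaves[k]) / len(leaves[k])

    # Rank the majority-class leaves, worst (largest minority share) first.
    ranked = sorted((k for k, lab in enumerate(leaf_label) if lab != minority),
                    key=minority_share, reverse=True)
    flagged = {k for k in ranked[:n_rank] if minority_share(k) > 0}
    return {"X": X, "y": y, "tree": tree, "leaves": leaves,
            "leaf_label": leaf_label, "flagged": flagged}


def predict_tree(model, x):
    """Plain decision-tree prediction: the leaf's majority label."""
    return model["leaf_label"][leaf_of(model["tree"], x)]


def predict_lrdt(model, x, k=3):
    leaf = leaf_of(model["tree"], x)
    if leaf not in model["flagged"]:
        return model["leaf_label"][leaf]
    # Flagged leaf: majority vote of the k nearest training rows in that leaf.
    X, y = model["X"], model["y"]
    nearest = sorted(model["leaves"][leaf], key=lambda i: math.dist(X[i], x))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


def g_mean(y_true, y_pred, minority=1):
    """Geometric mean of minority recall and majority recall (both classes must occur)."""
    pos = sum(t == minority for t in y_true)
    neg = len(y_true) - pos
    tp = sum(t == p == minority for t, p in zip(y_true, y_pred))
    tn = sum(t != minority and p != minority for t, p in zip(y_true, y_pred))
    return math.sqrt((tp / pos) * (tn / neg))


# Toy data: a small minority cluster ends up inside a majority-labelled leaf.
X = [[v] for v in (0, 1, 2, 3, 4, 5, 10, 10.5, 11, 15, 16, 17, 18)]
y = [0] * 6 + [1] * 3 + [0] * 4
model = fit_lrdt(X, y)
tree_pred = [predict_tree(model, x) for x in X]
lrdt_pred = [predict_lrdt(model, x) for x in X]
print(g_mean(y, tree_pred), g_mean(y, lrdt_pred))  # 0.0 1.0
```

On this toy set the single split at x <= 5 leaves the three minority points in a leaf with four majority points, so the plain tree labels every sample as majority (g-mean 0.0), while the leaf-local k-NN vote in the flagged leaf recovers them (g-mean 1.0). The scores are computed on the training data purely for illustration and say nothing about the thesis's reported results.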
Keywords/Search Tags:imbalanced data, decision tree, leaf rank, K-NN, g-mean