Personal Client Segmentation Research Of Banks Based On An Improved Decision Tree Algorithm

Posted on: 2012-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: G Q Zhao
Full Text: PDF
GTID: 2178330335464025
Subject: Computer software and theory
Abstract/Summary:
Data mining is a discipline that combines theory and techniques from many other fields. Decision-tree classification is one family of data mining algorithms, and it is widely used because it is simple and intuitive. Compared with other classification methods, decision-tree classification has the following advantages: less computation, the ability to expose the attributes that matter most to a decision, high classification accuracy, and rules that are easy to extract and read.

Building on a study of existing data mining techniques, this thesis focuses on the data sampling strategy and on the C4.5 decision tree algorithm. It introduces a new kind of structured sampling strategy, which uses an already generated knowledge structure: a pre-decision tree is built first, and samples are then drawn from it in a more balanced way to form the target data set. Experimental results show that the new sampling method is more accurate than random sampling.

The thesis also improves the way C4.5 discretizes continuous-valued attributes. It changes the threshold selection method by introducing two new variables and omitting the sequential search used in the original C4.5. Experimental results show that the modified C4.5 algorithm builds the decision tree more efficiently and uses less space, and that these changes do not alter the resulting decision tree.

Finally, individual credit data of a commercial bank, obtained from the internet, is used together with the new sampling strategy and the modified C4.5 algorithm to build a decision tree model for individual credit evaluation. The classification and forecasting results of the model lead to the conclusion that a decision-tree-based model offers high accuracy, better efficiency, and lower space usage.
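
For readers unfamiliar with how a decision tree handles a continuous-valued attribute, the following is a minimal sketch of the usual baseline: sort the values, try a binary split between each pair of adjacent distinct values, and keep the split with the highest information gain. It is not the modified procedure of the thesis, whose two new variables are not described in this abstract. The midpoint candidates are a common textbook simplification, and the function names and the sample data (ages, "good"/"bad" credit labels) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information_gain) for the best binary split.

    Baseline sketch only: candidates are midpoints between adjacent
    distinct sorted values. Returns (None, 0.0) if no split helps.
    """
    pairs = sorted(zip(values, labels))
    total = len(pairs)
    base = entropy([label for _, label in pairs])
    best = (None, 0.0)
    for i in range(1, total):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        remainder = (len(left) / total) * entropy(left) \
            + (len(right) / total) * entropy(right)
        gain = base - remainder
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best


# Hypothetical data: applicant ages and credit labels.
ages = [25, 32, 41, 58]
credit = ["bad", "bad", "good", "good"]
threshold, gain = best_threshold(ages, credit)
print(threshold, gain)
```

On this toy data the split falls between 32 and 41, which separates the two classes completely. Each candidate recomputes the class counts on both sides, so the cost of this step grows with the number of distinct values; that cost is what motivates work on cheaper threshold selection.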
Keywords/Search Tags: data mining, sampling strategy, C4.5 algorithm, credit evaluation