The Optimization Algorithm Of Decision Trees Based On Genetic Algorithm

Posted on: 2015-04-17  Degree: Master  Type: Thesis
Country: China  Candidate: D D Zhang  Full Text: PDF
GTID: 2298330434460742  Subject: Computational Mathematics
Abstract/Summary:
With the rapid development of network technology and database management systems, earlier data analysis tools and techniques can hardly meet the demand of processing the massive amounts of data accumulated across the different areas of an enterprise, which leads to a huge waste of data resources. How to find the data that is useful to the enterprise within this huge pool of information and knowledge has therefore attracted extensive attention. Data mining is a technique that extracts information from a data set and transforms it into an understandable structure for further use. Classification and prediction are among its important tasks.

At present, the decision tree is the most commonly used method in data mining classification because of its high classification accuracy, fast processing speed and comprehensible classification rules. The performance of a decision tree mainly depends on the accuracy and complexity of the classification and prediction model. C4.5, the classic decision tree classification algorithm, has a good accuracy rate. However, because tree construction adopts a greedy algorithm, the resulting decision tree often has defects such as overfitting and excessive size. Genetic algorithms, categorized as global search heuristics, offer parallelism and scalability and are easy to combine with other algorithms. Thus, this thesis applies the genetic algorithm to the decision tree classification algorithm C4.5 and optimizes the decision tree through two different approaches:

(1) The thesis analyzes in depth the basic principle of the decision tree algorithm C4.5 and uses practical cases to summarize its shortcomings in balancing classification accuracy against control of tree size. Drawing on the genetic algorithm's ability to search for the global optimum, the fourth part introduces the specific optimization method. First, the C4.5 algorithm is used to generate the initial population of decision trees; its high accuracy rate effectively avoids blind search at the start of the genetic algorithm. Then, because trees are not easy to encode, the initial population of decision trees is transformed and encoded into the corresponding sets of rules. Finally, an appropriate fitness function and genetic operations are set to optimize the decision tree.

(2) The attribute set needs to be reasonably reduced before the data set is classified, since the data set may contain ineffective, irrelevant and redundant features. The fifth part first reduces the attributes of the data set by using the global optimization ability of the genetic algorithm. With the help of rough set theory, the fitness function is constructed so as to obtain the reduced combination of classification attributes. Finally, the classical C4.5 classification algorithm is used to construct the corresponding reduced decision tree.

In the end, the two optimization schemes above are each tested on the weather data set and the classical UCI data set, and the resulting decision trees are compared with those produced by the C4.5 algorithm in terms of accuracy rate, number of rules (the number of leaf nodes), categorical attributes, etc. The experimental results show that, under certain conditions, the genetically optimized decision tree algorithm can effectively reduce the tree size and increase the readability of the classification rules without reducing the accuracy rate.
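
The second scheme (genetic attribute reduction guided by a rough-set fitness function) can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual fitness function or genetic operators, so everything here is an assumption made for illustration: the function names, the weighting parameter alpha, the one-point crossover with elitist selection, and the six toy weather-style rows. The fitness combines the standard rough-set dependency degree (the fraction of rows whose equivalence class under the selected attributes carries a single decision label) with a small reward for dropping attributes.

```python
import random
from collections import defaultdict

def dependency(rows, labels, mask):
    """Rough-set dependency degree of the decision on the attributes kept by mask."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        key = tuple(v for v, keep in zip(row, mask) if keep)
        groups[key].append(label)
    # Rows in equivalence classes with exactly one label form the positive region.
    consistent = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return consistent / len(rows)

def fitness(rows, labels, mask, alpha=0.9):
    """Assumed fitness: mostly dependency, plus a small bonus per removed attribute."""
    n = len(mask)
    return alpha * dependency(rows, labels, mask) + (1 - alpha) * (n - sum(mask)) / n

def ga_reduct(rows, labels, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search for a small attribute subset with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    n = len(rows[0])

    def score(mask):
        return fitness(rows, labels, mask)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [1] * n  # always include the full attribute set as a candidate
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=score)

# Toy rows (outlook, temperature, humidity, wind); illustrative only, not thesis data.
rows = [
    ("sunny", "hot", "high", "weak"),
    ("sunny", "mild", "normal", "weak"),
    ("rainy", "hot", "normal", "strong"),
    ("rainy", "mild", "high", "strong"),
    ("overcast", "hot", "high", "weak"),
    ("overcast", "mild", "normal", "strong"),
]
labels = ["no", "yes", "yes", "no", "no", "yes"]

best = ga_reduct(rows, labels)
print("selected attribute mask:", best)
print("dependency of selection:", dependency(rows, labels, best))
```

In the scheme described above, the selected attributes would then be passed to a C4.5 implementation to build the reduced decision tree; that step is omitted here.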
Keywords/Search Tags: Data Mining, Decision Tree, Genetic Algorithm, C4.5 Algorithm