The Improvement Of Complete Decision Tree Based On The Information Gain Theory

Posted on: 2012-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: Q Liu
GTID: 2218330362456802
Subject: Spatial Information Science and Technology
Abstract/Summary:
The Decision Tree is an important classification method in Data Mining: it is simple to apply and classifies efficiently. A Decision Tree builds a model from training sample data and then uses that model to classify new data. Decision Trees have been studied for more than 40 years, and many algorithms have been proposed in that time. Some classic algorithms, such as ID3, C4.5 and C5.0, are based on Information Gain theory. These methods are clear, simple and fast. However, when Information Gain is used to select the attribute on which to split, it tends to favor attributes that have more values. This thesis therefore takes as its main research target an improved decision tree algorithm that keeps the advantages of Information Gain theory.

A new type of decision tree node is introduced, so that a couple of attributes are chosen at a node instead of a single attribute. We call a decision tree built with this type of node the Complete Decision Tree (CDT). CDT based on Information Gain retains the gain calculation as its selection criterion. At the same time, CDT improves the robustness and accuracy of the algorithm and exploits more of the potential of decision trees based on Information Gain.

The Car Evaluation Data Set from UCI is used to test CDT based on Information Gain, and the result is compared with ID3 and C4.5. Depending on the Range parameter, CDT achieved better accuracy than ID3 and C4.5, while its running time remained reasonable. The results show an improvement in classification accuracy at a small cost in time complexity.
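The tendency of Information Gain to favor attributes with more values can be illustrated with a minimal Python sketch. It is not the CDT algorithm of this thesis. The six-row data set, the attribute names `id` and `safety`, and the function names are assumptions made for illustration only. The gain ratio is included as the normalization commonly associated with C4.5.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def gain_ratio(rows, attr, target):
    """Information gain divided by the entropy of the attribute's own values."""
    split_info = entropy([row[attr] for row in rows])
    gain = information_gain(rows, attr, target)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: "id" is unique per row (many values, no real
# predictive meaning), "safety" has only two values.
samples = [
    ("high", "acc"), ("high", "acc"), ("high", "unacc"),
    ("low", "unacc"), ("low", "unacc"), ("low", "acc"),
]
rows = [{"id": i, "safety": s, "label": c} for i, (s, c) in enumerate(samples)]

gain_id = information_gain(rows, "id", "label")
gain_safety = information_gain(rows, "safety", "label")
ratio_id = gain_ratio(rows, "id", "label")

print(f"gain(id)     = {gain_id:.3f}")
print(f"gain(safety) = {gain_safety:.3f}")
print(f"ratio(id)    = {ratio_id:.3f}")
```

In this toy data the unique `id` attribute splits the samples into pure single-row subsets, so it obtains the maximum possible gain even though it cannot generalize to unseen data. Dividing by the split information lowers its score but, in this small example, does not by itself reverse the ranking.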
Keywords/Search Tags: Information Gain, Complete Decision Tree (CDT), Attributes division, Range parameter