Research And Improvement Of Decision Tree Algorithm Based On Rough Set

Posted on: 2009-12-15
Degree: Master
Type: Thesis
Country: China
Candidate: H Wu
Full Text: PDF
GTID: 2178360272979668
Subject: Computer software and theory
Abstract/Summary:
Data mining is a method for finding patterns, knowledge, and relationships in data. Classification is the most active and most mature research direction in data mining, and classification algorithms are one of its key technologies. Among classification algorithms, the decision tree method has several advantages: its results are easy for humans to understand, it suits large training sets, and it needs no information beyond what the training data already contains. For these reasons it has been widely studied and used.
Traditional decision trees based on information theory also have disadvantages: they tend to prefer attributes with more values, they depend strongly on the quality of the training set, and they can test only a single attribute at each node. This thesis therefore introduces rough set techniques. Rough set theory is a tool for studying imprecise and uncertain knowledge, and it has a strong capacity for knowledge acquisition.
The study finds that the rough set based attribute selection criterion DiscernValue performs much better than the traditional criterion and reduces the size of the decision tree. However, it has to compare all objects, so its time complexity is higher. The thesis therefore uses the H-important and L-important concepts to reduce the time complexity. To reduce the size of the decision tree further and to overcome a weakness of single-variable decision trees, namely that they ignore relations among attributes, a multi-variable algorithm is combined with the H-important and L-important concepts to rebuild the decision tree. A subset of the attributes in the H-important set is chosen as the initial test attributes, which improves the new algorithm further.
Finally, experiments compare and analyze the Improved DiscernValue Based Decision Tree algorithm and the DiscernValue Based Multi-Variable Decision Tree algorithm. The results show that the first algorithm has lower time complexity while keeping the same classification accuracy, and that the second algorithm produces a smaller and therefore simpler decision tree with almost the same classification accuracy.
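The abstract does not define DiscernValue, so the following is only an illustrative sketch of a discernibility-style attribute score, not the thesis's criterion. It assumes a score equal to the number of object pairs that belong to different decision classes and that the attribute tells apart, which is the standard rough set notion of discernibility. The function names, the toy table, and the counting shortcut are all hypothetical. In particular, the shortcut is not the H-important/L-important technique; it only illustrates why comparing all object pairs is costly and how a score of this kind can be obtained from value counts instead.

```python
from collections import Counter
from itertools import combinations


def discern_score_pairwise(rows, attr, decision):
    """Assumed score: number of object pairs with different decisions
    that `attr` distinguishes. Compares all pairs, so it is O(n^2)."""
    return sum(
        1
        for x, y in combinations(rows, 2)
        if x[decision] != y[decision] and x[attr] != y[attr]
    )


def pairs_with_different_decision(decision_counts):
    """Number of unordered pairs whose decision values differ."""
    n = sum(decision_counts.values())
    same = sum(c * (c - 1) // 2 for c in decision_counts.values())
    return n * (n - 1) // 2 - same


def discern_score_counting(rows, attr, decision):
    """Same score from counts: all pairs with different decisions,
    minus those that share a value of `attr` (which it cannot discern)."""
    overall = Counter(r[decision] for r in rows)
    by_value = {}
    for r in rows:
        by_value.setdefault(r[attr], Counter())[r[decision]] += 1
    undiscerned = sum(pairs_with_different_decision(c) for c in by_value.values())
    return pairs_with_different_decision(overall) - undiscerned


# Hypothetical toy decision table; "play" is the decision attribute.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]

scores = {a: discern_score_counting(rows, a, "play") for a in ("outlook", "windy")}
best = max(scores, key=scores.get)  # attribute with the highest score
print(scores, best)
```

On this toy table, "outlook" distinguishes 5 of the 6 pairs with different decisions and "windy" distinguishes 4, so a selection rule that maximizes this score would test "outlook" first.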
Keywords/Search Tags:decision tree, rough set, Discern_Value, H-important, L-important