
The Approach Of Constructing And Pruning Decision Tree Based On Rough Set Theory

Posted on: 2006-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Wang
GTID: 2168360152486271
Subject: Circuits and Systems
Abstract/Summary:
Data mining (i.e., knowledge discovery from databases) is the process of mining usable, credible, valid, and comprehensible patterns from large-scale data in an intelligent and automatic way. Classification is one of the most important directions in data mining and has been widely used in many fields, such as medical diagnosis, climate prediction, credit validation, customer identification, and fraud detection.

The decision tree is a commonly used classification model. Compared with other classification models, it is concise and easy to transform into rules that are simple for people to understand, and its classification accuracy is better than, or at least not worse than, that of other models. Owing to these advantages, decision trees have been widely applied.

Rough set theory, proposed by Professor Z. Pawlak, is an effective tool for dealing with vague, imprecise, incomplete, and uncertain data, and it has made fruitful progress over 20 years of development.

This dissertation focuses on approaches to constructing and pruning decision trees based on rough set theory:

1) Research on the induction of decision trees. Pawlak rough set theory cannot deal well with noisy data because it requires strict precision, and neither can the decision tree induction approach based on it. A tree constructed using Pawlak rough set theory tends to overfit the training data and cannot guide prediction effectively. This dissertation improves the Pawlak rough set based induction approach by using variable precision rough set theory. Variable precision rough set theory permits equivalence classes to be assigned to the approximation space with some degree of misclassification, which makes the resulting tree more robust to noisy data than one constructed with Pawlak rough set theory.

2) Research on the pruning of decision trees. Pruning is important for improving a decision tree's generalization ability. According to Vapnik's structural risk minimization principle, a model with good performance should strike a balance between its complexity and its error. Guided by this principle, this dissertation proposes a new decision tree pruning approach based on rough set theory. The new approach considers both the tree's complexity and its classification accuracy during pruning and tries to balance the two. Two concepts, the depth-fitting ratio and the error ratio, are proposed as the pruning criteria.
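
The difference between the Pawlak and variable precision lower approximations described in item 1) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the toy data set, the attribute names, the function names, and the threshold `beta = 0.7` are all assumptions made for illustration, and the sketch uses the common formulation in which an equivalence class joins the lower approximation when at least a fraction `beta` (with 0.5 < `beta` <= 1) of its members belong to the target class.

```python
from collections import defaultdict


def equivalence_classes(rows, attrs):
    """Group row indices by their values on the given condition attributes."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def lower_approximation(rows, attrs, target, beta=1.0):
    """Union of equivalence classes whose share of members in `target` is >= beta.

    beta = 1.0 reproduces the Pawlak lower approximation (strict inclusion);
    0.5 < beta < 1.0 gives a variable precision lower approximation that
    tolerates a limited share of misclassified objects.
    """
    result = set()
    for eq in equivalence_classes(rows, attrs):
        if len(eq & target) / len(eq) >= beta:
            result |= eq
    return result


# Hypothetical toy data: row 3 is a noisy record that contradicts rows 0-2.
rows = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": True, "play": "yes"},
    {"outlook": "rainy", "windy": True, "play": "yes"},
]
conditions = ["outlook", "windy"]
target_no = {i for i, r in enumerate(rows) if r["play"] == "no"}

# Strict inclusion: the single noisy row keeps the whole "sunny" class out.
pawlak = lower_approximation(rows, conditions, target_no, beta=1.0)

# Tolerant inclusion: 3 of 4 "sunny" rows are "no", and 0.75 >= 0.7.
vprs = lower_approximation(rows, conditions, target_no, beta=0.7)

print("Pawlak lower approximation:", pawlak)
print("Variable precision lower approximation:", vprs)
```

In this toy case one noisy record empties the Pawlak lower approximation of the "no" class, while the tolerant version still treats the "sunny, not windy" objects as belonging to it, which is the kind of robustness to noise that the abstract attributes to variable precision rough set theory.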
Keywords/Search Tags:Data Mining, Decision Tree, Rough Set Theory