
Decision Tree Construction System Based On Rough Set Theory

Posted on: 2009-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Wang
Full Text: PDF
GTID: 2178360245954495
Subject: Circuits and Systems
Abstract/Summary:
Data mining (i.e., knowledge discovery in databases) is the process of extracting previously unknown, useful, credible, and comprehensible information and knowledge from large-scale, fuzzy, random data in an intelligent and automatic way. It builds relational patterns among the data and uses them to predict unknown cases in support of the decision maker. The decision tree is a common classification model that directly reflects the characteristics of the data, and it is widely used in data mining because of its high classification efficiency and comprehensibility. Rough set theory (RST) is a method for dealing with imprecise, uncertain, and incomplete information. In this thesis, a decision tree construction system based on rough set theory is studied and built.

The system includes seven phases: data preprocessing, discretization of continuous attributes, tree growing, tree pruning, decision forest construction, tree analysis and evaluation, and rule extraction. The data-preprocessing phase mainly involves data cleaning to reduce noise or handle missing values. Discretization of continuous attributes, which maps continuous values to discrete values, is a very important preprocessing step in data mining, and RST-based discretization has been shown to be a good method. In the tree-growing phase, each attribute is evaluated recursively with an attribute selection measure, and the best split attribute and split value are chosen until a fully grown decision tree is obtained; we apply a tree-growing algorithm based on rough set theory. In the tree-pruning phase, the tree must be pruned to prevent over-fitting and improve accuracy. How to describe tree complexity appropriately, however, is a difficult problem. To address it, we define a value associated with the number of leaves, the number of attributes, and the number of classes in the subtree.
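The abstract does not state which rough-set measure drives attribute selection during tree growing. As a minimal sketch, assuming the standard rough-set dependency degree (the fraction of objects in the positive region) is used as the selection measure, a split attribute could be chosen as follows. The function names and the toy table are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def dependency_degree(rows, cond_attrs, decision_attr):
    """Rough-set dependency degree: |POS_B(D)| / |U|.

    rows is a list of dicts (the universe U); cond_attrs is the
    condition attribute subset B; decision_attr is the decision D.
    """
    if not rows:
        return 0.0
    # Group objects into equivalence classes that are indiscernible on cond_attrs.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        classes[key].append(row[decision_attr])
    # An equivalence class belongs to the positive region when all of its
    # objects share a single decision value.
    positive = sum(len(d) for d in classes.values() if len(set(d)) == 1)
    return positive / len(rows)


def best_split_attribute(rows, cond_attrs, decision_attr):
    """Pick the single attribute with the highest dependency degree.

    Assumption for illustration: ties go to the attribute listed first.
    """
    return max(cond_attrs, key=lambda a: dependency_degree(rows, [a], decision_attr))


# Hypothetical toy decision table, not data from the thesis.
TOY_ROWS = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
```

In this toy table, `outlook` fully determines `play`, so it would be selected over `windy`; a tree grower would then recurse on each value of the chosen attribute.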
In the rule-extraction phase, the model can easily be converted into IF-THEN classification rules. When data records must be assigned to more than two categories, traditional decision tree analysis often lacks the efficiency and accuracy needed to obtain precise and manageable solutions. The problem arises because a single decision tree does not provide a robust mechanism for assigning records to multiple classes. The decision forest algorithm provides an efficient technique for classifying data records into multiple classes. A forest construction method that uses different decision attribute sets is proposed. Based on the theory above, the decision tree construction system is implemented in Visual C++. We have applied it to some gene data and obtained satisfactory results. The aim of the study is to build a decision tree construction system that is compact, comprehensible, and scalable and that has a low error rate.
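The rule-extraction phase mentioned above can be illustrated with a short sketch: each root-to-leaf path of a decision tree becomes one IF-THEN rule. The nested-dict tree layout, the function name, and the toy tree are assumptions made for this example; the thesis system itself is written in Visual C++ and its data structures are not described in the abstract.

```python
def extract_rules(node, conditions=()):
    """Return one IF-THEN rule string per root-to-leaf path.

    Assumed layout: an internal node is {"attr": name, "branches": {value: child}};
    a leaf is any non-dict value holding the class label.
    """
    if not isinstance(node, dict):
        # Leaf: join the tests collected along the path into the rule premise.
        premise = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {premise} THEN class = {node}"]
    rules = []
    for value, child in node["branches"].items():
        rules.extend(extract_rules(child, conditions + ((node["attr"], value),)))
    return rules


# Hypothetical toy tree, not a tree produced by the thesis system.
TOY_TREE = {
    "attr": "outlook",
    "branches": {
        "sunny": "yes",
        "rainy": {"attr": "windy", "branches": {"yes": "no", "no": "yes"}},
    },
}
```

Calling `extract_rules(TOY_TREE)` yields three rules, one per leaf, in the order the branches are stored.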
Keywords/Search Tags: Classification, Decision tree, Rough set, Discretization, Pruning, Decision forest