Decision Tree Construction System Based On Rough Set Theory

Posted on:2009-11-23

Degree:Master

Type:Thesis

Country:China

Candidate:G Y Wang

GTID:2178360245954495

Subject:Circuits and Systems

Abstract/Summary:

Data mining (i.e., knowledge discovery in databases) is the process of extracting previously unknown, usable, credible, and comprehensible information and knowledge from large-scale, fuzzy, random data in an intelligent and automatic way. It builds patterns that capture relationships among the data and uses them to predict unknown cases in support of the decision maker. A decision tree is a common classification model that works directly from the characteristics of the data, and it is widely used in data mining because of its high classification efficiency and comprehensibility. Rough set theory (RST) is a method for dealing with imprecise, uncertain, and incomplete information. This thesis studies and builds a decision tree construction system based on rough set theory. The system includes seven phases: data preprocessing, discretization of continuous attributes, tree growing, tree pruning, decision forest construction, tree analysis and evaluation, and rule extraction. The data-preprocessing phase mainly involves data cleaning to reduce noise and handle missing values. Discretization of continuous attributes, which maps continuous values to discrete values, is a very important preprocessing step in data mining, and discretization based on RST has proved to be a good method. In the tree-growing phase, each attribute is evaluated recursively with an attribute selection measure, the best splitting attribute and split value are chosen, and a fully grown decision tree is obtained; we apply a tree-growing algorithm based on rough set theory. In the tree-pruning phase, the tree must be pruned to prevent over-fitting and improve accuracy. How to describe the complexity of the tree appropriately, however, is a difficult problem. To address it, we define a value based on the number of leaves, the number of attributes, and the number of classes in the subtree.
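
The abstract does not state which rough-set measure drives attribute selection in the tree-growing phase. As a minimal sketch, assuming the standard rough-set dependency degree (the fraction of records whose equivalence class under the chosen condition attributes is consistent on the decision attribute) serves as the selection measure, the choice of a split attribute could look like the following. The function names, the dictionary-based record format, and the toy data are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def dependency_degree(rows, cond_attrs, decision_attr):
    """Rough-set dependency degree: |POS_C(D)| / |U|.

    A record is in the positive region when every record that shares its
    values on cond_attrs also shares its decision value.
    """
    # Group decision values by equivalence class of the condition attributes.
    blocks = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        blocks[key].append(row[decision_attr])
    # Count records that fall in consistent (single-decision) classes.
    positive = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return positive / len(rows)


def best_split_attribute(rows, cond_attrs, decision_attr):
    """Pick the single attribute with the highest dependency degree (assumed measure)."""
    return max(cond_attrs, key=lambda a: dependency_degree(rows, [a], decision_attr))


# Hypothetical toy decision table, for illustration only.
TOY_ROWS = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]

if __name__ == "__main__":
    print(best_split_attribute(TOY_ROWS, ["outlook", "windy"], "play"))
```

In this toy table, "outlook" alone determines the decision for four of the six records, while "windy" alone determines none, so "outlook" would be chosen as the first split under this assumed measure.
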
In the rule-extraction phase, the model can easily be converted into IF-THEN classification rules. When data records must be assigned to more than two categories, traditional decision tree analysis often lacks the efficiency and accuracy needed to obtain precise and manageable solutions. The problem arises because a single decision tree does not provide a robust mechanism for assigning records to multiple classes. The decision forest algorithm provides an efficient technique for categorizing data records into multiple classes. A forest construction method that uses different decision attribute sets is proposed. The decision tree construction system is implemented in Visual C++ on the basis of the theory above. We have applied it to some gene data and obtained satisfactory results. The aim of the study is to build a compact, low-error-rate, comprehensible, and scalable decision tree construction system.
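
The conversion of a tree into IF-THEN rules mentioned in the abstract can be illustrated by walking every root-to-leaf path and joining the tests along it. The sketch below is an assumption about how such a step might look, not the thesis's implementation: the nested-dictionary tree layout, the function name `extract_rules`, and the toy tree are all hypothetical.

```python
def extract_rules(node, conditions=()):
    """Return one IF-THEN rule per root-to-leaf path of a decision tree.

    Assumed layout: an internal node is a dict with keys "attribute" and
    "branches" (value -> child); a leaf is a plain class label.
    """
    if not isinstance(node, dict):
        # Leaf reached: join the tests collected along the path.
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {antecedent} THEN class = {node}"]
    rules = []
    for value, child in node["branches"].items():
        test = (node["attribute"], value)
        rules.extend(extract_rules(child, conditions + (test,)))
    return rules


# Hypothetical toy tree, for illustration only.
TOY_TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": "no",
        "overcast": "yes",
        "rainy": {"attribute": "windy", "branches": {"no": "yes", "yes": "no"}},
    },
}

if __name__ == "__main__":
    for rule in extract_rules(TOY_TREE):
        print(rule)
```

Each leaf yields exactly one rule, so the number of rules equals the number of leaves, which is one reason pruning the tree also keeps the extracted rule set small.
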

Keywords/Search Tags:

Classification, Decision tree, Rough set, Discretization, Pruning, Decision forest

Related items

1	Research On Decision Tree Algorithm Based On Rough Sets And Ensemble Learning
2	Research On Decision Tree Based On Rough Set Theory In Classification
3	Research Of Decision Tree Algorithms Based On Rough Set Theory
4	The Data Mining Algorithm Based On Rough Sets
5	An Algorithm Of Discretization Based On Entropy In Application Of Decision Tree
6	Attribute Reduction Based On Rough Set Theory And Research On Classification Algorithm Of Decision Tree
7	The Research Of Optimizing Algorithms Decision Tree Based On Rough Set Theory
8	Study On Coronary Heart Disease Classification By Rough Set And Decision Tree Algorithm
9	Research On Model Decision Tree Method
10	Research On Decision Rules Extraction Based On Rough Set Theory And Decision Tree