
The Research On Attribute Selection Of Node And Pruning Of Decision-tree

Posted on: 2007-12-09    Degree: Master    Type: Thesis
Country: China    Candidate: J F Qu    Full Text: PDF
GTID: 2178360212455950    Subject: Computer application technology
Abstract/Summary:
The research scope of artificial intelligence (AI) is extensive. In recent years the most active direction within AI has been machine learning (ML): how to give computers the ability to learn as humans do. The first part of this paper discusses the origin, development, and research scope of ML, and introduces its main research object here, the decision tree, together with the current state of research on it. The basic structure of an ML system and the function of each of its components are then described at a high level. Many mature ML algorithms exist, built on different principles, such as rote learning, analytical methods, explanation-based methods, and inductive methods; these are listed and illustrated. The most mature of these are the inductive algorithms, of which the decision-tree algorithm is one. Using a designed classification problem ("comfortable weather"), the paper illustrates how the inductive algorithms Find-S, List-Then-Eliminate, Candidate-Elimination, and AQ classify new instances of unknown category by learning from a training sample set and inducing rules. The decision-tree algorithm is also inductive, but unlike other inductive algorithms, which output a rule or a set of rules, its result is a tree: the decision tree. Non-leaf nodes of the decision tree represent attributes, their branches represent the values of those attributes, and leaf nodes represent categories. To identify the category of a new instance of unknown category, one examines the attribute at a node and moves down the branch matching the instance's value for that attribute; this process starts at the root of the decision tree and ends at a leaf node, which gives the category. The classical algorithm for constructing a decision tree is ID3, a top-down method starting from the root node.
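The classification procedure described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the thesis: the tree representation (nested dictionaries), the attribute names, and the toy "comfortable weather" data are all assumptions made for the example.

```python
# Minimal sketch of decision-tree classification. A non-leaf node holds an
# attribute name, its branches are the attribute's values, and a leaf holds
# a category label. All names and data here are illustrative.

def classify(tree, instance):
    """Walk from the root to a leaf, at each node following the branch
    that matches the instance's value for that node's attribute."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[instance[attribute]]
    return tree  # a leaf: the predicted category

# A toy "comfortable weather" tree: test Outlook first, then Humidity or Wind.
weather_tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

print(classify(weather_tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes
```

Classification cost is one attribute test per level, so it is bounded by the depth of the tree, which is one reason simpler (pruned) trees are preferable.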
ID3 selects an attribute (according to some criterion) on the basis of the training sample set and assigns it to a node; the node then branches according to the number of values of that attribute, and the training sample set is simultaneously partitioned into corresponding subsets, with the process continuing at each new node. Clearly the attribute selected at a node is important and determines the quality of the decision tree. ID3 uses category information gain (CIG) as its selection criterion; by reversing this idea, the author proposes attribute information gain (AIG) as an alternative criterion. Analysis of the time complexity of CIG and AIG shows that most of the time is spent scanning the training sample set: CIG scans it (2n+1) times, where n is the number of attributes in the training sample set, while AIG scans it only a constant 3 times. Experiments on UCI data indicate that AIG and CIG yield almost the same results. Another problem is how to simplify a complex decision tree. After surveying existing methods, this paper presents two pre-pruning methods designed by the author, PDTBS and PDTBP; further experiments on UCI data show that both algorithms can prune the decision tree to a large extent while accuracy hardly diminishes.
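For concreteness, the standard information-gain criterion that ID3 uses (called CIG above) can be sketched as follows. This is a generic sketch of ID3's gain computation, not the author's AIG variant; the sample representation and the toy data are assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of category labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(samples, attribute):
    """ID3-style gain: entropy of the whole sample set minus the weighted
    entropy of the subsets produced by splitting on `attribute`.
    Each sample is a (features_dict, label) pair."""
    labels = [label for _, label in samples]
    n = len(samples)
    subsets = {}
    for features, label in samples:
        subsets.setdefault(features[attribute], []).append(label)
    remainder = sum(len(sub) / n * entropy(sub) for sub in subsets.values())
    return entropy(labels) - remainder

# Toy data: Wind perfectly separates the labels, so the gain equals the
# full entropy of the label set (1.0 bit here).
samples = [
    ({"Wind": "Weak"}, "Yes"),
    ({"Wind": "Weak"}, "Yes"),
    ({"Wind": "Strong"}, "No"),
    ({"Wind": "Strong"}, "No"),
]
print(information_gain(samples, "Wind"))
```

At each node, ID3 evaluates this gain for every remaining attribute and selects the one with the highest value, which is why each selection step requires repeated scans of the training sample set.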
Keywords/Search Tags:machine learning, decision-tree, attribute selection, pre-pruning