
The Research On Attribute Selection Of Node And Pruning Of Decision-tree

Posted on: 2007-12-09    Degree: Master    Type: Thesis
Country: China    Candidate: J F Qu    Full Text: PDF
GTID: 2178360212455950    Subject: Computer application technology
Abstract/Summary:
The research scope of artificial intelligence (AI) is extensive. In recent years the most active direction within AI has been machine learning (ML): how to give computers the ability to learn as humans do. The first part of this paper discusses the origin, development, and research scope of ML, and introduces its main research object here, the decision tree, together with the current state of research on it. The basic structure of an ML system and the function of each of its components are then described at a high level. Many mature ML algorithms exist, built on different principles, such as rote learning, analytical methods, explanation-based methods, and inductive methods; these are listed and illustrated. The most mature of these are the inductive algorithms, of which the decision-tree algorithm is one. Using a designed classification problem ("comfortable weather"), the paper illustrates how the inductive algorithms Find-S, List-Then-Eliminate, Candidate-Elimination, and AQ classify new instances of unknown category by learning from a training sample set and inducing rules. The decision-tree algorithm is also inductive, but unlike other inductive algorithms, which output a rule or a set of rules, its result is a tree: the decision tree. Non-leaf nodes of the decision tree represent attributes, their branches represent the values of those attributes, and leaf nodes represent categories. To identify the category of a new instance of unknown category, one examines the attribute at a node and moves down the branch matching the instance's value for that attribute; this process starts at the root of the decision tree and ends at a leaf node, which gives the category. The classical algorithm for constructing a decision tree is ID3, a top-down method starting from the root node.
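The classification procedure described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the thesis: the tree representation (nested dictionaries), the attribute names, and the toy "comfortable weather" data are all assumptions made for the example.

```python
# Minimal sketch of decision-tree classification. A non-leaf node holds an
# attribute name, its branches are the attribute's values, and a leaf holds
# a category label. All names and data here are illustrative.

def classify(tree, instance):
    """Walk from the root to a leaf, at each node following the branch
    that matches the instance's value for that node's attribute."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[instance[attribute]]
    return tree  # a leaf: the predicted category

# A toy "comfortable weather" tree: test Outlook first, then Humidity or Wind.
weather_tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

print(classify(weather_tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes
```

Classification cost is one attribute test per level, so it is bounded by the depth of the tree, which is one reason simpler (pruned) trees are preferable.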
ID3 selects an attribute (according to some criterion) on the basis of the training sample set and assigns it to a node; the node then branches according to the number of values of that attribute, and the training sample set is simultaneously partitioned into corresponding subsets, with the process continuing at each new node. Clearly the attribute selected at a node is important and determines the quality of the decision tree. ID3 uses category information gain (CIG) as its selection criterion; by reversing this idea, the author proposes attribute information gain (AIG) as an alternative criterion. Analysis of the time complexity of CIG and AIG shows that most of the time is spent scanning the training sample set: CIG scans it (2n+1) times, where n is the number of attributes in the training sample set, while AIG scans it only a constant 3 times. Experiments on UCI data indicate that AIG and CIG yield almost the same results. Another problem is how to simplify a complex decision tree. After surveying existing methods, this paper presents two pre-pruning methods designed by the author, PDTBS and PDTBP; further experiments on UCI data show that both algorithms can prune the decision tree to a large extent while accuracy hardly diminishes.
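For concreteness, the standard information-gain criterion that ID3 uses (called CIG above) can be sketched as follows. This is a generic sketch of ID3's gain computation, not the author's AIG variant; the sample representation and the toy data are assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of category labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(samples, attribute):
    """ID3-style gain: entropy of the whole sample set minus the weighted
    entropy of the subsets produced by splitting on `attribute`.
    Each sample is a (features_dict, label) pair."""
    labels = [label for _, label in samples]
    n = len(samples)
    subsets = {}
    for features, label in samples:
        subsets.setdefault(features[attribute], []).append(label)
    remainder = sum(len(sub) / n * entropy(sub) for sub in subsets.values())
    return entropy(labels) - remainder

# Toy data: Wind perfectly separates the labels, so the gain equals the
# full entropy of the label set (1.0 bit here).
samples = [
    ({"Wind": "Weak"}, "Yes"),
    ({"Wind": "Weak"}, "Yes"),
    ({"Wind": "Strong"}, "No"),
    ({"Wind": "Strong"}, "No"),
]
print(information_gain(samples, "Wind"))
```

At each node, ID3 evaluates this gain for every remaining attribute and selects the one with the highest value, which is why each selection step requires repeated scans of the training sample set.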
Keywords/Search Tags:machine learning, decision-tree, attribute selection, pre-pruning