Study On Decision Tree Classification And Pruning

Posted on: 2010-06-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
GTID: 2178360278466855
Subject: Computer application technology
Abstract/Summary:
Data mining is the process of extracting valuable patterns or rules from large datasets, and it is widely used in finance, insurance, transportation, and national defense. Decision tree algorithms are among the most widely studied and applied data mining methods, so research on them has both theoretical significance and practical value.

The thesis introduces the basic concepts of decision trees, the main research topics, and several classic decision tree algorithms in detail. Because a tree can overfit the training data and is affected by noisy data during construction, pruning is one of the most important steps in building a decision tree. The thesis therefore studies and compares four main pruning algorithms.

The focus of the thesis is the multi-relational decision tree learning (MRDTL) algorithm and its improvement. MRDTL mines classification patterns directly from several tables, rather than first joining them into a single table and then mining patterns, and the nodes of the decision tree it builds are expressed as selection graphs. An important step in MRDTL is choosing the best refinement: the information gain of each candidate refinement is computed, and the one with the largest information gain is added to the decision tree. However, the method MRDTL uses to compute information gain can miss records, which leads to incorrect results.

To address this problem, the thesis improves the method of computing information gain in MRDTL and supports the improvement with theoretical analysis and experimental results. The algorithm for building the sufficient table is also modified in accordance with the improved computation.
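
The refinement-selection step described in the abstract can be illustrated with a minimal sketch of the standard entropy-based information gain. This is not the thesis's code and does not reproduce its corrected computation, which the abstract does not spell out. The function names (`entropy`, `information_gain`, `best_refinement`) and the representation of a refinement as a list of booleans marking which records it covers are assumptions made for illustration only. The sketch counts both the covered records and their complement, so every record contributes to the gain.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(labels, covered):
    """Gain from splitting the records into those covered by a refinement
    and the complementary records that are not covered.

    `covered` is a hypothetical representation: one boolean per record.
    """
    inside = [y for y, c in zip(labels, covered) if c]
    outside = [y for y, c in zip(labels, covered) if not c]
    total = len(labels)
    remainder = (
        (len(inside) / total) * entropy(inside)
        + (len(outside) / total) * entropy(outside)
    )
    return entropy(labels) - remainder


def best_refinement(labels, refinements):
    """Return the name of the candidate with the largest information gain."""
    return max(
        refinements,
        key=lambda name: information_gain(labels, refinements[name]),
    )


# Illustrative data only: four records and two candidate refinements.
labels = ["yes", "yes", "no", "no"]
refinements = {
    "r1": [True, True, False, False],   # separates the classes perfectly
    "r2": [True, False, True, False],   # leaves both branches mixed
}
chosen = best_refinement(labels, refinements)
```

In this toy data, `r1` splits the records into two pure groups while `r2` leaves both groups evenly mixed, so `r1` is selected. In a multi-relational setting the coverage of each refinement would be derived from a selection graph over several tables rather than given directly as a boolean list.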
Keywords/Search Tags: classification, decision tree, multi-relational, selection graph