Classification Method Based on Decision Trees

Posted on: 2004-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: N Dai
GTID: 2208360092485391
Subject: Computational Mathematics
Abstract/Summary:
Data mining, also referred to as knowledge discovery in databases, is the extraction of patterns representing valuable knowledge implicitly stored in large databases or data warehouses. Classification is a form of data analysis that can be used to extract models describing important data classes. There are many techniques for data classification, such as decision tree induction, Bayesian classification and Bayesian belief networks, association-based classification, genetic algorithms, rough sets, and k-nearest neighbor classifiers.

This thesis introduces the decision tree method for classification. First, some basic algorithms for inducing decision trees are discussed: ID3, which uses information gain to select a splitting attribute when partitioning a training set; C4.5, which can deal with numeric attributes; CART, which uses the Gini index for attribute selection and induces a binary tree; PUBLIC, which integrates tree pruning into the tree-building phase; an interactive method, which brings artificial intelligence and human-computer interaction into the procedure of decision tree induction; and SLIQ and SPRINT, which are scalable and can be easily parallelized. The advantages and disadvantages of these algorithms are also presented.

Methods for inducing decision trees in distributed database systems are then described, and a distributed algorithm based on ID3 is proposed. By using a new data structure called the attributes distribution list, this algorithm is scalable and can be parallelized. A decision tree classifier using a scalable ID3 algorithm was developed in Microsoft Visual C++ 6.0. Real training sets were used to test the classifier, and the experiments show that it can successfully build decision trees and has good scalability.
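The two split criteria named above, information gain (ID3) and the Gini index (CART), can both be computed from per-attribute class counts. The sketch below is illustrative only. The abstract does not define the attributes distribution list, so the `count_table` and `merge_tables` helpers are hypothetical stand-ins, assuming that each site of a distributed database can summarize its partition as class counts per attribute value and that these summaries can be added together before a split is scored. All function names here are assumptions, not the thesis's implementation.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a {class: count} mapping."""
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values() if n)


def gini(counts):
    """Gini impurity of a {class: count} mapping."""
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def count_table(rows, attr, label):
    """Class counts per attribute value: {value: Counter({class: n})}.

    Hypothetical stand-in for a per-attribute summary of one partition.
    """
    table = {}
    for row in rows:
        table.setdefault(row[attr], Counter())[row[label]] += 1
    return table


def merge_tables(tables):
    """Add together count tables built independently on separate partitions."""
    merged = {}
    for table in tables:
        for value, counts in table.items():
            merged.setdefault(value, Counter()).update(counts)  # update() adds counts
    return merged


def information_gain(table):
    """Entropy of the whole set minus the weighted entropy after the split."""
    overall = Counter()
    for counts in table.values():
        overall.update(counts)
    total = sum(overall.values())
    remainder = sum(sum(c.values()) / total * entropy(c) for c in table.values())
    return entropy(overall) - remainder


if __name__ == "__main__":
    # Toy data, invented for illustration: two "sites" holding disjoint rows.
    site_a = [{"outlook": "sunny", "play": "no"}, {"outlook": "rain", "play": "yes"}]
    site_b = [{"outlook": "sunny", "play": "no"}, {"outlook": "rain", "play": "yes"}]
    merged = merge_tables(count_table(s, "outlook", "play") for s in (site_a, site_b))
    print(information_gain(merged))
```

Because only counts are exchanged, the merged table is identical to the one built from the pooled rows, which is one plausible reason a count-based structure would make ID3 easier to scale and parallelize.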
Keywords: data mining, classification rules, decision tree, distributed decision tree