
A Comparative Study On Five Decision Tree Algorithms

Posted on: 2012-12-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
Full Text: PDF
GTID: 2218330368987827
Subject: Control theory and control engineering
Abstract/Summary:
Decision tree classification is one of the most active research directions in data mining. It offers several advantages, such as low computational cost, high speed, high accuracy, and comprehensible rule sets. There are many classical decision tree algorithms, such as ID3, CART, C4.5, and SLIQ. Fuzzy decision trees were proposed to handle continuous attributes and uncertainty in datasets.

Different decision trees can be evaluated against several criteria, such as classification accuracy, tree complexity, and the choice of splitting criterion. This thesis analyzes five decision tree algorithms on seventeen UCI datasets from five aspects: classification accuracy, tree complexity, the fuzzification method for continuous attributes in fuzzy decision trees, the splitting criterion, and the consistency of the partition of the sample space. The Friedman statistical test is used to compare classification accuracy.

This thesis also proposes a new evaluation criterion, consistency: if a classifier is applied several times to the same dataset, will it produce similar rule sets over the different runs? This measure can be used to choose the best classifier among several algorithms, or to judge the stability of a single classifier applied to the same dataset over different runs.

We apply C4.5, CART, Fuzzy ID3, FS-DT, and Yuan's FDT to seventeen UCI datasets. The results show that the accuracy of Fuzzy ID3 is statistically higher than that of FS-DT, while CART produces the fewest rules. The experiments show that the proposed consistency criterion can judge classifiers objectively. Moreover, the consistency of a classifier depends not only on the characteristics of the classifier itself, but also on the dataset to which it is applied.
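The Friedman test mentioned above ranks the classifiers separately on each dataset and asks whether those rankings are consistent enough to reject the hypothesis that all classifiers perform equally. A minimal sketch of this procedure, using SciPy and purely illustrative accuracy values (not the thesis's actual results), could look like:

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Hypothetical accuracies: one row per dataset, one column per
# classifier (e.g. C4.5, CART, Fuzzy ID3, FS-DT, Yuan's FDT).
# These numbers are invented for illustration only.
acc = np.array([
    [0.91, 0.89, 0.93, 0.85, 0.90],
    [0.78, 0.80, 0.82, 0.75, 0.79],
    [0.95, 0.94, 0.96, 0.90, 0.93],
    [0.66, 0.68, 0.71, 0.60, 0.65],
    [0.84, 0.83, 0.88, 0.80, 0.82],
])

# Friedman test over the per-dataset rankings of the classifiers:
# a small p-value rejects "all classifiers perform equally".
stat, p = friedmanchisquare(*acc.T)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")
```

In practice the test would be run over all seventeen datasets, typically followed by a post-hoc test to locate which pairs of classifiers differ significantly.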
The results also show that a more dispersed sample distribution in a dataset leads to lower consistency.
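The thesis defines consistency formally; one simple way to operationalize the idea of "similar rule sets over different runs" is to represent each run's rules as a set of strings and compare the sets with Jaccard similarity. The sketch below uses hypothetical rule strings (not output of any of the five algorithms) purely to illustrate the comparison:

```python
def jaccard(a, b):
    """Jaccard similarity between two rule sets: |A∩B| / |A∪B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical rule sets produced by two runs of the same
# classifier on the same dataset (illustrative strings only).
run1 = {"x1<=2.5 -> A", "x1>2.5 & x2<=0.7 -> B", "x1>2.5 & x2>0.7 -> A"}
run2 = {"x1<=2.5 -> A", "x1>2.5 & x2<=0.9 -> B", "x1>2.5 & x2>0.9 -> A"}

# Only one rule is shared out of five distinct rules.
consistency = jaccard(run1, run2)
print(f"rule-set consistency = {consistency:.2f}")  # → 0.20
```

A high score over many repeated runs would indicate a stable classifier on that dataset; averaging the pairwise scores across runs gives a single consistency value per classifier-dataset pair.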
Keywords/Search Tags: Data Mining, Decision Trees, Attribute Selection, Partition of Sample Space