
A Comparative Study On Five Decision Tree Algorithms

Posted on: 2012-12-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
Full Text: PDF
GTID: 2218330368987827
Subject: Control theory and control engineering
Abstract/Summary:
Decision tree classification is one of the most active research directions in data mining. It offers several advantages, such as low computational cost, high speed, high accuracy, and comprehensible rule sets. There are many classical decision tree algorithms, such as ID3, CART, C4.5, and SLIQ. Fuzzy decision trees were proposed to handle continuous attributes and uncertainty in datasets.

Different decision trees can be evaluated against several criteria, such as classification accuracy, tree complexity, and the choice of splitting criterion. This thesis analyzes five decision tree algorithms on seventeen UCI datasets from five aspects: classification accuracy, tree complexity, the fuzzification method for continuous attributes in fuzzy decision trees, the splitting criterion, and the consistency of the partition of the sample space. The Friedman statistical test is used to compare classification accuracy.

This thesis also proposes a new evaluation criterion, consistency: if a classifier is applied several times to the same dataset, will it produce similar rule sets over the different runs? This measure can be used to choose the best classifier among several algorithms, or to judge the stability of a single classifier applied to the same dataset over different runs.

We apply C4.5, CART, Fuzzy ID3, FS-DT, and Yuan's FDT to seventeen UCI datasets. The results show that the accuracy of Fuzzy ID3 is statistically higher than that of FS-DT, while CART produces the fewest rules. The experiments show that the proposed consistency criterion can judge classifiers objectively. Moreover, the consistency of a classifier depends not only on the characteristics of the classifier itself, but also on the dataset to which it is applied.
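The Friedman test mentioned above ranks the classifiers separately on each dataset and asks whether those rankings are consistent enough to reject the hypothesis that all classifiers perform equally. A minimal sketch of this procedure, using SciPy and purely illustrative accuracy values (not the thesis's actual results), could look like:

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Hypothetical accuracies: one row per dataset, one column per
# classifier (e.g. C4.5, CART, Fuzzy ID3, FS-DT, Yuan's FDT).
# These numbers are invented for illustration only.
acc = np.array([
    [0.91, 0.89, 0.93, 0.85, 0.90],
    [0.78, 0.80, 0.82, 0.75, 0.79],
    [0.95, 0.94, 0.96, 0.90, 0.93],
    [0.66, 0.68, 0.71, 0.60, 0.65],
    [0.84, 0.83, 0.88, 0.80, 0.82],
])

# Friedman test over the per-dataset rankings of the classifiers:
# a small p-value rejects "all classifiers perform equally".
stat, p = friedmanchisquare(*acc.T)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")
```

In practice the test would be run over all seventeen datasets, typically followed by a post-hoc test to locate which pairs of classifiers differ significantly.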
The results also show that a more dispersed sample distribution in a dataset leads to lower consistency.
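The thesis defines consistency formally; one simple way to operationalize the idea of "similar rule sets over different runs" is to represent each run's rules as a set of strings and compare the sets with Jaccard similarity. The sketch below uses hypothetical rule strings (not output of any of the five algorithms) purely to illustrate the comparison:

```python
def jaccard(a, b):
    """Jaccard similarity between two rule sets: |A∩B| / |A∪B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical rule sets produced by two runs of the same
# classifier on the same dataset (illustrative strings only).
run1 = {"x1<=2.5 -> A", "x1>2.5 & x2<=0.7 -> B", "x1>2.5 & x2>0.7 -> A"}
run2 = {"x1<=2.5 -> A", "x1>2.5 & x2<=0.9 -> B", "x1>2.5 & x2>0.9 -> A"}

# Only one rule is shared out of five distinct rules.
consistency = jaccard(run1, run2)
print(f"rule-set consistency = {consistency:.2f}")  # → 0.20
```

A high score over many repeated runs would indicate a stable classifier on that dataset; averaging the pairwise scores across runs gives a single consistency value per classifier-dataset pair.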
Keywords/Search Tags: Data Mining, Decision Trees, Attribute Selection, Partition of Sample Space