An Improved Decision Tree Algorithm Based On Density

Posted on:2017-07-13

Degree:Master

Type:Thesis

Country:China

Candidate:B Y Cao

Full Text:PDF

GTID:2348330488458202

Subject:Control engineering

Abstract/Summary:

PDF Full Text Request

Classification is a hot topic in machine learning. It is a highly abstract to the dataset within known categories, further extracting rules and building predictive model, so as to classify the unlabeled sample data into specific categories.Decision tree is the most common classification algorithm. Compared to the other algorithms, decision tree is rather simple and fast with high accuracy. Furthermore, the rules constructed by decision tree can express more semantic meaning thus be interpretable. Considering that the dataset in practical always contains noise or isolated points, there will always be some leaf nodes with rarely samples. Such nodes will bring a lot of unnecessary branches, eventually give rise to the decision tree in a large scale. To solve this problem, a density-based method is proposed to construct the decision tree in this study, namely the density of the regional boundary is used as a measure in the construction process, such that the unnecessary branches do not exist in the final decision tree or only a few of them. In such a way, the scale of the decision tree is streamlined for avoiding overfitting, and the forecast accuracy is improved as well. Furthermore, this study extends the idea of density to the ensemble learning algorithm, such as RandomForest, Bagging and AdaBoost. Although these ensemble algorithms perform stronger ability of classification to the simple decision tree, the problem of containing unnecessary branches still exist. However, the density-based ensemble learning algorithm exhibits a stronger classification ability in the ensemble algorithm as well, which can not onlyreduce the scale significantly, but also can improve the classification accuracy.In this study, a number of experiments are conducted on different data sets of UCI. A detailed comparison is made in the average of tree node and the accuracy between traditional method and the proposed method. The experiments show that the proposed method can generally reduce the nodes of decision tree, it can also avoid overfitting and achieve high forecast accuracy. The classifiers constructed perform well in classification with simple structure, clearly semantic and a stronger generalization ability.

Keywords/Search Tags:

Decision Tree, Ensemble Algorithm, Density, Over Fitting

PDF Full Text Request

Related items

1	Application Of Ensemble Decision Tree De Based On Improved Data Protocol In Medical Decision-Making
2	Research On Decision Tree Induction And Pruning Algorithm
3	Design Of Privacy-Preserving Ensemble Decision Tree Learning In Vertically Partitioned Data
4	A Multivariate Decision Tree Classification Algorithm With Quantitative Monotonicity Constraints
5	The Axiomatic Fuzzy Sets Based Ensemble Learning Decision Tree
6	Research On The Improvement Of Decision Tree Reduced Error Pruning Algorithm
7	Research And Application On The Data Mining Algorithm Based On Decision Tree
8	Research On Decision Tree Algorithm Based On Differential Privacy Technology Protection
9	Research On Decision Tree Algorithm Based On Rough Sets And Ensemble Learning
10	Decision Tree Generation Algorithm Based On Time Series Pairs