Font Size: a A A

An Improved Decision Tree Algorithm Based On Density

Posted on:2017-07-13Degree:MasterType:Thesis
Country:ChinaCandidate:B Y CaoFull Text:PDF
GTID:2348330488458202Subject:Control engineering
Abstract/Summary:PDF Full Text Request
Classification is a hot topic in machine learning. It is a highly abstract to the dataset within known categories, further extracting rules and building predictive model, so as to classify the unlabeled sample data into specific categories.Decision tree is the most common classification algorithm. Compared to the other algorithms, decision tree is rather simple and fast with high accuracy. Furthermore, the rules constructed by decision tree can express more semantic meaning thus be interpretable. Considering that the dataset in practical always contains noise or isolated points, there will always be some leaf nodes with rarely samples. Such nodes will bring a lot of unnecessary branches, eventually give rise to the decision tree in a large scale. To solve this problem, a density-based method is proposed to construct the decision tree in this study, namely the density of the regional boundary is used as a measure in the construction process, such that the unnecessary branches do not exist in the final decision tree or only a few of them. In such a way, the scale of the decision tree is streamlined for avoiding overfitting, and the forecast accuracy is improved as well. Furthermore, this study extends the idea of density to the ensemble learning algorithm, such as RandomForest, Bagging and AdaBoost. Although these ensemble algorithms perform stronger ability of classification to the simple decision tree, the problem of containing unnecessary branches still exist. However, the density-based ensemble learning algorithm exhibits a stronger classification ability in the ensemble algorithm as well, which can not onlyreduce the scale significantly, but also can improve the classification accuracy.In this study, a number of experiments are conducted on different data sets of UCI. A detailed comparison is made in the average of tree node and the accuracy between traditional method and the proposed method. The experiments show that the proposed method can generally reduce the nodes of decision tree, it can also avoid overfitting and achieve high forecast accuracy. The classifiers constructed perform well in classification with simple structure, clearly semantic and a stronger generalization ability.
Keywords/Search Tags:Decision Tree, Ensemble Algorithm, Density, Over Fitting
PDF Full Text Request
Related items