The Research On The Algorithms Of Optimizing Decision Tree

Posted on: 2010-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: Q Li
Full Text: PDF
GTID: 2178360278459095
Subject: Computer application technology
Abstract/Summary:
Decision trees are an efficient data mining method. Improving decision tree algorithms and raising their performance has both theoretical and practical significance, because it makes them better suited to the evolving requirements of data mining technology. This thesis studies two problems in decision tree algorithms, sample selection and the test attribute selection criterion, and covers the following aspects.

By analyzing the sample selection principle of the multi-edit-nearest-neighbor algorithm, a multi-edit-nearest-neighbor algorithm with a refusing threshold is proposed. Compared with the original multi-edit-nearest-neighbor algorithm, the refusing threshold reduces the chance of removing samples by mistake, and further decreases both the decision risk and the probability of misjudgment of the resulting decision tree. Experiments comparing the two algorithms show that the refusing threshold variant is superior in reducing decision risk and the probability of misjudgment. In terms of the classification accuracy of the resulting decision trees, however, the original multi-edit-nearest-neighbor algorithm is better than the refusing threshold variant. When either algorithm is used to select samples, it reduces the size of the decision tree without sacrificing accuracy.

A new test attribute selection criterion based on a modified coefficient is presented. The main idea is to use the modified coefficient to reduce the information gain of attributes that have many values and therefore a large information gain.
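The bias that such a criterion targets can be illustrated with a small sketch. The abstract does not give the formula for the modified coefficient, so the penalty used here (dividing the gain by `log2(k + 1)` for an attribute with `k` distinct values) is purely a hypothetical stand-in, and the function names and toy data are likewise assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """ID3-style information gain of splitting `labels` by attribute `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def penalized_gain(values, labels):
    """Information gain scaled by a HYPOTHETICAL coefficient 1 / log2(k + 1).

    k is the number of distinct attribute values. This is an illustrative
    penalty only; it is not the modified coefficient defined in the thesis.
    """
    k = len(set(values))
    return information_gain(values, labels) / log2(k + 1)


# Toy data: eight samples, four per class.
labels = ["yes"] * 4 + ["no"] * 4
record_id = [1, 2, 3, 4, 5, 6, 7, 8]                 # unique per sample, useless for prediction
windy = ["f", "f", "f", "f", "f", "t", "t", "t"]     # two values, one "no" falls in the "f" group

for name, column in [("record_id", record_id), ("windy", windy)]:
    print(name, round(information_gain(column, labels), 3),
          round(penalized_gain(column, labels), 3))
```

On this toy data plain information gain ranks the identifier-like `record_id` above `windy`, while the penalized score reverses the order, which is the kind of correction a coefficient on multi-valued attributes is meant to achieve.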
Compared with plain information gain and with a test attribute selection criterion that introduces a user interest degree, this method not only overcomes the bias toward multi-valued attributes (variety bias) in the ID3 algorithm, but also avoids the subjective judgment of the importance of multi-valued attributes that a user interest degree introduces. At the same time, it preserves an advantage of decision tree algorithms: users do not need domain knowledge of the application field, because unknown data are classified by a classifier built automatically from the sample set.

A combined optimized decision tree algorithm is proposed. The algorithm improves both sample selection and test attribute selection, and thereby optimizes the main steps of decision tree construction that are easily affected by noise and that give rise to variety bias. Experiments show that the algorithm reduces the size of the decision tree and improves classification accuracy at the same time.
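The sample selection step can be sketched as follows. This is a simplified leave-one-out editing loop, not the multi-edit partitioning scheme itself, and the abstract does not define the refusing threshold, so its meaning here is an assumption: a sample is removed only when its k nearest neighbors disagree with its label and the winning vote share reaches `reject_threshold`; otherwise the decision is refused and the sample is kept. All names and data are hypothetical.

```python
from collections import Counter


def knn_vote(point, others, k):
    """Return (majority label, vote share) among the k nearest neighbours of `point`."""
    nearest = sorted(
        others,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(point, s[0])),
    )[:k]
    label, count = Counter(lbl for _, lbl in nearest).most_common(1)[0]
    return label, count / len(nearest)


def edit_samples(samples, k=3, reject_threshold=0.0):
    """Repeat leave-one-out k-NN editing passes until a pass removes nothing.

    ASSUMED semantics of the threshold: a sample is dropped only if the
    neighbours' majority label differs from its own AND the majority's vote
    share is at least `reject_threshold`. With 0.0 this is plain editing.
    """
    kept = list(samples)
    changed = True
    while changed and len(kept) > k:
        changed = False
        survivors = []
        for i, (x, y) in enumerate(kept):
            others = kept[:i] + kept[i + 1:]
            label, share = knn_vote(x, others, k)
            if label != y and share >= reject_threshold:
                changed = True          # confident disagreement: remove the sample
            else:
                survivors.append((x, y))
        kept = survivors
    return kept


# Toy data: two clean clusters, one mislabeled point, one borderline point.
noisy = ((0.5, 0.5), "B")        # sits inside the "A" cluster
borderline = ((3.2, 3.2), "A")   # between clusters, neighbours vote 2 to 1 for "B"
samples = [
    ((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((1, 1), "A"),
    ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"), ((6, 6), "B"),
    noisy, borderline,
]

plain = edit_samples(samples, k=3, reject_threshold=0.0)
with_refusal = edit_samples(samples, k=3, reject_threshold=0.9)
print(len(plain), len(with_refusal))
```

In this toy run both settings discard the clearly mislabeled point, but only the thresholded version keeps the borderline sample, which mirrors the stated aim of reducing mistaken removals.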
Keywords/Search Tags:Data mining, decision tree, refusing threshold multi-edit-nearest-neighbor algorithm, modified coefficient