The Research On The Algorithms Of Optimizing Decision Tree

Posted on:2010-05-23

Degree:Master

Type:Thesis

Country:China

Candidate:Q Li

GTID:2178360278459095

Subject:Computer application technology

Abstract/Summary:

Decision trees are an efficient data mining method. Improving their performance has both theoretical and practical significance, because it makes them better suited to the demands of evolving data mining technology. This thesis studies two problems in decision tree algorithms in depth: sample selection and the criterion for selecting test attributes. The main contributions are as follows.

First, after analyzing the principle of sample selection based on the multi-edit-nearest-neighbor algorithm, a multi-edit-nearest-neighbor algorithm with a refusing threshold is proposed. Compared with the original algorithm, introducing the refusing threshold reduces the chance of removing samples by mistake, and further lowers the decision risk and the probability of misjudgment of the resulting decision tree. Experiments comparing the two algorithms show that the refusing threshold variant is superior in reducing decision risk and the probability of misjudgment. However, in terms of the classification accuracy of the resulting decision trees, the multi-edit-nearest-neighbor algorithm is better than the refusing threshold variant. When used for sample selection, both algorithms reduce the size of the decision tree without sacrificing accuracy.

Second, a new test attribute selection criterion based on a modified coefficient is presented. The main idea is to use the modified coefficient to reduce the information gain of attributes that have many values and a large information gain. Compared with plain information gain and with the selection criterion that introduces a user interest degree, this method overcomes both the bias toward multi-valued attributes in the ID3 algorithm and the subjectivity that the user interest degree brings to judging the importance of multi-valued attributes. It also preserves an advantage of decision tree algorithms: users do not need domain knowledge of the application, because a classifier built automatically from the sample set classifies unknown data.

Third, a combined optimized decision tree algorithm is proposed. It improves both sample selection and test attribute selection, and it optimizes the main steps of decision tree construction that are easily affected by noise and prone to multi-valued attribute bias. Experiments show that the algorithm reduces the size of the decision tree and improves classification accuracy at the same time.
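The multi-valued attribute bias that the modified coefficient targets can be illustrated with a small sketch. The abstract does not define the coefficient, so the `1 / k` factor below (with `k` the number of distinct attribute values) is purely a hypothetical stand-in and not the thesis's formula; the toy data and the function names `entropy`, `information_gain`, and `corrected_gain` are likewise assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """ID3 information gain of splitting `labels` by attribute `values`."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


def corrected_gain(values, labels):
    """Information gain scaled by a HYPOTHETICAL coefficient 1 / k.

    k is the number of distinct attribute values, so attributes with
    many values are penalized. This is an illustrative assumption,
    not the coefficient defined in the thesis.
    """
    k = len(set(values))
    return information_gain(values, labels) / k


# Toy data: 8 samples, 3 positive and 5 negative.
labels = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
# A meaningful two-valued attribute.
outlook = ["sunny", "sunny", "sunny", "sunny",
           "rain", "rain", "rain", "rain"]
# An identifier-like attribute with a unique value per sample.
record_id = list(range(8))

for name, attr in [("outlook", outlook), ("record_id", record_id)]:
    print(name, information_gain(attr, labels), corrected_gain(attr, labels))
```

On this toy data, plain information gain ranks the identifier-like attribute highest even though it cannot generalize, while the penalized score ranks `outlook` first. Any real coefficient would need to be chosen so that it does not over-penalize attributes that are both multi-valued and truly informative.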

Keywords/Search Tags:

Data mining, decision tree, refusing threshold multi-edit-nearest-neighbor algorithm, modified coefficient

Related items

1	Research On The Application Of Data Mining Technology In The Analysis And Prediction Of Achievements In Colleges And Universities
2	Research On Nearest Neighbor Method Based On Sparse Representation And Decision Tree
3	Research On K Nearest Neighbor Algorithm Based On Class Division And Neighbor Selection
4	Research On Multi-label Learning Algorithm Based On K Nearest Neighbor
5	Human Behavior Recognition Based On Acceleration Sensor Placed Under Foot
6	Application Research Of Decision Tree Algorithm In The Student Employment Management
7	The Application Of Data Mining Methods In Credit Card Default Prediction
8	Grid-based Clustering Algorithm Analysis And Research
9	Studies On Probabilistic Nearest Neighbor Queries Over Uncertain Data
10	Research On Continuous Nearest Neighbor Query