
Optimal instance selection for improved decision tree

Posted on: 2008-02-14
Degree: Ph.D
Type: Dissertation
University: Iowa State University
Candidate: Wu, Shuning
Full Text: PDF
GTID: 1448390005973517
Subject: Engineering
Abstract/Summary:
Instance selection plays an important role in improving the scalability of data mining algorithms, but it can also be used to improve the quality of the data mining results. In this dissertation we present a new optimization-based approach to instance selection that uses a genetic algorithm (GA) to select a subset of instances that produces a simpler decision tree with acceptable accuracy. The resulting trees are likely to be easier for the decision maker to comprehend and interpret, and hence more useful in practice. We present numerical results for several difficult test datasets indicating that GA-based instance selection can often reduce the size of the decision tree by an order of magnitude while still maintaining good prediction accuracy. The results suggest that GA-based instance selection works best for low-entropy datasets; with higher entropy, there is less benefit from instance selection. A comparison between the GA and other heuristic approaches, such as RMHC (random mutation hill climbing) and a simple construction heuristic, indicates that the GA is able to obtain a good solution at low computational cost even for some large datasets. One advantage of instance selection is that it increases the average number of instances associated with the leaves of the decision tree, which helps avoid overfitting; instance selection can thus serve as an effective alternative to pruning decision trees. Finally, analysis of the selected instances reveals that instance selection helps to reduce outliers, reduce missing values, and select the instances most useful for separating classes.
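The abstract's core idea can be sketched in code. The following is a minimal, hypothetical illustration of GA-based instance selection, not the dissertation's actual implementation: a bit mask over the training set is evolved, and each candidate subset is scored by the full-data accuracy of a one-split decision stump (standing in for a full decision tree) minus a penalty for subset size. The toy dataset, fitness weights, and GA parameters are all assumptions chosen for brevity.

```python
import random

# Toy one-feature dataset (feature, label); purely illustrative.
DATA = [(x, int(x > 5.0)) for x in [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]]

def train_stump(subset):
    """Fit a one-split decision stump (proxy for a decision tree)
    on the selected instances, trying each feature value as threshold."""
    best_thresh, best_acc = subset[0][0], -1.0
    for thresh, _ in subset:
        acc = sum((x > thresh) == bool(y) for x, y in subset) / len(subset)
        if acc > best_acc:
            best_thresh, best_acc = thresh, acc
    return best_thresh

def fitness(mask, data, alpha=0.9):
    """Reward full-data accuracy of the stump trained on the subset,
    and lightly penalize large subsets (proxy for tree complexity)."""
    subset = [d for d, keep in zip(data, mask) if keep]
    if not subset:
        return 0.0
    thresh = train_stump(subset)
    acc = sum((x > thresh) == bool(y) for x, y in data) / len(data)
    return alpha * acc + (1 - alpha) * (1 - len(subset) / len(data))

def ga_select(data, pop_size=20, generations=30, seed=0):
    """Evolve a population of boolean instance masks with truncation
    selection, one-point crossover, and point mutation."""
    rng = random.Random(seed)
    n = len(data)
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, data), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]          # one-point crossover
            i = rng.randrange(n)
            child[i] = not child[i]            # point mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda m: fitness(m, data))

best = ga_select(DATA)
```

In this sketch the GA typically converges on a small subset whose stump still classifies the full dataset well, mirroring the abstract's finding that far fewer instances can yield a much smaller tree with comparable accuracy.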
Keywords/Search Tags:Instance selection, Decision tree, Data mining