
Using genetic algorithms to develop intelligent decision trees

Posted on: 2001-10-29
Degree: Ph.D.
Type: Dissertation
University: University of Maryland College Park
Candidate: Fu, Zhiwei
Full Text: PDF
GTID: 1468390014952724
Subject: Business Administration
Abstract/Summary:
Decision tree algorithms have been widely recognized as one of the most efficient ways to find valuable patterns in large data sets found in data mining applications. However, scalability, accuracy, and performance have become major concerns in large-scale data mining with respect to decision tree algorithms. Researchers in the fields of statistics and artificial intelligence have developed a number of decision tree algorithms that construct high-quality decision trees with a reasonable amount of computing effort. In this dissertation, we propose a new approach that integrates statistical sampling, genetic algorithms, and decision tree algorithms to generate intelligent decision trees. Our approach, called genetic algorithm for intelligent trees (GAIT), is a powerful algorithm that can be used to improve the quality of traditional decision tree algorithms.

GAIT is essentially a three-step approach. First, we extract many subsets of points from the original data set for analysis. Second, we construct a decision tree on each subset (we use the popular C4.5 software due to Quinlan (1992)). Third, these decision trees are taken as inputs (i.e., the initial population of trees) to our genetic algorithm. The genetic algorithm allows the trees to cross over and mutate in order to generate trees of higher quality.

In this dissertation, we develop a methodology that introduces diversity into the evolution process of GAIT. We provide insight into how diversification works for GAIT. First, we conduct experiments on socioeconomic data and actual marketing data. We find that our approach achieves the same level of classification accuracy as a standard decision tree algorithm at lower sampling levels. Regardless of the quality of the starting trees, our approach produces uniformly highly accurate decision trees.

Second, we extend GAIT to incorporate different distributions based on statistical criteria into the evolution process. We find that GAIT benefits from diversity. GAIT produces highly accurate final solutions after introducing fitness functions from different distributions (e.g., risk-prone and risk-averse distributions).

Third, we extend the statistical criteria to more general criteria and analyze the performance of GAIT. We find that our approach produces highly accurate decision trees through diversification. As benchmarks, we build logistic regression models on a marketing data set and calculate the accuracy of the results. We find that GAIT competes favorably with well-known decision tree algorithms, statistical sampling, and logistic regression on the socioeconomic data set and the marketing data set. GAIT generates highly accurate, lean decision trees in all of our computational experiments.

Finally, we conduct experiments to study the scalability of our genetic algorithms. We find that GAIT and its variants scale up reasonably well and can generate highly accurate decision trees with approximately linearly scaled computing time in all of our computational experiments.
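
The three-step approach described in the abstract (sample subsets, grow one tree per subset, evolve the trees with crossover and mutation) can be sketched roughly as follows. This is a minimal, illustrative Python sketch and not the dissertation's implementation. A simple greedy tree builder (`build_tree`) stands in for C4.5, and fitness is assumed to be accuracy on a held-out validation set. The synthetic data, the depth cap of 6, the mutation rate of 0.2, the survival of the better half of each generation, and all function names are hypothetical choices made for illustration.

```python
import random

# A tree is a leaf (int class label) or a tuple (feature, threshold, left, right).

def predict(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def accuracy(tree, data):
    return sum(predict(tree, x) == y for x, y in data) / len(data)

def depth(tree):
    return 1 + max(depth(tree[2]), depth(tree[3])) if isinstance(tree, tuple) else 0

def majority(data):
    return 1 if 2 * sum(y for _, y in data) >= len(data) else 0

def split_errors(part):
    m = majority(part)
    return sum(y != m for _, y in part)

def build_tree(data, max_depth=3):
    """Greedy stand-in for C4.5 (assumption): minimize misclassifications per split."""
    if max_depth == 0 or len({y for _, y in data}) < 2:
        return majority(data)
    best = None
    for f in range(len(data[0][0])):
        for x, _ in data:
            left = [d for d in data if d[0][f] <= x[f]]
            right = [d for d in data if d[0][f] > x[f]]
            if left and right:
                err = split_errors(left) + split_errors(right)
                if best is None or err < best[0]:
                    best = (err, f, x[f], left, right)
    if best is None:
        return majority(data)
    _, f, t, left, right = best
    return (f, t, build_tree(left, max_depth - 1), build_tree(right, max_depth - 1))

def paths(tree, path=()):
    """Yield the path to every node; 2 means left child, 3 means right child."""
    yield path
    if isinstance(tree, tuple):
        yield from paths(tree[2], path + (2,))
        yield from paths(tree[3], path + (3,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(a, b, rng):
    """Copy a random subtree of b into a random position of a."""
    return replace(a, rng.choice(list(paths(a))), get(b, rng.choice(list(paths(b)))))

def mutate(tree, rng):
    """Nudge one split threshold, or flip one leaf label."""
    path = rng.choice(list(paths(tree)))
    node = get(tree, path)
    if isinstance(node, tuple):
        f, t, left, right = node
        return replace(tree, path, (f, t + rng.gauss(0, 0.05), left, right))
    return replace(tree, path, 1 - node)

def gait(train, validation, n_trees=10, subset_size=40, generations=15, seed=0):
    rng = random.Random(seed)
    # Steps 1 and 2: draw random subsets and grow one tree per subset.
    population = [build_tree(rng.sample(train, subset_size)) for _ in range(n_trees)]
    fitness = lambda tree: accuracy(tree, validation)  # assumed fitness function
    # Step 3: evolve the trees; the better half survives unchanged each generation.
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: n_trees // 2]
        children = []
        while len(parents) + len(children) < n_trees:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            if rng.random() < 0.2:
                child = mutate(child, rng)
            # Keep trees lean: fall back to a parent if the child is too deep.
            children.append(child if depth(child) <= 6 else rng.choice(parents))
        population = parents + children
    return max(population, key=fitness)

def make_data(n, rng):
    """Synthetic two-feature data: label is 1 when x0 + x1 > 1."""
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return [(x, 1 if x[0] + x[1] > 1 else 0) for x in points]

if __name__ == "__main__":
    rng = random.Random(1)
    train, validation = make_data(300, rng), make_data(200, rng)
    best = gait(train, validation)
    print("validation accuracy:", accuracy(best, validation))
```

Because the better half of each generation is carried over unchanged in this sketch, the best validation accuracy in the population cannot decrease from one generation to the next. Since the same validation set drives selection, a fair accuracy estimate would need a separate test set.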
Keywords/Search Tags: Decision tree, Algorithms, GAIT, Data, Intelligent, Experiments