Multivariate decision trees for data mining

Posted on: 2000-04-25
Degree: Ph.D
Type: Dissertation
University: University of South Carolina
Candidate: Li, Xiaobai
Full Text: PDF
GTID: 1468390014463466
Subject: Computer Science
Abstract/Summary:
Data mining is the process of discovering hidden patterns in large databases. In this dissertation, we present a comprehensive overview of data mining concepts, processes, applications and related fields. We identify the primary tasks of data mining, and discuss some popular data mining methods, algorithms and commercial products.

The main focus of this dissertation is on decision trees, one of the most popular data mining techniques. Following a thorough discussion of decision tree techniques, we present a set of new decision tree algorithms. The heart of these algorithms is a linear discriminant function (LDF) based splitting method and a dynamic programming based pruning (DPP) procedure. The LDF algorithm incorporates a traditional multivariate statistical method into decision trees' recursive partitioning process; it has great potential to improve the quality of decision trees. The DPP algorithm applies an optimization technique to decision trees' pruning procedure, generating a sequence of pruned trees that are optimal with respect to tree size. A decision tree system has been developed based on the proposed algorithms, and it can be readily implemented for solving classification problems in data mining.

The results of our experimental study indicate that (1) the proposed LDF/DPP algorithm appears to outperform some of the major decision tree algorithms in classification accuracy; and (2) the proposed algorithm typically generates decision trees whose sizes are significantly smaller than those produced by existing decision tree algorithms.

A case study has been conducted, where the proposed decision tree system, together with other data mining tools, was applied to a real world problem of customer retention. Preliminary results of the study show that the company can benefit by implementing the proposed data mining techniques.
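
To make the idea of an LDF-based split concrete, the following is a minimal sketch of a single multivariate split using a two-class Fisher linear discriminant. It is not the dissertation's algorithm, whose details the abstract does not give. The function names (ldf_split, apply_split), the midpoint threshold, the small ridge term and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def ldf_split(X, y):
    """Fit a two-class Fisher linear discriminant; return (w, c).

    Illustrative only: a record x goes to the right child when x @ w > c.
    """
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Assumed small ridge term so the solve stays stable if Sw is near-singular.
    Sw = Sw + 1e-8 * np.eye(X.shape[1])
    # Discriminant direction: Sw^{-1} (m1 - m0).
    w = np.linalg.solve(Sw, m1 - m0)
    # Assumed threshold: midpoint between the projected class means.
    c = 0.5 * (m0 + m1) @ w
    return w, c

def apply_split(X, w, c):
    """Boolean mask: True for records sent to the right child."""
    return X @ w > c

# Synthetic two-class data (an assumption, not data from the dissertation).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0.0, 0.0], 0.5, size=(50, 2)),
    rng.normal([3.0, 3.0], 0.5, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

w, c = ldf_split(X, y)
right = apply_split(X, w, c)
# Fraction of records whose side of the split matches their class.
accuracy = np.mean(right == (y == 1))
print("split direction:", w, "threshold:", c, "accuracy:", accuracy)
```

Unlike a univariate split, which tests one attribute against a threshold, this split tests a linear combination of all attributes. A tree builder would apply such a split recursively to each resulting partition.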
Keywords/Search Tags:Data mining, Decision tree, Proposed