Font Size: a A A

Evolutionary optimization and ensemble techniques for data mining and pattern recognition

Posted on:2005-02-04Degree:Ph.DType:Dissertation
University:Michigan State UniversityCandidate:Topchy, Alexander PFull Text:PDF
GTID:1458390008994463Subject:Computer Science
Abstract/Summary:
This dissertation addresses fundamental data mining and pattern recognition problems---feature extraction, modeling, and data clustering---through evolutionary computation and ensemble-based approaches.; We offer feature extraction methods for improved pattern classification using genetic algorithms. New features are synthesized by merging the values of original variables during the search process. The genetic search of (sub-) optimal combinations of values is performed using a graph-based encoding of candidate solutions. A compact solution representation with minimal redundancy is used for a wide class of grouping problems, including clustering of variable values. Genetic value clustering is applied to text categorization, DNA-based assignments of individuals in population genetics and parametric learning of Bayesian network classifiers. It is shown that such feature extraction results in better predictive accuracy of classification decisions.; We develop genetic programming algorithms for modeling input-output mappings of continuous variables that incorporates dynamical fitting of free parameters of evolved models. Traditional genetic programming is extended by gradient descent optimization of leaf coefficients of tree-like programs during the evolutionary search that is made possible using algorithmic differentiation. Experimental results show significant improvement in both computational requirements and modeling accuracy for a set of symbolic regression problems.; Ensembles of partitions of data sets are studied in two respects: combination of multiple clusterings and generation of clusterings for an ensemble. We develop two efficient consensus functions for finding a combined partition of good quality. The first consensus function uses an information-theoretic principle based on maximal generalized mutual information. The second function finds a consensus clustering by estimating a probabilistic mixture model from the observed ensemble. It is demonstrated that the ensemble's partitions can be generated by weak clustering algorithms, in particular, by clustering in random low-dimensional subspaces of the original feature space. Experiments indicate that ensemble of an weak partitions can be more accurate than a single sophisticated clustering algorithm. Finally, we consider how the partition generation process can be made adaptable to provide better decisions for the patterns located near the inter-cluster boundaries.
Keywords/Search Tags:Pattern, Ensemble, Data, Evolutionary, Clustering
Related items