Evolutionary optimization and ensemble techniques for data mining and pattern recognition

Posted on:2005-02-04

Degree:Ph.D

Type:Dissertation

University:Michigan State University

Candidate:Topchy, Alexander P

GTID:1458390008994463

Subject:Computer Science

Abstract/Summary:

This dissertation addresses fundamental data mining and pattern recognition problems (feature extraction, modeling, and data clustering) through evolutionary computation and ensemble-based approaches.

We offer feature extraction methods for improved pattern classification using genetic algorithms. New features are synthesized by merging the values of original variables during the search process. The genetic search for (sub-)optimal combinations of values is performed using a graph-based encoding of candidate solutions. This compact solution representation with minimal redundancy is used for a wide class of grouping problems, including clustering of variable values. Genetic value clustering is applied to text categorization, DNA-based assignment of individuals in population genetics, and parametric learning of Bayesian network classifiers. It is shown that such feature extraction improves the predictive accuracy of classification decisions.

We develop genetic programming algorithms for modeling input-output mappings of continuous variables that incorporate dynamic fitting of the free parameters of evolved models. Traditional genetic programming is extended by gradient descent optimization of the leaf coefficients of tree-like programs during the evolutionary search, which is made possible by algorithmic differentiation. Experimental results show significant improvement in both computational requirements and modeling accuracy on a set of symbolic regression problems.

Ensembles of partitions of data sets are studied in two respects: combination of multiple clusterings and generation of clusterings for an ensemble. We develop two efficient consensus functions for finding a combined partition of good quality. The first consensus function uses an information-theoretic principle based on maximizing generalized mutual information. The second finds a consensus clustering by estimating a probabilistic mixture model from the observed ensemble.
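A mixture-model consensus of this kind can be sketched as a mixture of multinomials fitted by EM. This minimal Python sketch assumes one common formulation: each object is described by the vector of labels it received from the H partitions, and those labels are treated as conditionally independent given the consensus cluster. The function name `em_consensus`, the farthest-first seeding, and the additive smoothing constant are illustrative choices of ours, not details taken from the dissertation.

```python
import math


def em_consensus(partitions, n_clusters, n_iter=50, smooth=1e-3):
    """Consensus labels from a mixture of multinomials fitted by EM.

    partitions: list of H label lists, each of length N, labels 0..K_j-1.
    (Hypothetical helper for illustration only.)
    """
    n, h = len(partitions[0]), len(partitions)
    y = [[p[i] for p in partitions] for i in range(n)]  # label vector per object
    sizes = [max(p) + 1 for p in partitions]  # K_j for each partition

    # Farthest-first seeding on Hamming distance between label vectors.
    def hamming(a, b):
        return sum(u != v for u, v in zip(a, b))

    seeds = [0]
    while len(seeds) < n_clusters:
        seeds.append(max(range(n), key=lambda i: min(hamming(y[i], y[s]) for s in seeds)))

    # theta[j][m][k] = P(label k in partition j | consensus cluster m),
    # initialised with half its mass on the seed object's label.
    theta = [[[0.5 / sizes[j] + (0.5 if k == y[s][j] else 0.0) for k in range(sizes[j])]
              for s in seeds] for j in range(h)]
    alpha = [1.0 / n_clusters] * n_clusters

    for _ in range(n_iter):
        # E-step: posterior of each consensus cluster, computed in log space.
        resp = []
        for i in range(n):
            logs = [math.log(alpha[m]) + sum(math.log(theta[j][m][y[i][j]]) for j in range(h))
                    for m in range(n_clusters)]
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            total = sum(w)
            resp.append([v / total for v in w])
        # M-step: re-estimate mixing weights and per-partition label distributions,
        # with additive smoothing so that no probability becomes exactly zero.
        for m in range(n_clusters):
            weight = sum(r[m] for r in resp)
            alpha[m] = (weight + smooth) / (n + n_clusters * smooth)
            for j in range(h):
                counts = [smooth] * sizes[j]
                for i in range(n):
                    counts[y[i][j]] += resp[i][m]
                theta[j][m] = [c / (weight + sizes[j] * smooth) for c in counts]

    # Hard consensus label: most probable mixture component for each object.
    return [max(range(n_clusters), key=lambda m: r[m]) for r in resp]
```

For example, `em_consensus([[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 1]], 2)` returns `[0, 0, 0, 1, 1, 1]`: the third object, on which the partitions disagree, joins the group it matches in two of the three partitions. Note that label values are arbitrary per partition; only co-occurrence patterns matter to the model.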
It is demonstrated that the ensemble's partitions can be generated by weak clustering algorithms, in particular by clustering in random low-dimensional subspaces of the original feature space. Experiments indicate that an ensemble of weak partitions can be more accurate than a single sophisticated clustering algorithm. Finally, we consider how the partition generation process can be made adaptive to provide better decisions for patterns located near inter-cluster boundaries.
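Generating an ensemble from weak clusterers can be illustrated with a small Python sketch. As assumptions of ours, the "random low-dimensional subspace" here is a random one-dimensional projection, each weak clusterer is plain 1-D k-means, and the partitions are combined by a simple co-association (pairwise co-occurrence) vote joined with union-find. That combiner is a stand-in chosen for brevity, not either of the dissertation's consensus functions, and all function names are hypothetical.

```python
import random


def kmeans_1d(values, k, n_iter=20):
    """Plain Lloyd's k-means on scalars; returns one label per value."""
    ordered = sorted(values)
    # Start the k centres at evenly spaced quantiles of the data.
    centres = [ordered[int((c + 0.5) * len(ordered) / k)] for c in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = sum(members) / len(members)
    return labels


def random_subspace_ensemble(points, k, n_partitions, seed=0):
    """Weak partitions: 1-D k-means on random projections of the points."""
    rng = random.Random(seed)
    dim = len(points[0])
    ensemble = []
    for _ in range(n_partitions):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        projected = [sum(x * w for x, w in zip(p, direction)) for p in points]
        ensemble.append(kmeans_1d(projected, k))
    return ensemble


def co_association_consensus(ensemble, threshold=0.5):
    """Link objects co-clustered in more than `threshold` of the partitions."""
    n, h = len(ensemble[0]), len(ensemble)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            together = sum(p[i] == p[j] for p in ensemble) / h
            if together > threshold:
                parent[find(i)] = find(j)
    # Relabel connected components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


points = [(0.0, 0.0, 0.0), (0.3, 0.1, 0.2), (0.1, 0.4, 0.0), (0.2, 0.2, 0.3),
          (10.0, 10.0, 10.0), (10.2, 9.9, 10.1), (9.8, 10.3, 10.0), (10.1, 10.1, 9.7)]
ensemble = random_subspace_ensemble(points, k=2, n_partitions=25)
consensus = co_association_consensus(ensemble)
```

Any single random projection may happen to be nearly orthogonal to the direction separating the two groups and then split them arbitrarily; the vote over many such weak partitions is what is expected to recover the structure. The 0.5 threshold is a simple majority rule and is itself a tunable assumption.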

Keywords/Search Tags:

Pattern, Ensemble, Data, Evolutionary, Clustering

Related items

1	Research On Key Technologies Of Clustering Ensemble
2	Adaptive Semi-supervised Clustering Ensemble For High Dimensional Data
3	Research And Application Of Evolutionary Clustering Algorithm
4	Research On The Effectiveness Element Theory And Method Of Clustering Ensemble
5	Research On Ensemble Clustering Algorithms For Complex Data
6	Study On Classifiers Ensemble Based On Evolutionary Computation And Fuzzy Clustering
7	Clustering Ensemble Algorithm Based On Mixed Data Representation
8	Research On Clustering Ensemble Of Mixed Data And Clustering Algorithm Of Mixed Data Streams
9	Integrated Clustering Algorithms And Applied Research
10	Research On Ensemble-Initialized K-Means Clustering Algorithms