Localized Generalization Error Model with Variable Size of Neighborhoods and Applications in Ensemble Feature Selection

Posted on: 2012-02-26
Degree: Ph.D.
Type: Thesis
University: Hong Kong Polytechnic University (Hong Kong)
Candidate: Chan, Po Fong
Full Text: PDF
GTID: 2458390011450945
Subject: Computer Science
Abstract/Summary:
The Localized Generalization Error Model (L-GEM) provides an upper bound on the generalization error for unseen samples located in the Q-neighborhood of each training sample. It originates from the idea that it is unreasonable to expect a classifier to correctly recognize unseen samples across the whole input space, because some of them are very different from the training samples.

A crucial parameter, Q, adjusts the size of the neighborhood of each training sample. In the current literature, a single Q value is selected and used for all training samples, so every training sample has a neighborhood of the same size. However, certain training samples may be extremely close to one another, and equal-sized neighborhoods may then overlap substantially. The first objective of the thesis is to study the selection of a different Q for each individual training sample instead of a single value for all.

In view of the high computational complexity of L-GEM with variable neighborhood sizes, the second objective is to propose a new point of view in which the data are clustered into groups, and neighborhoods are considered for each cluster instead of each training sample.

However, the trend of a classifier's performance across different neighborhood sizes, which provides important information for evaluating the classifier, is ignored. Therefore, the third objective of the thesis is to propose a set of L-GEM-based indices to evaluate the classifier with different sizes of neighborhood. In addition, the performance of the proposed methods in different scenarios with outliers is provided.

L-GEM has been extended to Multiple Classifier Systems (MCS), and it has been shown to evaluate the generalization capability of an MCS successfully. Creating diverse sets of classifiers is a key issue in constructing an MCS. Ensemble feature selection varies the feature set for each individual classifier in an MCS. Promoting diversity alone may not generate an MCS with high generalization capability. Therefore, a genetic algorithm (GA) and the localized generalization error model for MCS (L-GEMMCS) are adopted to select sets of diversified feature groups for constructing an MCS with high generalization capability.
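
The idea of a Q-neighborhood with a per-sample Q, as described in the abstract, can be illustrated with a minimal Python sketch. It assumes the Q-neighborhood of a training sample is the axis-aligned hypercube of half-width Q around it, which is how the L-GEM literature commonly defines it. The rule in `per_sample_q` (cap each Q at half the Chebyshev distance to the nearest other training sample) and both function names are hypothetical, chosen only to show how variable sizes can avoid large overlaps; this is not the selection method proposed in the thesis.

```python
import numpy as np


def in_q_neighborhood(x, center, q):
    """Return True if x lies in the hypercube of half-width q around center.

    Assumes the Q-neighborhood is {x : |x_i - center_i| <= q for all i}.
    """
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(center, dtype=float))
    return bool(np.all(diff <= q))


def per_sample_q(X, q_max):
    """Hypothetical rule for choosing one Q per training sample.

    Each sample's Q is capped at half the Chebyshev (max-coordinate)
    distance to its nearest other sample, so two hypercubes can touch
    only on their boundaries. Requires at least two samples.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Chebyshev distances, shape (n_samples, n_samples).
    dist = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
    np.fill_diagonal(dist, np.inf)  # ignore each sample's distance to itself
    return np.minimum(q_max, dist.min(axis=1) / 2.0)


X_train = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0]])
qs = per_sample_q(X_train, q_max=1.0)
```

Here the two nearby samples receive small neighborhoods while the isolated sample keeps the full `q_max`, whereas a single shared Q of 1.0 would make the first two neighborhoods overlap almost entirely.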
Keywords/Search Tags:Localized generalization error model, L-GEM, MCS, Feature, Neighborhood, Samples, Size, Each training sample