Stable feature selection: Theory and algorithm

Posted on: 2013-09-02 | Degree: Ph.D | Type: Dissertation
University: State University of New York at Binghamton | Candidate: Han, Yue | Full Text: PDF
GTID: 1458390008976217 | Subject: Computer Science
Abstract/Summary:
Feature selection plays an important role in knowledge discovery in many application domains with high-dimensional data. Many feature selection algorithms have been developed and shown to be successful at improving the predictive accuracy of learning models while reducing feature space dimensionality and model complexity. Besides high accuracy, the stability of feature selection, that is, the insensitivity of a feature selection algorithm's result to variations in the training set, is another important yet under-addressed issue. Stability has become increasingly critical in application domains where feature selection is used as a knowledge discovery tool to identify important features that explain the observed phenomena, such as biomarker identification in cancer diagnosis.

In this dissertation, we present a theoretical framework for the relationship between the stability and the accuracy of feature selection, based on a formal bias-variance decomposition of feature selection error. The framework also reveals the connection between stability and sample size, and suggests a variance reduction approach for improving the stability of feature selection algorithms under small sample sizes. Following the theoretical framework, we develop an empirical variance reduction framework and margin-based instance weighting algorithms under this framework. Our extensive experimental study verifies both the theoretical and the empirical framework on synthetic data sets and real-world microarray data sets. Our results show that the empirical framework is effective at reducing the variance and improving the stability of two representative feature selection algorithms, SVM-RFE and ReliefF, while maintaining comparable predictive accuracy based on the selected features. The proposed instance weighting framework is also shown to be more effective and efficient than the existing ensemble framework at improving the subset stability of the feature selection algorithms under study.
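
To make the notion of subset stability concrete, the following is a minimal sketch of one common way to estimate it: run a selector on bootstrap resamples of the training set and average the pairwise Jaccard similarity of the selected subsets. The abstract does not state which stability measure the dissertation uses, so the Jaccard index here is an assumption. The selector `select_top_k` is a hypothetical stand-in (ranking features by the difference of class means); it is not SVM-RFE, ReliefF, or the proposed instance weighting method.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature subsets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def subset_stability(subsets):
    """Average pairwise Jaccard similarity over a list of selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def select_top_k(X, y, k):
    """Hypothetical stand-in selector: rank features by the absolute
    difference between class means (labels assumed to be 0 or 1)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if not pos or not neg:
            scores.append(0.0)  # a resample may miss one class entirely
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def bootstrap_stability(X, y, k, n_rounds=20, seed=0):
    """Select k features on each bootstrap resample and report how
    similar the selected subsets are to one another."""
    rng = random.Random(seed)
    n = len(X)
    subsets = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        subsets.append(select_top_k([X[i] for i in idx], [y[i] for i in idx], k))
    return subset_stability(subsets)
```

A score near 1 means the selector returns nearly the same features regardless of which samples it sees; a score near 0 means the selected subsets barely overlap across resamples, which is the instability the abstract describes for small sample sizes.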
Keywords/Search Tags:Feature selection, Framework, Stability, Knowledge discovery, Application domains