Stable feature selection: Theory and algorithm

Posted on:2013-09-02

Degree:Ph.D

Type:Dissertation

University:State University of New York at Binghamton

Candidate:Han, Yue

GTID:1458390008976217

Subject:Computer Science

Abstract/Summary:

Feature selection plays an important role in knowledge discovery in many application domains with high-dimensional data. Many feature selection algorithms have been developed and shown to improve the predictive accuracy of learning models while reducing feature space dimensionality and model complexity. Besides high accuracy, the stability of feature selection, that is, the insensitivity of the result of a feature selection algorithm to variations in the training set, is another important yet under-addressed issue. Stability has become increasingly critical in application domains where feature selection is used as a knowledge discovery tool to identify important features for explaining the observed phenomena, such as biomarker identification in cancer diagnosis.

In this dissertation, we present a theoretical framework for the relationship between the stability and the accuracy of feature selection, based on a formal bias-variance decomposition of feature selection error. The framework also reveals the connection between stability and sample size, and suggests a variance reduction approach for improving the stability of feature selection algorithms under small sample size. Following the theoretical framework, we develop an empirical variance reduction framework and margin-based instance weighting algorithms under this framework. Our extensive experimental study verifies both the theoretical framework and the empirical framework on synthetic data sets and real-world microarray data sets. Our results show that the empirical framework is effective at reducing the variance and improving the stability of two representative feature selection algorithms, SVM-RFE and ReliefF, while maintaining comparable predictive accuracy based on the selected features. The proposed instance weighting framework is also shown to be more effective and efficient than the existing ensemble framework at improving the subset stability of the feature selection algorithms under study.
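
To make the notion of subset stability concrete, the following minimal Python sketch estimates it as the average pairwise Jaccard similarity between feature subsets selected on repeated stratified subsamples of the training set. This is an illustration under stated assumptions, not the dissertation's method: the Jaccard index as the stability measure, the toy mean-difference filter `select_top_k` (a stand-in for algorithms such as SVM-RFE or ReliefF), and all function names and parameters are hypothetical choices made for this example.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature subsets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_top_k(X, y, k):
    """Toy filter (an assumption, not SVM-RFE or ReliefF): rank features by the
    absolute difference between class means and keep the top k indices."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        mean_pos = sum(row[j] for row in pos) / len(pos)
        mean_neg = sum(row[j] for row in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def subset_stability(X, y, k, n_rounds=10, frac=0.8, seed=0):
    """Average pairwise Jaccard similarity of the subsets selected on
    n_rounds stratified subsamples, each keeping a fraction frac per class."""
    rng = random.Random(seed)
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    subsets = []
    for _ in range(n_rounds):
        # Sample each class separately so both classes are always present.
        idx = (rng.sample(pos_idx, max(1, int(frac * len(pos_idx))))
               + rng.sample(neg_idx, max(1, int(frac * len(neg_idx)))))
        X_sub = [X[i] for i in idx]
        y_sub = [y[i] for i in idx]
        subsets.append(select_top_k(X_sub, y_sub, k))
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A value near 1 means the selector returns nearly the same subset regardless of which training samples it sees; values near 0 indicate high sensitivity to the training set, which is the situation the variance reduction framework described above is meant to address.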

Keywords/Search Tags:

Feature selection, Framework, Stability, Knowledge discovery, Application domains

Related items

1	A graph analytics framework for knowledge discovery
2	Feature construction, selection and consolidation for knowledge discovery
3	Study On Stability Domains And Its Application In Electricity Market Stability
4	Research On Structural Methods Of Knowledge Discovery
5	Granular Computing Based Knowledge Discovery And Its Applications
6	Feature selection for robust knowledge discovery from data
7	A Framework for Information Retrieval and Knowledge Discovery from Online Healthcare Forums
8	Research On Rough Set Theory In Knowledge Discovery
9	Automated knowledge discovery from functional magnetic resonance images using spatial coherence
10	Research And Implementation Of A Framework For Communication Dataset Knowledge Discovery