High-dimensional datasets are increasingly abundant in data mining. A traditional approach to high-dimensional learning problems applies feature selection methods to choose a set of features (a feature model) that is as small as possible while still describing the learning examples accurately. A common problem with most feature selection methods is that they often produce feature models that are not stable (or robust) with respect to slight variations in the training set. When selecting features for knowledge discovery applications, stability is a highly desired property; this study therefore focuses on stable feature selection algorithms.

In the supervised setting, to improve the stability of the algorithm, we introduce a feature weighting method based on the L2-regularized logistic loss, together with its ensembles built from two linear aggregation schemes, WEn and REn. We also present a detailed analysis of the uniform stability and rotation invariance of the ensemble feature weighting method. Experiments on real-world microarray data sets show that the proposed ensemble feature weighting methods preserve stability while delivering satisfactory classification performance; in most cases, at least one of REn and WEn provides a better or comparable trade-off between stability and classification than the competing methods.

In the unsupervised setting, we present a framework for filtering redundant features that first exploits a group feature selection algorithm to remove redundant features while improving stability. To compensate for the weaknesses of any single clustering algorithm during group formation, we adopt the idea of clustering ensembles: k-means is run several times to obtain multiple base clusterings.
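The supervised ensemble feature weighting described above can be sketched in a few lines. This is a minimal illustration only: it assumes bootstrap resampling of the training set and a plain average of absolute weights as the linear aggregation, since the exact WEn/REn aggregation rules are not spelled out here, and it uses a simple gradient-descent fit of the L2-regularized logistic loss.

```python
import numpy as np

def logreg_l2_weights(X, y, lam=0.1, lr=0.1, iters=500):
    """Fit L2-regularized logistic regression by gradient descent.

    Returns the weight vector w; the magnitude |w_j| serves as the
    importance score of feature j."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))         # sigmoid predictions
        grad = X.T @ (p - y) / n + lam * w       # logistic loss + L2 penalty
        w -= lr * grad
    return w

def ensemble_feature_weights(X, y, n_estimators=20, seed=0):
    """Average |w| over bootstrap resamples -- one simple linear
    aggregation in the spirit of the WEn/REn ensembles (illustrative,
    not the exact schemes)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    scores = np.zeros(d)
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)         # bootstrap sample
        scores += np.abs(logreg_l2_weights(X[idx], y[idx]))
    return scores / n_estimators
```

Because each base weighting is computed on a perturbed training set and the results are averaged, small changes in the data tend to cancel out, which is the intuition behind the stability gain.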
In the ensemble phase, a co-association matrix is built in which the similarity of two points is the fraction of base clusterings that place them in the same cluster. A hierarchical clustering algorithm then integrates the base clusterings over this matrix to produce the final result. Experimental results show that the framework eliminates redundant and irrelevant features effectively while ensuring stability without sacrificing classification accuracy.
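The clustering-ensemble step can be sketched as follows. This is a minimal numpy-only sketch under stated assumptions: a plain Lloyd's k-means as the base clusterer, the co-association similarity as described above, and average-linkage agglomeration down to a chosen number of clusters (the text does not fix the linkage criterion, so average linkage is an assumption).

```python
import numpy as np

def kmeans(X, k, rng, iters=20):
    """Plain k-means (Lloyd's algorithm) used as the base clusterer."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):              # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def co_association(labelings):
    """C[i, j] = fraction of base clusterings placing i and j together."""
    n = len(labelings[0])
    C = np.zeros((n, n))
    for lab in labelings:
        C += (lab[:, None] == lab[None, :]).astype(float)
    return C / len(labelings)

def average_linkage(C, k):
    """Agglomerate on the co-association similarity until k clusters remain."""
    clusters = [[i] for i in range(len(C))]
    while len(clusters) > k:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = C[np.ix_(clusters[a], clusters[b])].mean()
                if sim > best:
                    best, pair = sim, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)           # merge most-similar pair
    labels = np.empty(len(C), dtype=int)
    for cid, members in enumerate(clusters):
        labels[members] = cid
    return labels
```

Usage follows the pipeline in the text: run `kmeans` several times, build the co-association matrix from the resulting labelings, and hand it to `average_linkage` for the final consensus partition.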