
Research On Feature Selection Algorithms Based On Pairwise Constraints And Sparse Representation

Posted on: 2011-01-17 | Degree: Master | Type: Thesis
Country: China | Candidate: D Sun
GTID: 2178330338476299 | Subject: Computer software and theory
Abstract/Summary:
Feature selection is an effective method for reducing the dimensionality of high-dimensional data and has been widely used in practical applications such as text categorization, information retrieval and genetic analysis. Most traditional feature selection methods operate on either labeled or unlabeled data. However, besides class labels, there exists another form of supervision information for feature selection: pairwise constraints, which specify whether a pair of instances belongs to the same class (must-link constraint) or to different classes (cannot-link constraint). Pairwise constraints can be obtained more easily than class labels and have been applied in many areas of machine learning. Therefore, we first study feature selection algorithms based on pairwise constraints. In addition, sparse representation has attracted wide attention from machine learning researchers because of its good properties. We introduce the notion of sparse representation into feature selection and propose a novel feature selection algorithm. The main contributions of this thesis are summarized as follows:

Firstly, we study Constraint Score, a feature selection algorithm based on pairwise constraints. By integrating the concept of semi-supervised dimensionality reduction and adding global or local information, we propose a novel semi-supervised feature selection algorithm called Semi-CS. Semi-CS makes use of both unlabeled data and pairwise constraints for feature selection and performs well on several high-dimensional UCI datasets.

Secondly, to address a shortcoming of Constraint Score, namely that its result depends on the composition of the constraint set, we propose an ensemble feature selection algorithm called BCS. BCS makes use of pairwise constraints from an ensemble perspective and effectively improves classification and clustering performance on several high-dimensional UCI datasets and gene expression databases.

Finally, based on sparse representation, we propose a feature selection algorithm called Sparsity Score. Sparsity Score aims to preserve the sparse reconstructive relationship of the data. Extensive experiments on several high-dimensional UCI datasets and gene expression databases validate the effectiveness of the proposed algorithm.
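To make the idea of scoring features from pairwise constraints concrete, the following is a minimal sketch in Python. It assumes a ratio-form score in which a feature is good when must-link pairs are close and cannot-link pairs are far apart along that feature. The abstract does not give the thesis's exact formulas, so the function names (`constraint_scores`, `rank_features`) and the toy data are illustrative assumptions, not the author's implementation of Constraint Score, Semi-CS or BCS.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two instances in the data matrix


def constraint_scores(
    X: Sequence[Sequence[float]],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
) -> List[float]:
    """Score each feature from pairwise constraints (lower is better).

    Assumed ratio form: for feature r, the sum of squared differences over
    must-link pairs divided by the same sum over cannot-link pairs.
    """
    n_features = len(X[0])
    scores = []
    for r in range(n_features):
        ml = sum((X[i][r] - X[j][r]) ** 2 for i, j in must_link)
        cl = sum((X[i][r] - X[j][r]) ** 2 for i, j in cannot_link)
        # A feature that never separates cannot-link pairs gets the worst score.
        scores.append(ml / cl if cl > 0 else float("inf"))
    return scores


def rank_features(scores: Sequence[float]) -> List[int]:
    """Return feature indices ordered from best (lowest score) to worst."""
    return sorted(range(len(scores)), key=lambda r: scores[r])


# Toy data (hypothetical): 4 instances, 2 features.
# Instances 0 and 1 share a class, as do 2 and 3.
X = [[0, 5], [1, 1], [10, 4], [11, 0]]
must_link = [(0, 1), (2, 3)]
cannot_link = [(0, 2), (1, 3)]

scores = constraint_scores(X, must_link, cannot_link)
ranking = rank_features(scores)
```

On this toy data the first feature separates the two classes and the second does not, so the first feature is ranked ahead of the second. Because the score depends entirely on which pairs are supplied, a different choice of constraints can change the ranking, which is the sensitivity the ensemble approach in the abstract is meant to address.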
Keywords/Search Tags:feature selection, pairwise constraints, ensemble learning, sparse representation, semi-supervised dimensionality reduction