Bayesian models for unsupervised feature selection

Posted on: 2013-08-29
Degree: Ph.D.
Type: Thesis
University: Northeastern University
Candidate: Guan, Yue
Full Text: PDF
GTID: 2458390008972011
Subject: Computer Science
Abstract/Summary:
This dissertation focuses on developing probabilistic models for unsupervised feature selection. High-dimensional data often contain irrelevant and redundant features, which can hurt learning algorithms. These unwanted features can be removed either by selecting a subset of the original features (feature selection) or by transforming the data into a lower-dimensional feature space. Principal component analysis (PCA) is a popular transformation-based dimensionality reduction method, but it is difficult to interpret which of the original features are important in a PCA solution. We have designed sparse probabilistic PCA and mixture of sparse probabilistic PCA formulations. Casting sparse PCA as a probabilistic Bayesian model gives us the benefit of automatic model selection. We examined three different sparsity-inducing priors: (1) a two-level hierarchical prior equivalent to a Laplacian distribution, and consequently to L1 regularization; (2) an inverse-Gaussian prior; and (3) a Jeffreys prior. We learn these models by applying variational inference.

Methods in the unsupervised feature selection literature select either global or local features. Global methods select a single feature set for the whole data set, whereas local methods (subspace clustering methods) select one feature subset per cluster, so the selected features can differ across clusters. In this dissertation, we provide a unified probabilistic model that can be configured to perform either global or local feature selection. In our preliminary work, we built such a model by tying the priors of our mixture of sparse probabilistic PCA. We then develop this unified model further, and more directly, through a Beta-Bernoulli hierarchical prior on the features: simply adjusting the variance of the Beta prior switches between the two regimes. We apply this Beta-Bernoulli prior within a Dirichlet process mixture to select features for clustering.

Finally, a single data set may be multi-faceted: it can be grouped and interpreted in many different ways (which we call views), especially for high-dimensional data, where feature selection is typically needed. However, most clustering algorithms produce a single clustering solution, and similarly, feature selection for clustering seeks one feature subset in which one interesting clustering solution resides. In this thesis, we also develop a probabilistic nonparametric Bayesian model that simultaneously discovers several possible clustering solutions and the feature-subset views that generate each cluster partitioning.
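To make the first of the three priors concrete: the two-level hierarchy places an exponential prior on a per-weight variance and a Gaussian on the weight given that variance, and marginalizing out the variance yields a Laplacian, whose negative log-density is an L1 penalty. The sketch below is a minimal illustration of this standard equivalence, not the dissertation's code; the rate parameter `lam` and sample count are hypothetical choices made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0        # hypothetical Laplace rate parameter
n = 200_000

# Two-level hierarchical prior:
#   tau_i ~ Exponential(rate = lam^2 / 2)   (per-weight variance)
#   w_i | tau_i ~ N(0, tau_i)
# Marginally, w_i ~ Laplace(0, 1/lam), whose log-density is
# log(lam/2) - lam * |w|, i.e. an L1 penalty on w.
tau = rng.exponential(scale=2.0 / lam**2, size=n)  # numpy scale = 1/rate
w = rng.normal(0.0, np.sqrt(tau))

# Direct Laplace samples for comparison.
w_direct = rng.laplace(0.0, 1.0 / lam, size=n)

# E|w| for Laplace(0, b) is b = 1/lam; both estimates should be ~0.5.
print(np.mean(np.abs(w)), np.mean(np.abs(w_direct)))
```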
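The Beta-Bernoulli construction in the second paragraph can likewise be read generatively: each feature draws a shared inclusion probability from a Beta prior, and each cluster then draws a binary keep/drop indicator from that probability. The sketch below is a rough illustration under this simple reading only; the dissertation embeds such indicators in a Dirichlet process mixture, and the function name and parameter settings here are hypothetical. It shows how the variance of the Beta prior moves the model between global and local selection.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_feature_masks(n_clusters, n_features, a, b):
    """Per-cluster feature masks under a Beta-Bernoulli prior.

    pi_d ~ Beta(a, b) is shared across clusters for feature d;
    z[k, d] ~ Bernoulli(pi_d) is cluster k's keep/drop indicator.
    """
    pi = rng.beta(a, b, size=n_features)             # shared inclusion probs
    return rng.random((n_clusters, n_features)) < pi  # per-cluster masks

# High-variance Beta(0.1, 0.1): each pi_d lands near 0 or 1, so every
# cluster makes the same per-feature decision -> global selection.
z_global = sample_feature_masks(n_clusters=5, n_features=8, a=0.1, b=0.1)

# Low-variance Beta(50, 50): pi_d ~= 0.5 for every d, so each cluster's
# Bernoulli draw is an independent coin flip -> local, per-cluster subsets.
z_local = sample_feature_masks(n_clusters=5, n_features=8, a=50, b=50)

# Fraction of features on which all 5 clusters agree (all keep or all drop):
# near 1.0 for the global setting, near 2 * 0.5**5 = 0.0625 for the local one.
print((z_global.min(0) | ~z_global.max(0)).mean())
print((z_local.min(0) | ~z_local.max(0)).mean())
```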
Keywords/Search Tags: Feature, Model, Sparse probabilistic PCA, Bayesian, Clustering, Data