
Topics in High-dimensional Unsupervised Learning

Posted on: 2012-04-05    Degree: Ph.D    Type: Dissertation
University: University of Michigan    Candidate: Guo, Jian    Full Text: PDF
GTID: 1460390011470193    Subject: Statistics
Abstract/Summary:
The first part of the dissertation introduces several new methods for estimating the structure of graphical models. First, we consider estimating graphical models with discrete variables, including nominal and ordinal variables. For nominal variables, we establish the asymptotic properties of the joint neighborhood selection method proposed by Hoefling and Tibshirani (2009) and Wang et al. (2009) for fitting high-dimensional graphical models with binary random variables. We show that this method is consistent in terms of both parameter estimation and structure estimation, and we extend it to general nominal variables. For ordinal variables, we introduce a new graphical model which assumes that the ordinal variables are generated by discretizing the marginal distributions of a latent multivariate Gaussian distribution, so that the relationships among the ordinal variables are described by the underlying Gaussian graphical model. We develop an EM-like algorithm to estimate the latent network and apply mean field theory to improve computational efficiency.

We also consider the problem of jointly estimating multiple graphical models that share the same variables but come from different categories. Compared with estimating each category separately, the proposed joint estimation method significantly improves performance when the graphical models in different categories share similarities. We develop joint estimation methods for both Gaussian graphical models and graphical models for categorical variables.

The second part of the dissertation develops two methods to improve the interpretability of high-dimensional unsupervised learning. First, we introduce a pairwise variable selection method for high-dimensional model-based clustering. Unlike existing variable selection methods for clustering, the proposed method not only selects the informative variables but also identifies which pairs of clusters each informative variable separates. Second, we propose a new method to identify both sparse structures and "block" structures in the factor loadings of principal component analysis; this is achieved by forcing highly correlated variables to have identical loadings via a regularization penalty.
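To make the neighborhood selection idea concrete, the following is a minimal Python sketch (not the dissertation's code) of the standard separate-regression variant for binary data: each variable is regressed on all the others with an l1-penalized logistic regression, and edges are recovered from the nonzero coefficients using an "or" or "and" combination rule. The function name, the penalty level C, and the combination rule are illustrative assumptions; the joint pseudo-likelihood version studied in the dissertation couples these regressions rather than fitting them one at a time.

```python
# Minimal sketch of neighborhood selection for a binary graphical model:
# regress each variable on all others with l1-penalized logistic regression,
# then keep an edge when the corresponding coefficient(s) are nonzero.
import numpy as np
from sklearn.linear_model import LogisticRegression

def neighborhood_select(X, C=0.1, rule="or"):
    """X: n-by-p binary (0/1) data matrix; C: inverse regularization strength.

    Returns a p-by-p boolean adjacency matrix estimating the graph structure.
    """
    n, p = X.shape
    coef = np.zeros((p, p))
    for j in range(p):
        y = X[:, j]                      # response: the j-th variable
        Z = np.delete(X, j, axis=1)      # predictors: all other variables
        if y.min() == y.max():           # constant column: neighbors not estimable
            continue
        model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        model.fit(Z, y)
        coef[j, np.arange(p) != j] = model.coef_.ravel()
    if rule == "and":                    # edge only if both regressions agree
        adj = (coef != 0) & (coef.T != 0)
    else:                                # "or" rule: either regression suffices
        adj = (coef != 0) | (coef.T != 0)
    np.fill_diagonal(adj, False)
    return adj
```

The choice of C controls the sparsity of the estimated graph; in practice it would be selected by cross-validation or an information criterion rather than fixed in advance.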
Keywords/Search Tags: Graphical models, Variables, Method, High-dimensional