Two-way latent variable clustering

Posted on: 2006-01-28
Degree: Ph.D
Type: Dissertation
University: The University of Wisconsin - Madison
Candidate: Song, Yang
Full Text: PDF
GTID: 1458390005994705
Subject: Statistics
Abstract/Summary:
We propose a two-way latent variable clustering (LVC) method that extends finite mixture models into a general model-based clustering framework. We first show how one-way latent variable clustering on subjects extends finite mixture models to allow unbalanced feature-by-subject data matrices, using an EM algorithm to find maximum likelihood estimates (MLEs) of the model parameters. We then develop a two-way LVC method that clusters subjects and features simultaneously. When neither the cluster structure of the subjects nor that of the features is known a priori, we use a Gibbs sampler in the E-step and resort to stochastic versions of the EM algorithm to estimate the model parameters. Inference about cluster structure focuses on the posterior distribution of the latent variables for subjects and features given the MLEs of the model parameters.

We then study model selection for one-way and two-way LVC models. We extend the integrated complete-data likelihood (ICL) criterion from finite mixture models to LVC models. Based on observation prediction error, we also propose two further model selection criteria: the mean absolute error (MAE) and the root mean square error (RMSE) on a randomly selected subset of the data.

LVC models can predict the cluster structure of new subjects and/or new features in new data, which has various practical applications. They can be used for classification when cluster information is known a priori, for investigating cluster structure by training an LVC model on a random subset of the data, and for predicting missing observations from the cluster structure, with application to collaborative filtering.

In summary, we have developed a two-way latent variable clustering methodology that is applicable to a variety of problems under a coherent framework. The method works well on simulated data. On real data, case studies in collaborative filtering and in both clustering and classification of microarray gene expression data show that our method is comparable to, and in some cases better than, other methods.
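The prediction-error criteria mentioned in the abstract (MAE and RMSE on a randomly selected subset of the data) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the dissertation: the synthetic feature-by-subject matrix, the function names, the 20% holdout fraction, and above all the predictor, which simply uses the training mean of each (feature cluster, subject cluster) block with cluster labels taken as known, rather than a fitted LVC model.

```python
import numpy as np

def block_mean_predict(data, train_mask, row_labels, col_labels):
    """Hypothetical stand-in predictor (NOT the LVC model): predict each entry
    by the mean of the training entries in its (row cluster, column cluster) block."""
    # Fall back to the global training mean for blocks with no training entries.
    pred = np.full(data.shape, data[train_mask].mean())
    for r in np.unique(row_labels):
        for c in np.unique(col_labels):
            block = np.outer(row_labels == r, col_labels == c)
            train = block & train_mask
            if train.any():
                pred[block] = data[train].mean()
    return pred

def mae_rmse(y_true, y_pred, mask):
    """Mean absolute error and root mean square error over the masked entries."""
    err = y_true[mask] - y_pred[mask]
    return np.mean(np.abs(err)), np.sqrt(np.mean(err ** 2))

# Synthetic feature-by-subject matrix: 2 feature clusters x 3 subject clusters.
rng = np.random.default_rng(0)
row_labels = np.repeat([0, 1], 10)        # 20 features (assumed labels)
col_labels = np.repeat([0, 1, 2], 10)     # 30 subjects (assumed labels)
block_means = np.array([[0.0, 5.0, 10.0],
                        [5.0, 10.0, 0.0]])
data = block_means[row_labels][:, col_labels] + rng.normal(size=(20, 30))

# Hold out a random subset of entries (about 20%) for evaluation.
test_mask = rng.random(data.shape) < 0.2
train_mask = ~test_mask

pred = block_mean_predict(data, train_mask, row_labels, col_labels)
mae, rmse = mae_rmse(data, pred, test_mask)
print(f"held-out MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```

Under this setup, candidate models (for example, different numbers of clusters) would be compared by their held-out MAE or RMSE, with lower values preferred; how the dissertation selects the subset and forms predictions is not specified in the abstract.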
Keywords/Search Tags:Cluster, Two-way latent variable, LVC, Finite mixture models, Method, Data