Font Size: a A A

Handling missing data in high-dimensional subspace modeling

Posted on:2013-05-03Degree:Ph.DType:Thesis
University:The University of Wisconsin - MadisonCandidate:Balzano, Laura KathrynFull Text:PDF
GTID:2458390008965287Subject:Engineering
Abstract/Summary:
Low-dimensional linear subspace approximations to high-dimensional data are powerful enough to capture a great deal of structure in many signals, and yet they also offer simplicity and ease of analysis. Because of this, they have provided a powerful tool to many areas of engineering and science: problems of estimation, detection and prediction, with applications such as network monitoring, collaborative filtering, object tracking in computer vision, and environmental sensing.;Big data are making a big splash, with everyone from bookstores to stock brokers, hospitals to libraries, ecologists to military generals looking to capitalize on data collection opportunities. Big datasets are by definition massive, requiring computationally efficient techniques. Even more consequential is that the data quality is impossible to control. It is truly inevitable that there will be missing data and corrupted measurements. Most classical statistical techniques implicitly assume that these issues have been “cleaned” away before modeling. This fact has encouraged research development on new signal processing techniques to address these issues directly; two such problems of study have been termed “Matrix Completion” and “Robust PCA.” This thesis makes fundamental contributions in these areas and on related issues of subspace modeling.;In this thesis we show some fundamental results on how to estimate a projection residual from an incomplete data vector, and how our theory provides powerful tools for developing algorithms for subspace estimation and tracking with incomplete data. I present the algorithm GROUSE (Grassmannian Rank-One Update Subspace Estimation), a subspace tracking algorithm that performs gradient descent on the Grassmannian (the manifold of all subspaces). I present two algorithms for high rank matrix completion, a problem where columns of a matrix are incomplete and arise from a union of subspaces; the resulting matrix can be full rank. The first algorithm we present, based on incomplete projection residuals, can provably complete columns of the matrix with high probability. The second algorithm, k-GROUSE, is computationally efficient. We discuss a robust subspace tracking algorithm, GRASTA (Grassmannian Robust Adaptive Subspace Tracking Algorithm), which is based on the analogous l1 cost function. We also discuss an approach to the column subset selection problem with missing data.
Keywords/Search Tags:Data, Subspace
Related items