Handling missing data in high-dimensional subspace modeling

Posted on:2013-05-03

Degree:Ph.D

Type:Thesis

University:The University of Wisconsin - Madison

Candidate:Balzano, Laura Kathryn

Full Text:PDF

GTID:2458390008965287

Subject:Engineering

Abstract/Summary:

Low-dimensional linear subspace approximations to high-dimensional data are powerful enough to capture a great deal of structure in many signals, and yet they also offer simplicity and ease of analysis. Because of this, they have provided a powerful tool to many areas of engineering and science: problems of estimation, detection and prediction, with applications such as network monitoring, collaborative filtering, object tracking in computer vision, and environmental sensing.;Big data are making a big splash, with everyone from bookstores to stock brokers, hospitals to libraries, ecologists to military generals looking to capitalize on data collection opportunities. Big datasets are by definition massive, requiring computationally efficient techniques. Even more consequential is that the data quality is impossible to control. It is truly inevitable that there will be missing data and corrupted measurements. Most classical statistical techniques implicitly assume that these issues have been “cleaned” away before modeling. This fact has encouraged research development on new signal processing techniques to address these issues directly; two such problems of study have been termed “Matrix Completion” and “Robust PCA.” This thesis makes fundamental contributions in these areas and on related issues of subspace modeling.;In this thesis we show some fundamental results on how to estimate a projection residual from an incomplete data vector, and how our theory provides powerful tools for developing algorithms for subspace estimation and tracking with incomplete data. I present the algorithm GROUSE (Grassmannian Rank-One Update Subspace Estimation), a subspace tracking algorithm that performs gradient descent on the Grassmannian (the manifold of all subspaces). I present two algorithms for high rank matrix completion, a problem where columns of a matrix are incomplete and arise from a union of subspaces; the resulting matrix can be full rank. The first algorithm we present, based on incomplete projection residuals, can provably complete columns of the matrix with high probability. The second algorithm, k-GROUSE, is computationally efficient. We discuss a robust subspace tracking algorithm, GRASTA (Grassmannian Robust Adaptive Subspace Tracking Algorithm), which is based on the analogous l1 cost function. We also discuss an approach to the column subset selection problem with missing data.

Keywords/Search Tags:

Data, Subspace

Related items

1	Research On Data Mining Algorithm Based On Low-rank Sparse Subspace
2	Subspace Learning Via Random Sample Probing And Robust PCA And Its Application In Video Processing
3	The Technology And Applications Of Subspace Learning
4	Estimation of subspace arrangements: Its algebra and statistics
5	Subspace Clustering Method Of High Dimensional Data
6	Research On Fast Algorithm For Subspace Segmentation In The Big Data Era
7	Subspace Clustering Of Image Data
8	Study On Outlier Detection In Subspace
9	Research On Algorithms For Subspace Clustering And Outlier Mining Based-on Information-entropy
10	The Research On Common Subspace Recognition Method For High Dimensional Data