
Extensions of Principal Components Analysis

Posted on: 2010-02-27    Degree: Ph.D    Type: Thesis
University: Georgia Institute of Technology    Candidate: Brubaker, S. Charles    Full Text: PDF
GTID: 2448390002976503    Subject: Mathematics
Abstract/Summary:
Principal Components Analysis is a standard tool in data analysis, widely used in data-rich fields such as computer vision, data mining, bioinformatics, and econometrics. For a set of vectors in R^n and a number k < n, the method returns a subspace of dimension k whose average squared distance to that set is as small as possible. Besides saving computation by reducing the dimension, projecting to this subspace can often reveal structure that was hidden in high dimension.

This thesis considers several novel extensions of PCA, which provably reveal hidden structure where standard PCA fails to do so. First, we consider Robust PCA, which prevents a few points, possibly corrupted by an adversary, from having a large effect on the analysis. The key idea is to alternate noise removal with projection to a constant fraction of the dimensions. When applied to noisy mixture models, the algorithm finds a subspace that is close to the pair of means that are farthest apart. By choosing and testing random directions in this subspace, the algorithm finds a partitioning hyperplane that does not cut any component, and then recurses on the two resulting halfspaces. This strategy yields a learning algorithm for noisy mixtures of log-concave distributions that is only slightly weaker than the noiseless result (Chapter 5).

Second, we consider Isotropic PCA, which can go beyond the first two moments in identifying "interesting" directions in data. The algorithm first makes the distribution isotropic through an affine transformation. It then reweights the data and computes the resulting first and second moments. In effect, this simulates a non-isotropic distribution whose moments are sometimes meaningful. For a mixture of Gaussians under a Gaussian reweighting, either the first moment or the direction of maximum second moment can be used to partition the components of the mixture, provided the components are sufficiently separated. This strategy leads to the first affine-invariant algorithm that can provably learn mixtures of Gaussians in high dimensions, improving significantly on known results (Chapter 6).

Third, we define the "Subgraph Parity Tensor" of order r of a graph and reduce the problem of finding planted cliques in random graphs to the problem of finding the top principal component of this tensor (Chapter 7). This extends work by Frieze and Kannan, which considers only third-order tensors. The intuition behind the result is that the entries in the block of the tensor corresponding to the clique all have the same value, while the values in other blocks are uncorrelated. This forces the top principal component of the tensor to "point" toward the clique, and a previously known algorithm can then recover the clique itself.
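The Robust PCA paragraph above only names the alternation between noise removal and projection. The sketch below illustrates that idea in NumPy, assuming a simple distance-based outlier rule and keeping half of the principal directions at each step; the function name, the outlier fraction, and the stopping rule are illustrative assumptions, not the thesis's exact procedure.

```python
# A minimal sketch of the alternating idea: repeatedly discard the points
# farthest from the current mean, then project onto the top half of the
# principal directions.  Parameters here are assumptions, not the thesis's.
import numpy as np

def robust_pca_sketch(X, target_dim, outlier_frac=0.05):
    """X: (m, n) array of samples; returns an (n, target_dim) orthonormal basis."""
    basis = np.eye(X.shape[1])   # current subspace, as columns of an n x d matrix
    Y = X.copy()
    while basis.shape[1] > 2 * target_dim:
        # Noise removal: drop the points farthest from the mean (possible adversarial noise).
        centered = Y - Y.mean(axis=0)
        dists = np.linalg.norm(centered, axis=1)
        Y = Y[dists <= np.quantile(dists, 1.0 - outlier_frac)]
        # Projection: keep (roughly) the top half of the principal directions.
        centered = Y - Y.mean(axis=0)
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        half = max(target_dim, Vt.shape[0] // 2)
        W = Vt[:half].T              # new directions in the current coordinates
        Y = centered @ W
        basis = basis @ W            # compose with the earlier projections
    return basis[:, :target_dim]
```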
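Similarly, the Isotropic PCA steps (make the data isotropic, reweight, read off the first and second moments) can be sketched as follows. The Gaussian reweighting bandwidth, the threshold on the reweighted mean, and the eigenvalue selection rule are illustrative assumptions rather than the choices made in the thesis.

```python
# A minimal sketch of the Isotropic PCA idea: whiten the data with an affine
# map, reweight each point by a Gaussian kernel, and use the reweighted
# first/second moments to pick a direction along which to split the data.
import numpy as np

def isotropic_pca_direction(X, alpha=4.0):
    """X: (m, n) data matrix; returns a unit vector suggesting a split direction."""
    # Step 1: affine transformation to isotropic position (mean 0, covariance I).
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(1.0 / np.sqrt(np.maximum(evals, 1e-12))) @ evecs.T
    Z = (X - mu) @ W

    # Step 2: Gaussian reweighting of the (now isotropic) points.
    weights = np.exp(-np.sum(Z**2, axis=1) / alpha)
    weights /= weights.sum()

    # Step 3: reweighted first and second moments.
    m1 = weights @ Z                       # reweighted mean
    m2 = (Z * weights[:, None]).T @ Z      # reweighted second-moment matrix

    # If the reweighted mean moves noticeably, it already points across the
    # components; otherwise use a direction of extreme reweighted variance.
    if np.linalg.norm(m1) > 0.1:           # threshold is an assumption
        return m1 / np.linalg.norm(m1)
    evals2, evecs2 = np.linalg.eigh(m2)
    idx = np.argmax(np.abs(evals2 - evals2.mean()))
    return evecs2[:, idx]
```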
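Finally, the order-3 case of the subgraph parity tensor can be written down directly: with edges encoded as +1 and non-edges as -1, the entry at (u, v, w) is the product of the three pairwise values, so the block indexed by the clique is identically +1. The sketch below builds this tensor for a small random graph with a planted clique and approximates its top principal component by tensor power iteration; the graph size, clique size, sign convention, and iteration count are assumptions, and recovery at these small sizes is not guaranteed.

```python
# Illustration of the order-3 parity tensor of a graph with a planted clique,
# and a tensor power iteration to approximate its top principal component.
import numpy as np

rng = np.random.default_rng(0)
n, k = 60, 12                                    # graph size and planted clique size (assumed)
A = np.where(rng.random((n, n)) < 0.5, 1.0, -1.0)
A = np.triu(A, 1); A = A + A.T                   # symmetric +/-1 edge indicators, zero diagonal
clique = rng.choice(n, size=k, replace=False)
A[np.ix_(clique, clique)] = 1.0                  # plant the clique: all internal edges present
np.fill_diagonal(A, 0.0)

# Order-3 parity tensor: T[u, v, w] = A[u, v] * A[u, w] * A[v, w].
T = np.einsum('uv,uw,vw->uvw', A, A, A)

# Tensor power iteration: x <- T(., x, x), normalized.
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
for _ in range(100):
    x = np.einsum('uvw,v,w->u', T, x, x)
    x /= np.linalg.norm(x)

# The largest coordinates of the top principal component tend to point at the
# clique when it is large enough; at these toy sizes this is not guaranteed.
top = np.argsort(-np.abs(x))[:k]
print(sorted(clique), sorted(top))
```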
Keywords/Search Tags: Principal, Component, Algorithm, Data, PCA, Tensor