
Optimal algorithms for L1-norm Principal Component Analysis: New tools for signal processing and machine learning with few and/or faulty training data

Posted on: 2016-01-01  Degree: Ph.D.  Type: Dissertation
University: State University of New York at Buffalo  Candidate: Markopoulos, Panagiotis  Full Text: PDF
GTID: 1478390017982232  Subject: Electrical engineering
Abstract/Summary:
The objective of this work is the development of a solid theoretical and algorithmic framework for outlier-resistant L1-norm Principal Component Analysis (PCA). PCA is the statistical data analysis technique that has been, for over a century, the "mainstay" of modern signal processing and machine learning, with numerous important applications in wireless communications, computer networks, computer vision, image processing, and bioinformatics, to name a few. However, researchers have long observed that standard L2-norm-based PCA is highly sensitive to corrupted, highly deviating, irregular data points (outliers), even when they appear in vanishingly small numbers. Because of the frequent appearance of such outliers in real-world applications and the sensitivity of L2-norm Principal Components (PCs), a great deal of research effort has been devoted over the past few decades to calculating and using instead PCs that define L1-norm maximum-projection data subspaces (L1-PCA).

A summary of our contributions in this manuscript follows. In Chapter 1, we translate L1-PCA into combinatorial optimization and deliver the first two optimal algorithms in the literature for its exact calculation. In Chapter 2, we propose a third, efficient L1-PCA algorithm (with complexity close to that of standard L2-PCA) that attains optimal performance with empirical probability close to 1, outperforming on average all existing counterparts of comparable computational cost. This algorithm was designed to bridge the gap between our optimal algorithms of high computational cost and the existing low-cost suboptimal algorithms that suffer significant performance degradation, and thus to be the method of choice for real-world L1-PCA of large data. In Chapter 3, we focus on the special case of real, non-negative matrices (e.g., images, graph adjacency matrices) and calculate their optimal L1-PC with linear cost. We then present a novel L1-PCA-based technique for the recovery of an image from a few, possibly severely corrupted copies. In Chapter 4, we employ our L1-PCA tools from Chapters 1 and 2 to develop a state-of-the-art subspace-based direction-of-arrival (DoA) estimation method that attains performance similar to that of the highly popular L2-subspace methods in nominal system operation, while exhibiting inherent resistance to unknown data-record contamination. In Chapter 5, we steer our focus from real to complex data analysis and define, for the first time, the L1-PC of complex data. We then present an algorithm for calculating the L1-PC of a complex data matrix and use it to devise a novel outlier-resistant subspace-based DoA estimation method. Finally, in Chapter 6, we establish the MMSE operation for Pseudonoise (PN)-masked data in the form of a time-varying linear filter, suggest an implementation that avoids repeated inversion of the input autocorrelation matrix, and develop an auxiliary-vector (AV) MMSE filter estimator with state-of-the-art short-data-record estimation performance.
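To make the combinatorial reformulation of Chapter 1 concrete, the following is a minimal sketch that computes the single L1-norm principal component of a small real data matrix by exhaustive search over antipodal sign vectors, using the known equivalence between maximizing the L1-norm projection ||X^T w||_1 over unit-norm w and maximizing ||X b||_2 over b in {+1, -1}^N. This brute-force reference has exponential cost in the number of data points and is not the dissertation's optimal or near-optimal algorithms; the function name l1_pc_exhaustive and the NumPy-based implementation are assumptions made here purely for illustration.

    # Illustrative sketch (not the dissertation's algorithm): exact L1-PC of a
    # real D x N matrix X via exhaustive search over sign vectors b in {+-1}^N,
    # exploiting max_{||w||=1} ||X^T w||_1 = max_b ||X b||_2, w* = X b*/||X b*||_2.
    import itertools
    import numpy as np

    def l1_pc_exhaustive(X):
        """Return the unit-norm L1-norm principal component of a real matrix X."""
        D, N = X.shape
        best_val, best_b = -np.inf, None
        for signs in itertools.product((-1.0, 1.0), repeat=N):
            b = np.array(signs)
            val = np.linalg.norm(X @ b)        # ||X b||_2
            if val > best_val:
                best_val, best_b = val, b
        w = X @ best_b
        return w / np.linalg.norm(w)           # unit-norm L1-PC

    # Hypothetical usage: a small matrix with one grossly corrupted column.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((3, 8))
    X[:, -1] *= 50.0                           # inject an outlier data point
    print("L1-PC:", l1_pc_exhaustive(X))

Because the search enumerates all 2^N sign vectors, this sketch is practical only for very small N; the point of the optimal algorithms summarized above is precisely to avoid this exponential enumeration.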
Keywords/Search Tags: Data, Optimal algorithms, L1-norm, Principal, Processing, L1-PCA, Performance