Topics in matrix sampling algorithms

Posted on:2012-10-11

Degree:Ph.D

Type:Dissertation

University:Rensselaer Polytechnic Institute

Candidate:Boutsidis, Christos

Full Text:PDF

GTID:1450390008998759

Subject:Computer Science

Abstract/Summary:

PDF Full Text Request

We study three fundamental problems of Linear Algebra, lying in the heart of various Machine Learning applications, namely: (i) Low-rank Column-based Matrix Approximation, (ii) Coreset Construction in Least-Squares Regression, and (iii) Feature Selection in k-means Clustering. A high level description of these problems is as follows: given a matrix A and an integer r, what are the r most "important" columns (or rows) in A? A more detailed description is given momentarily.;1. Low-rank Column-based Matrix Approximation. We are given a matrix A and a target rank k. The goal is to select a subset of columns of A and, by using only these columns, compute a rank k approximation to A that is as good as the rank k approximation that would have been obtained by using all the columns.;2. Coreset Construction in Least-Squares Regression. We are given a matrix A and a vector b. Consider the (over-constrained) least-squares problem of minimizing ||Ax - b||2, over all vectors x epsilon ¸ D . The domain D represents the constraints on the solution and can be arbitrary. The goal is to select a subset of the rows of A and b and, by using only these rows, find a solution vector that is as good as the solution vector that would have been obtained by using all the rows.;3. Feature Selection in K-means Clustering. We are given a set of points described with respect to a large number of features. The goal is to select a subset of the features and, by using only this subset, obtain a k-partition of the points that is as good as the partition that would have been obtained by using all the features.;We present novel algorithms for all three problems mentioned above. Our results can be viewed as follow-up research to a line of work known as "Matrix Sampling Algorithms". Frieze et al [59] presented the first such algorithm for the Low-rank Matrix Approximation problem. Since then, such algorithms have been developed for several other problems, e.g. Regression [47], Graph Sparsification [131], and Linear Equation Solving [128]. Our contributions to this line of research are: (i) improved algorithms for Low-rank Matrix Approximation and Regression (ii) algorithms for a new problem domain ( K-means Clustering).

Keywords/Search Tags:

Matrix, Algorithms, Low-rank, K-means clustering, Problem, Regression

PDF Full Text Request

Related items

1	Subspace Segmentation By K-means Clustering On Low Rank Representations
2	Studies On New Fuzzy Clustering Algorithms
3	Research On Algorithms For Low Rank Matrix Optimization Problem
4	A Study Of Huber Matrix Regression:Theory And Algorithms
5	The LADMM Method For Block And Low-rank Matrix Regression
6	Sparse Reduced Rank Regression By Dividing The Optimization
7	Alternating Direction Method Of Multipliers To Solve K-means Clustering Problem
8	The Research Of Fusion Algorithms For Support Vector Machine And K-means Clustering
9	Study On Clustering Algorithms
10	Low-rank Approximation Of Integer Matrix And Its Applications