
Large Scale Machine Learning: Low-rank Matrix Approximation And Online Learning

Posted on: 2012-10-06    Degree: Master    Type: Thesis
Country: China    Candidate: M Li    Full Text: PDF
GTID: 2178330338484152    Subject: Computer software and theory
Abstract/Summary:
The Nyström method is an efficient technique for the eigenvalue decomposition of large kernel matrices. However, in order to ensure an accurate approximation, a sufficiently large number of columns has to be sampled. On very large data sets, the SVD step on the resultant data submatrix will soon dominate the computations and become prohibitive. In this thesis, we propose an accurate and scalable Nyström scheme that first samples a large column subset from the input matrix, but then only performs an approximate SVD on the inner submatrix by using recent randomized low-rank matrix approximation algorithms. Theoretical analysis shows that the proposed algorithm is as accurate as the standard Nyström method that directly performs a large SVD on the inner submatrix, while its time complexity is only that of performing a small SVD. Experiments are performed on a number of large-scale data sets for low-rank approximation and spectral embedding, on both CPU and GPU.

Spectral clustering is an elegant and powerful approach to clustering. However, the underlying eigen-decomposition takes cubic time and quadratic space w.r.t. the data set size. These costs can be reduced by the Nyström method, but the manipulation and storage of the sampled columns can still be expensive when the data set is large. In this thesis, we propose a time- and space-efficient spectral clustering algorithm that can scale to very large data sets. A general procedure to orthogonalize the approximated eigenvectors is also proposed. Extensive spectral clustering experiments on a number of data sets, ranging in size from a few thousand to several million, demonstrate the accuracy and scalability of the proposed approach. We further apply it to the task of image segmentation. For images with more than 10 million pixels, this algorithm can obtain the eigenvectors in 1 minute on a single machine.

Multiple instance (MI) learning is a recent learning paradigm that is more flexible than standard supervised learning algorithms in the handling of label ambiguity. It has been used in a wide range of applications including image classification, object detection and object tracking. Typically, MI algorithms are trained in a batch setting, in which the whole training set has to be available before training starts. However, in applications such as tracking, the classifier needs to be trained continuously as new frames arrive. Motivated by the empirical success of a batch MI algorithm called MILES, we propose in this thesis an online MI learning algorithm that has an efficient online update procedure and, like MILES, performs joint feature selection and classification. Moreover, while existing online MI algorithms lack theoretical guarantees, we prove that the proposed online algorithm has a (cumulative) regret of O(√T), where T is the number of iterations. In other words, the average regret goes to zero asymptotically, and the algorithm thus achieves the same performance as the best solution in hindsight. Experiments on a number of MI classification and object tracking data sets demonstrate encouraging results.
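The first contribution above combines Nyström column sampling with a randomized SVD of the small inner submatrix, so the expensive exact eigen-decomposition is never formed. The following is a minimal sketch of that combination; the function names, the Gaussian sketching scheme, and the oversampling parameter `p` are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def randomized_svd(W, k, p=10):
    """Approximate rank-k SVD of W via Gaussian random projection:
    sketch the range of W, orthonormalize, then do a small exact SVD."""
    c = W.shape[0]
    G = np.random.randn(c, k + p)      # Gaussian test matrix (p = oversampling)
    Q, _ = np.linalg.qr(W @ G)         # orthonormal basis for an approximate range
    B = Q.T @ W                        # small (k+p) x c matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

def nystrom_approx(K, idx, k):
    """Rank-k Nystrom approximation K ~ C W_k^+ C^T from sampled columns idx,
    with the SVD of the inner submatrix W replaced by a randomized SVD."""
    C = K[:, idx]                      # n x c sampled columns
    W = C[idx, :]                      # c x c inner submatrix
    U, s, Vt = randomized_svd(W, k)    # approximate instead of exact SVD
    W_pinv_k = Vt.T @ np.diag(1.0 / s) @ U.T   # rank-k pseudo-inverse of W
    return C @ W_pinv_k @ C.T
```

For a kernel matrix whose effective rank is at most k and whose sampled columns span its range, this reconstruction is (numerically) exact; in general the randomized inner SVD trades a small additional approximation error for a large speedup over the exact SVD.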
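The O(√T) cumulative-regret guarantee mentioned for the online MI algorithm is the standard rate in online convex optimization. As a textbook illustration of how such a bound arises (this generic projected online gradient descent is not the thesis's MILES-based algorithm; the step-size schedule and parameter names are assumptions):

```python
import numpy as np

def online_gradient_descent(grad, project, w0, T, D=1.0, G=1.0):
    """Projected online gradient descent with step size eta_t = D / (G * sqrt(t)).
    For convex per-round losses with gradients bounded by G on a domain of
    diameter D, the cumulative regret is O(sqrt(T)), so average regret -> 0."""
    w = w0
    played = []
    for t in range(1, T + 1):
        played.append(w)                      # commit to w_t before seeing loss t
        eta = D / (G * np.sqrt(t))            # decaying step size
        w = project(w - eta * grad(t, w))     # gradient step, then projection
    return played
```

For example, on quadratic losses f_t(w) = (w - z_t)^2 over the interval [0, 1], the average regret against the best fixed point in hindsight (the mean of the z_t) vanishes as T grows, mirroring the "same performance as the best solution in hindsight" claim above.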
Keywords/Search Tags: Matrix Low-rank Approximation, Nyström, Randomized SVD, Spectral Clustering, GPUs, Multiple Instance Learning, Online Learning, 1-norm SVM, MILES, Elastic Net, Regret