Scalable machine learning for massive datasets: Fast summation algorithms

Posted on:2008-10-16

Degree:Ph.D

Type:Thesis

University:University of Maryland, College Park

Candidate:Raykar, Vikas Chandrakant

GTID:2448390005953271

Subject:Artificial Intelligence

Abstract/Summary:

Huge data sets containing millions of training examples with a large number of attributes are relatively easy to gather. However, one of the bottlenecks for successful inference is the computational complexity of machine learning algorithms. Most state-of-the-art nonparametric machine learning algorithms have a computational complexity of either O(N^2) or O(N^3), where N is the number of training examples. This has seriously restricted the use of massive data sets. The bottleneck computational primitive at the heart of various algorithms is the multiplication of a structured matrix with a vector, which we refer to as the matrix-vector product (MVP) primitive. The goal of my thesis is to speed up some of these MVP primitives with fast approximate algorithms that scale as O(N) and also provide high accuracy guarantees. I use ideas from computational physics, scientific computing, and computational geometry to design these algorithms. The proposed algorithms have been applied to speed up kernel density estimation, optimal bandwidth estimation, projection pursuit, Gaussian process regression, implicit surface fitting, and ranking.
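The MVP primitive can be made concrete with the one-dimensional discrete Gauss transform, G(y_j) = sum_i q_i exp(-(y_j - x_i)^2 / h^2), which is the kind of sum behind kernel density estimation: the M x N kernel matrix is multiplied by the weight vector q. The sketch below is a minimal illustration under stated assumptions, not the algorithms developed in the thesis. The function names are hypothetical, the data are one-dimensional, and the fast variant is a simple cutoff (truncation) scheme on sorted sources. Skipping every source farther than r = h * sqrt(ln(1/eps)) from a target bounds the absolute error by eps * sum_i |q_i|. Its cost is roughly O((N + M) log N) plus the number of source-target pairs within r, so it approaches linear time only when h is small relative to the spread of the data.

```python
import bisect
import math
import random


def gauss_transform_direct(sources, weights, targets, h):
    """Hypothetical helper: exact 1-D Gauss transform in O(N * M) time.

    Computes G(y) = sum_i q_i * exp(-(y - x_i)^2 / h^2) for every target y,
    i.e. the product of the M x N kernel matrix with the weight vector q.
    """
    return [
        sum(q * math.exp(-((y - x) ** 2) / h**2) for x, q in zip(sources, weights))
        for y in targets
    ]


def gauss_transform_truncated(sources, weights, targets, h, eps):
    """Hypothetical helper: cutoff approximation of the same sum.

    A source with |y - x| > r, where r = h * sqrt(ln(1/eps)), contributes
    at most eps * |q|, so skipping all such sources keeps the absolute
    error at most eps * sum_i |q_i|. Assumes 0 < eps < 1.
    """
    r = h * math.sqrt(math.log(1.0 / eps))
    # Sort sources once (O(N log N)) so each target's neighbours form a slice.
    order = sorted(range(len(sources)), key=lambda i: sources[i])
    xs = [sources[i] for i in order]
    qs = [weights[i] for i in order]
    result = []
    for y in targets:
        # Binary search for the window [y - r, y + r] (O(log N) per target).
        lo = bisect.bisect_left(xs, y - r)
        hi = bisect.bisect_right(xs, y + r)
        result.append(
            sum(q * math.exp(-((y - x) ** 2) / h**2)
                for x, q in zip(xs[lo:hi], qs[lo:hi]))
        )
    return result


if __name__ == "__main__":
    # Compare both versions on random data and check the error bound.
    random.seed(0)
    xs = [random.uniform(0.0, 100.0) for _ in range(2000)]
    qs = [random.uniform(0.0, 1.0) for _ in range(2000)]
    ys = [random.uniform(0.0, 100.0) for _ in range(200)]
    eps = 1e-6
    exact = gauss_transform_direct(xs, qs, ys, h=0.5)
    approx = gauss_transform_truncated(xs, qs, ys, h=0.5, eps=eps)
    max_err = max(abs(a - b) for a, b in zip(exact, approx))
    print(max_err <= eps * sum(abs(q) for q in qs))  # True
```

The truncation trick trades a user-chosen accuracy eps for speed, which mirrors the goal stated above (fast approximation with an accuracy guarantee), but it degrades toward the direct O(N * M) cost as h grows, which is one reason more sophisticated schemes are needed in practice.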

Keywords/Search Tags:

Machine learning, Algorithms

Related items

1	A Study On Machine Learning Algorithms For The Document Analysis
2	Analysing Correctness Of Implementations Of Machine Learning Algorithms By Machine Learning
3	Scalable machine learning for massive datasets: Fast summation algorithms
4	Efficient Large-Scale Machine Learning Algorithms for Genomic Sequence
5	Partial Learning Machine: Concept, Algorithms And Applications
6	Evaluating the security of machine learning algorithms
7	Cognitive Exploration Of Machine Learning Algorithms
8	Optimization Algorithms for Structured Machine Learning and Image Processing Problems
9	Optimization Algorithms for Machine Learning Designed for Parallel and Distributed Environment
10	Research Of Support Vector Machine Learning Algorithms