
Newton Methods for Large Scale Problems in Machine Learning

Posted on: 2015-07-24
Degree: Ph.D.
Type: Thesis
University: Northwestern University
Candidate: Hansen, Samantha Leigh
Full Text: PDF
GTID: 2458390005481397
Subject: Applied Mathematics
Abstract/Summary:
The focus of this thesis is on practical ways of designing optimization algorithms for minimizing large-scale nonlinear functions, with applications in machine learning. Chapter 1 introduces the overarching ideas of the thesis. Chapters 2 and 3 are geared toward supervised machine learning applications that involve minimizing a sum of loss functions over a dataset, but they apply, respectively, to the more general settings of minimizing a stochastic convex function and of minimizing a convex function with an L1 regularizer. Chapter 4 discusses efficient implementations of projected Newton methods for nonnegative tensor factorization.

Chapter 2 outlines a new stochastic quasi-Newton algorithm that incorporates second-order information through the L-BFGS approximation of the Hessian. The method's novel element is the use of subsampled Hessian-vector products and iterate averaging to define the L-BFGS curvature pairs. Numerical results on speech and text classification problems demonstrate the effectiveness of the new algorithm over stochastic gradient descent.

Chapter 3 presents a new active-set method for minimizing a convex function with an L1 regularizer. The algorithm follows a two-phase approach: an active-set prediction phase that employs first-order and second-order information, and a subspace phase that performs a Newton-like step using subsampled Newton-CG. The novelty of the algorithm lies in using an iterative shrinkage step in the active-set phase and a projected piecewise-linear line search in the subspace phase. The new algorithm is compared against a state-of-the-art orthant-wise limited-memory algorithm on a speech classification problem.

The fourth chapter concerns nonnegative tensor/matrix factorization with a Kullback-Leibler objective. All presented algorithms start from an alternating block Gauss-Seidel framework and formulate each block subproblem as a sum of independent row functions, each depending on only a subset of the variables. The block subproblem is minimized by independently minimizing each row function, which is strictly convex with nonnegativity constraints. The conclusion is that applying two-metric gradient-projection techniques with exact or approximate Hessian information to each of the independent row functions is much more effective than applying the same algorithms directly to the block subproblem.
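To make the Chapter 2 description concrete, the following is a minimal sketch of a stochastic quasi-Newton loop of this flavor, not the thesis's actual implementation: curvature pairs are formed from differences of averaged iterates and subsampled Hessian-vector products, and search directions come from the standard L-BFGS two-loop recursion. The helper functions grad_batch and hess_vec_sub, and the parameter values L, M, and alpha, are hypothetical placeholders chosen here for illustration.

import numpy as np

def two_loop(grad, s_list, y_list):
    """Standard L-BFGS two-loop recursion: approximate H^{-1} @ grad."""
    q = grad.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / np.dot(y, s)
        a = rho * np.dot(s, q)
        q -= a * y
        alphas.append((a, rho, s, y))
    if s_list:
        s, y = s_list[-1], y_list[-1]
        q *= np.dot(s, y) / np.dot(y, y)       # initial Hessian scaling
    for a, rho, s, y in reversed(alphas):
        b = rho * np.dot(y, q)
        q += (a - b) * s
    return q

def sqn(w, grad_batch, hess_vec_sub, T=2000, L=20, M=10, alpha=0.05):
    """Sketch of a stochastic quasi-Newton iteration in which subsampled
    Hessian-vector products define the L-BFGS curvature pairs."""
    s_list, y_list = [], []
    w_avg, w_avg_prev = np.zeros_like(w), None
    for t in range(1, T + 1):
        g = grad_batch(w, t)                    # stochastic gradient on a small batch
        d = two_loop(g, s_list, y_list) if s_list else g
        w = w - alpha * d                       # fixed step for simplicity; could decay
        w_avg += w / L                          # running average of the last L iterates
        if t % L == 0:                          # refresh curvature pairs every L steps
            if w_avg_prev is not None:
                s = w_avg - w_avg_prev
                y = hess_vec_sub(w_avg, s, t)   # subsampled Hessian-vector product
                if np.dot(s, y) > 1e-10:        # keep only safe curvature pairs
                    s_list.append(s); y_list.append(y)
                    if len(s_list) > M:
                        s_list.pop(0); y_list.pop(0)
            w_avg_prev, w_avg = w_avg, np.zeros_like(w)
    return w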
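For Chapter 3, the iterative shrinkage step used to predict the active set can be illustrated with the standard soft-thresholding (ISTA) operator. This is only a sketch of the idea: the step length alpha and the function names are chosen here for illustration and are not taken from the thesis.

import numpy as np

def shrink(w, g, lam, alpha):
    """One iterative-shrinkage step on F(w) = f(w) + lam * ||w||_1:
    a gradient step on f followed by soft-thresholding."""
    z = w - alpha * g
    return np.sign(z) * np.maximum(np.abs(z) - alpha * lam, 0.0)

def predict_active_set(w, g, lam, alpha=1.0):
    """Variables driven to (or kept at) zero by the shrinkage step are predicted
    active; the remaining free variables define the subspace on which a
    Newton-like step would then be taken."""
    w_plus = shrink(w, g, lam, alpha)
    active = (w_plus == 0.0)
    free = ~active
    return active, free, w_plus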
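The row-wise decomposition described in the fourth chapter can be sketched for the matrix case V ≈ WH with a Kullback-Leibler objective: with H fixed, the block subproblem for W separates into independent, strictly convex row problems with nonnegativity constraints. The sketch below solves each row with SciPy's bound-constrained L-BFGS-B as a stand-in for the two-metric gradient-projection methods studied in the thesis; the eps smoothing term and the solver choice are assumptions made here.

import numpy as np
from scipy.optimize import minimize

def kl_row_obj(w_row, v_row, H, eps=1e-12):
    """KL objective and gradient for one row of W with H fixed:
    sum_j [ (w_row @ H)_j - v_row_j * log((w_row @ H)_j) ], up to constants."""
    pred = w_row @ H + eps
    f = np.sum(pred - v_row * np.log(pred))
    grad = H @ (1.0 - v_row / pred)
    return f, grad

def update_W(V, W, H):
    """One Gauss-Seidel block update of W: the block subproblem splits into
    independent row problems, each solved under nonnegativity constraints."""
    W_new = np.empty_like(W)
    for i in range(V.shape[0]):
        res = minimize(kl_row_obj, W[i], args=(V[i], H), jac=True,
                       method="L-BFGS-B", bounds=[(0.0, None)] * W.shape[1])
        W_new[i] = res.x
    return W_new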
Keywords/Search Tags: Algorithm, Function, Problem, New, Machine, Minimizing, Block