
Exploiting smoothness in statistical learning, sequential prediction, and stochastic optimization

Posted on: 2015-04-03
Degree: Ph.D.
Type: Thesis
University: Michigan State University
Candidate: Mahdavi, Mehrdad
Full Text: PDF
GTID: 2478390017989955
Subject: Computer Science
Abstract/Summary:
In the last several years, the intimate connection between convex optimization and learning problems, in both the statistical and the sequential frameworks, has shifted the focus of algorithmic machine learning toward this interplay. On one hand, this connection raises new challenges in reassessing the performance of learning algorithms, including generalization and regret bounds, under the assumptions imposed by convexity, such as analytical properties of loss functions (e.g., Lipschitzness, strong convexity, and smoothness). On the other hand, the emergence of datasets of unprecedented size demands the development of novel and more efficient optimization algorithms to tackle large-scale learning problems.

The overarching goal of this thesis is to reassess the smoothness of loss functions in statistical learning, sequential prediction/online learning, and stochastic optimization, and to explicate its consequences. In particular, we examine how leveraging the smoothness of the loss function can be beneficial or detrimental in these settings in terms of sample complexity, statistical consistency, regret analysis, and convergence rate.

In the statistical learning framework, we investigate the sample complexity of learning problems when the loss function is smooth and strongly convex and the learner is provided with the target risk as prior knowledge. We establish that under these assumptions, by exploiting the smoothness of the loss function, the sample complexity of learning can be improved exponentially. Furthermore, the proof of this result is constructive and rests on a properly designed stochastic optimization algorithm, which may be of significant practical importance.

We also investigate smoothness from the viewpoint of statistical consistency and show that, in sharp contrast to optimization and generalization, where smoothness is favorable because of its computational and theoretical virtues, the smoothness of the surrogate loss function may deteriorate the binary excess risk. Motivated by this negative result, we provide a unified analysis of three types of error, namely the optimization error, the generalization bound, and the error incurred in translating the convex excess risk into a binary excess risk, and delineate the conditions under which smoothness is preferable.

We then turn to the importance of smoothness in sequential prediction/online learning. We introduce a new measure of the performance of online learning algorithms, referred to as gradual variation, defined as the sum of the distances between every two consecutive loss functions; it is better suited to gradually evolving environments such as stock prediction. Under the smoothness assumption, we devise novel algorithms for online convex optimization whose regret is bounded by the gradual variation. The proposed algorithms can take advantage of benign sequences while remaining protected against adversarial sequences of loss functions.
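The thesis's gradual-variation algorithms are not reproduced here; the following is a minimal sketch, assuming an optimistic (extragradient-style) online gradient update in which the previous gradient serves as a hint for the next loss, together with a simple empirical proxy for the gradual-variation measure. The projection set, step size, and loss sequence are illustrative assumptions, not the exact setup analyzed in the thesis.

```python
import numpy as np

def project(x, radius=1.0):
    """Euclidean projection onto an L2 ball of the given radius (illustrative feasible set)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def optimistic_ogd(grad_fns, dim, eta=0.1):
    """Play each round using the previous gradient as a hint, then correct with
    the observed gradient; slowly varying (benign) loss sequences incur small regret."""
    z = np.zeros(dim)      # secondary ("center") iterate
    hint = np.zeros(dim)   # guess for the upcoming gradient
    plays = []
    for grad in grad_fns:                 # grad: callable returning the gradient at a point
        x = project(z - eta * hint)       # optimistic step based on the hint
        g = grad(x)                       # gradient actually revealed this round
        z = project(z - eta * g)          # standard gradient-descent correction
        hint = g                          # reuse today's gradient as tomorrow's hint
        plays.append(x)
    return plays

def gradual_variation(grad_fns, plays):
    """Empirical proxy for gradual variation: summed squared distances between
    consecutive loss gradients, evaluated at the played points."""
    total = 0.0
    for t in range(1, len(grad_fns)):
        g_prev = grad_fns[t - 1](plays[t])
        g_curr = grad_fns[t](plays[t])
        total += np.linalg.norm(g_curr - g_prev) ** 2
    return total
```

When consecutive losses barely change, the hint is accurate and the optimistic step already points in nearly the right direction, which is the mechanism behind regret bounds that scale with the gradual variation rather than with the horizon alone.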
Finally, we investigate how to exploit the smoothness of the loss function in convex optimization. Unlike optimization methods based on full gradients, most existing stochastic optimization methods do not exploit the smoothness assumption. We propose a novel optimization paradigm, referred to as mixed optimization, which interpolates between stochastic and full gradient methods and is able to exploit the smoothness of loss functions to obtain faster convergence rates in stochastic optimization, as well as condition-number-independent accesses to full gradients in deterministic optimization. The key insight of mixed optimization is to utilize infrequent full gradients of the objective function to progressively reduce the variance of the stochastic gradients (sketched below). These results reveal an intricate interplay between stochastic and deterministic convex optimization that takes advantage of their individual merits.

We also propose efficient projection-free optimization algorithms to address the computational challenge arising from the projection step that most existing gradient-based methods require at each iteration to keep intermediate solutions feasible. In the stochastic optimization setting, by introducing and leveraging smoothness, we develop novel methods that require only one projection, at the final iteration. In the online learning setting, we consider online convex optimization with soft constraints, where the constraints only need to be satisfied in the long term. We show that, by compromising on the learner's regret, one can devise efficient online learning algorithms with sub-linear bounds on both the regret and the violation of the constraints.
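As a concrete illustration of the variance-reduction idea behind mixed optimization described above, here is a minimal sketch assuming a smooth finite-sum objective: an infrequent full gradient at a snapshot point is used to correct cheap stochastic gradients so that their variance shrinks as the iterate approaches the snapshot. This follows the generic "full gradient as a control variate" pattern and is not the specific mixed optimization algorithm developed in the thesis; the function names and parameters are illustrative.

```python
import numpy as np

def mixed_style_sgd(grad_i, n, dim, epochs=10, inner_steps=100, eta=0.05):
    """grad_i(w, i) returns the gradient of the i-th component function at w."""
    w = np.zeros(dim)
    for _ in range(epochs):
        w_snapshot = w.copy()
        # Infrequent full gradient at the snapshot point (the expensive step).
        full_grad = np.mean([grad_i(w_snapshot, i) for i in range(n)], axis=0)
        for _ in range(inner_steps):
            i = np.random.randint(n)
            # Stochastic gradient corrected by the snapshot information:
            # unbiased, with variance that shrinks as w approaches w_snapshot.
            g = grad_i(w, i) - grad_i(w_snapshot, i) + full_grad
            w -= eta * g
    return w

# Example usage on least squares: f_i(w) = 0.5 * (x_i @ w - y_i) ** 2.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = rng.standard_normal((200, 5)), rng.standard_normal(200)
    grad_i = lambda w, i: (X[i] @ w - y[i]) * X[i]
    w_hat = mixed_style_sgd(grad_i, n=200, dim=5)
```

The design choice illustrated here is the interpolation itself: only one full gradient per epoch is computed, while the per-iteration cost stays at the level of plain stochastic gradient descent, which is how smoothness is converted into faster convergence.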
Keywords/Search Tags: Optimization, Smoothness, Statistical, Sequential, Online learning, Learning algorithms, Learning problems, Loss functions