
Application Of Bilevel Optimization In Machine Learning

Posted on: 2024-07-14    Degree: Doctor    Type: Dissertation
Country: China    Candidate: H A Yin    Full Text: PDF
GTID: 1520307322972589    Subject: Mathematics
Abstract/Summary:
Bilevel optimization has recently gained significant attention, particularly from the machine learning community, due to its powerful modeling potential. However, popular algorithms in practical use often depend heavily on lower-level strong convexity and smoothness assumptions, which can be overly restrictive and limit their applicability. This thesis comprises three parts.

First, for a broad class of hyperparameter selection problems in statistical machine learning, we reformulate the original bilevel optimization problem as one whose lower-level problem is fully convex by decoupling its regularization term. We prove solution-recovery relationships between both the local and the global optimal solutions of the two problems. Based on this transformation, we develop a sequentially convergent Value Function-based Difference-of-Convex Algorithm with inexactness (VF-iDCA) and show that it reaches stationary solutions without lower-level strong convexity or smoothness assumptions. Extensive experiments confirm the theoretical findings and indicate that the proposed VF-iDCA outperforms alternatives when tuning hyperparameters.

Second, the thesis investigates the design of stochastic algorithms for solving the subproblems that arise in the above method. Solving these subproblems is the main source of computational cost and the bottleneck that limits the algorithm's application to large-scale problems, and for large-scale statistical machine learning problems stochastic algorithms are an essential computational tool. We study a dual-based stochastic algorithm that exploits the second-order sparse structure of the approximate Hessian matrix, substantially improving computational efficiency. On the theoretical side, we establish almost-sure convergence and a convergence rate in expectation for the objective value; under error bound conditions, we further derive a convergence rate for the distance between the iterates and the solution set. Numerical results indicate that the proposed algorithm is competitive with existing methods.

In the final part, we design a fast gradient-based method for multi-objective bilevel optimization problems, a novel and challenging class of problems with significant practical applications. The algorithm requires neither nested subproblem solving nor estimation of the inverse of second-order information. Its dominant computational cost is the Hessian-vector product, which can be implemented efficiently in practice. The method combines a gradient-based non-nested bilevel optimization algorithm with classical multi-objective optimization techniques; nonetheless, we provide a novel convergence analysis and detailed numerical experiments. Experimental results on meta-learning indicate that introducing appropriate new objectives can improve the performance of the original objective.
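To make the first part concrete, below is a minimal, purely illustrative Python sketch of the value-function idea behind VF-iDCA: the regularization term is decoupled into a constraint with its own variable r, the lower-level value function v(r) is then convex, and each iteration linearizes the concave term -rho*v(r) and solves a convex subproblem. The toy data, the l1-ball decoupling, the fixed penalty parameter rho, and the use of cvxpy are all assumptions made for illustration; the thesis's actual VF-iDCA additionally handles inexact subproblem solves and is not limited to this model.

    import cvxpy as cp
    import numpy as np

    # Toy data: tune the radius r of an l1-constrained regression on a validation split.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((40, 10))
    w_true = np.zeros(10); w_true[:3] = [2.0, -1.0, 0.5]
    y = X @ w_true + 0.1 * rng.standard_normal(40)
    Xtr, ytr, Xval, yval = X[:25], y[:25], X[25:], y[25:]

    def value_function(r):
        # v(r) = min_w ||Xtr w - ytr||^2  s.t. ||w||_1 <= r, which is convex in r.
        # Returns v(r) and a subgradient: minus the optimal dual multiplier.
        w = cp.Variable(10)
        con = cp.norm1(w) <= r
        prob = cp.Problem(cp.Minimize(cp.sum_squares(Xtr @ w - ytr)), [con])
        prob.solve()
        return prob.value, -float(con.dual_value)

    rho, r_k = 10.0, 1.0  # penalty parameter and initial radius (illustrative choices)
    for _ in range(20):
        v_k, g_k = value_function(r_k)
        # DCA step: linearize the concave term -rho*v(r) at r_k,
        # so the resulting subproblem is convex in (w, r).
        w, r = cp.Variable(10), cp.Variable(nonneg=True)
        gap = cp.sum_squares(Xtr @ w - ytr) - (v_k + g_k * (r - r_k))
        prob = cp.Problem(cp.Minimize(cp.sum_squares(Xval @ w - yval) + rho * gap),
                          [cp.norm1(w) <= r])
        prob.solve()
        r_k = float(r.value)
    print("tuned l1 radius:", r_k)

The dual multiplier of the lower-level constraint furnishes the subgradient of v at essentially no extra cost, which is what keeps each DC linearization cheap in this sketch.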
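The final part's dominant cost is the Hessian-vector product. The abstract does not specify the implementation, so the following standard autodiff sketch is an assumption (JAX as the library, and f as a stand-in objective); it shows why a Hessian-vector product costs only a small constant multiple of one gradient evaluation and never forms or inverts the Hessian.

    import jax
    import jax.numpy as jnp

    def f(w):
        # Stand-in smooth objective; any differentiable lower-level loss works.
        return jnp.sum(jnp.tanh(w) ** 2) + 0.5 * jnp.sum(w ** 2)

    def hvp(fun, w, v):
        # Forward-over-reverse autodiff: differentiate grad(fun) along direction v.
        # Cost is a small constant multiple of one gradient call; the
        # d x d Hessian is never materialized or inverted.
        return jax.jvp(jax.grad(fun), (w,), (v,))[1]

    w = jnp.ones(5)
    v = jnp.arange(5.0)
    print(hvp(f, w, v))  # equals H(w) @ v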
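On the multi-objective side, a classical ingredient of the kind the abstract alludes to is a common descent direction for several objectives. The sketch below implements the textbook two-objective min-norm (MGDA-style) rule; whether the thesis uses exactly this rule is an assumption.

    import numpy as np

    def min_norm_combination(g1, g2):
        # Min-norm point of the segment [g1, g2]: the classical two-objective
        # MGDA rule. The optimal weight alpha has a closed form, clipped to [0, 1].
        diff = g1 - g2
        denom = float(diff @ diff)
        alpha = 0.5 if denom == 0.0 else float(np.clip(((g2 - g1) @ g2) / denom, 0.0, 1.0))
        return alpha * g1 + (1.0 - alpha) * g2

    g1 = np.array([1.0, 0.0])   # gradient of objective 1 (toy values)
    g2 = np.array([0.0, 1.0])   # gradient of objective 2 (toy values)
    d = min_norm_combination(g1, g2)
    # d @ g1 and d @ g2 are both >= ||d||^2, so a step along -d
    # decreases both objectives whenever d != 0.
    print(d, d @ g1, d @ g2)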
Keywords/Search Tags:Bilevel Optimization, Hyperparameter Selection, Stochastic Algorithm, Meta Learning