With the rapid development of machine learning and its wide applications, the tasks that machine learning encounters are becoming increasingly complex and hard. The complexity and hardness mainly arise from the fact that the problems are non-convex, non-differentiable, NP-hard, etc. Meanwhile, with the fast growth of data in application scenarios, the problems are not only non-convex but also high-dimensional. Thus, research on the theory and methods of high-dimensional non-convex optimization is becoming increasingly significant and urgent. Derivative-free optimization is a family of optimization methods that do not rely on gradients and instead perform optimization via sampling. Derivative-free optimization methods possess the merits of finding the global optimum with a certain probability and being able to handle non-convex and non-differentiable objective functions; they serve as a complement to gradient-based optimization approaches. Although derivative-free optimization methods have achieved substantial advances and successful applications, they still suffer from slow convergence and low efficiency on high-dimensional non-convex objective functions. This bottleneck blocks the further application of derivative-free optimization. Therefore, this thesis is dedicated to alleviating this bottleneck and to answering, from a theoretical perspective, the key question of which classes of high-dimensional non-convex functions derivative-free optimization methods can solve efficiently. Under this theoretical guidance, the thesis is also dedicated to developing more suitable and effective optimization algorithms tailored to such high-dimensional non-convex problems and applying them to complex tasks in machine learning. The main results are summarized as follows.

1. A classification-based derivative-free optimization framework is proposed, and its sample complexity on non-convex functions with local Hölder continuity is disclosed. Since a unified theoretical analysis of derivative-free optimization methods on non-convex functions is still lacking, this thesis proposes a classification-based derivative-free optimization framework that covers a wide range of derivative-free algorithms. A general upper bound on the performance of optimizing non-convex functions under this framework is then proved, and the key factors affecting the optimization performance are disclosed. On non-convex functions with local Hölder continuity, the thesis shows that this classification-based framework can approximate the global optimum with high quality and high probability in polynomial time. Guided by this theoretical discovery, the thesis further develops an efficient randomized coordinate shrinking classification-based optimization algorithm called RACOS. Empirical results on an NP-hard problem and a non-convex classification task in machine learning verify the effectiveness and scalability of RACOS.

2. A simultaneous optimistic optimization algorithm based on the effective dimension for high-dimensional non-convex problems, called RESOO, is proposed, and its convergence rate is theoretically disclosed. Since the optimistic optimization algorithm performs well on low-dimensional non-convex problems but poorly on high-dimensional ones, this thesis proposes RESOO to scale simultaneous optimistic optimization to high-dimensional non-convex functions with low effective dimension. The theoretical result shows that the convergence rate of RESOO, measured by the simple regret, depends only on the effective dimension rather than on the dimension of the original solution space. Thus, RESOO converges faster on high-dimensional non-convex problems with low effective dimension. Empirical results on a hyper-parameter tuning task verify the theoretical results.

3. A more generic function class, the functions with optimal ε-effective dimension, is proposed, and the theoretical properties of applying random embedding to this function class are disclosed. Since in many real-world tasks high-dimensional problems may not have a clear effective dimension, this thesis proposes a more generic concept called the optimal ε-effective dimension, of which the effective dimension is a special case. For high-dimensional non-convex objective functions with low optimal ε-effective dimension, the thesis proves that an embedding gap of at most 2ε exists when random embedding is applied to this function class. Furthermore, a sequential random embedding technique called SRE is proposed, which can reduce the embedding gap. Empirical results on a 100,000-dimensional non-convex classification task verify the efficiency of SRE.

4. A general framework based on the effective dimension for optimizing high-dimensional multi-objective functions, called ReMO, is proposed, and its algorithmic properties are theoretically disclosed. Since many multi-objective optimization methods perform unsatisfactorily on high-dimensional problems and lack theoretical foundations, this thesis proposes the general framework ReMO. By applying random embedding, ReMO can scale any multi-objective optimization algorithm to high-dimensional non-convex multi-objective functions with low effective dimension. The thesis reveals the necessary and sufficient condition under which high-dimensional multi-objective functions have low effective dimension, and proves that ReMO possesses the merits of optimal Pareto front preservation, time complexity reduction, and rotation perturbation invariance. Empirical results indicate that ReMO remains effective even on high-dimensional multi-objective functions where all dimensions are effective but most have only a small, bounded effect on the function value.
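The random embedding idea shared by RESOO, SRE, and ReMO can be sketched in a few lines. This is a minimal illustration, not the thesis's algorithms: the toy objective `f`, the chosen dimensions, and the use of plain random search in the embedded space (in place of simultaneous optimistic optimization) are all assumptions for the example.

```python
import numpy as np

# Toy high-dimensional function with low effective dimension:
# only the first 2 of D = 1000 coordinates affect the value.
D, d_e, d = 1000, 2, 4          # ambient, effective, embedding dims
rng = np.random.default_rng(0)

def f(x):
    # Sphere function restricted to the effective subspace (illustrative)
    return np.sum(x[:d_e] ** 2)

# Random embedding: optimize g(y) = f(A y) with a random Gaussian matrix A,
# so the search runs in d dimensions instead of D.
A = rng.standard_normal((D, d))

def g(y):
    return f(A @ y)

# Plain random search in the low-dimensional embedded space,
# a stand-in for the optimizer that RESOO would apply here.
best_y, best_val = None, float("inf")
for _ in range(2000):
    y = rng.uniform(-1.0, 1.0, size=d)
    v = g(y)
    if v < best_val:
        best_y, best_val = y, v

print(best_val)  # best value found; the global optimum is 0
```

Because `g` depends on `y` only through the two effective coordinates of `A @ y`, the sample budget needed to drive `best_val` toward 0 scales with the effective dimension, not with D, which is the phenomenon the RESOO analysis formalizes via the simple regret.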
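The classification-based derivative-free optimization loop can likewise be sketched as follows. The axis-aligned box learned from the best samples is a simplified stand-in for the randomized coordinate shrinking classifier of RACOS; the sphere objective, budgets, and exploration rate are illustrative assumptions, not the thesis's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Illustrative objective; RACOS targets non-convex functions,
    # but a sphere keeps the sketch easy to check.
    return np.sum(x ** 2)

dim, n, pos_k, iters = 10, 30, 5, 50
lo, hi = -np.ones(dim), np.ones(dim)

X = rng.uniform(lo, hi, size=(n, dim))
vals = np.array([f(x) for x in X])

for _ in range(iters):
    order = np.argsort(vals)
    positives = X[order[:pos_k]]                    # "good" solutions
    # Learn a simple classifier of the positive region:
    # here, the axis-aligned bounding box of the positives.
    box_lo, box_hi = positives.min(axis=0), positives.max(axis=0)
    # Sample new candidates mostly from the positive region,
    # with a small exploration probability over the whole space.
    new_X = np.where(rng.random((n, dim)) < 0.9,
                     rng.uniform(box_lo, box_hi, size=(n, dim)),
                     rng.uniform(lo, hi, size=(n, dim)))
    new_vals = np.array([f(x) for x in new_X])
    # Keep the best n solutions overall (elitism)
    X = np.vstack([X, new_X])
    vals = np.concatenate([vals, new_vals])
    keep = np.argsort(vals)[:n]
    X, vals = X[keep], vals[keep]

print(vals.min())  # best value found
```

The loop alternates learning a region that separates good from bad samples and sampling from that region, which is the sample-classify-sample pattern the framework's sample-complexity analysis covers.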