Over the past decades, kernel-based methods have become very popular in machine learning and are a commonly used approach to nonparametric models. Among them, the reproducing kernel, by virtue of its reproducing property, allows the objective function to be restricted to the linear space spanned by the kernel, which transforms a general nonparametric problem into a "parametric" one and thus simplifies the optimization in nonparametric problems. Based on properties of the reproducing kernel Hilbert space (RKHS), this dissertation studies the following statistical problems for nonparametric models in the high-dimensional setting:

1. The problem of choosing the shape parameter of the general Gaussian radial basis function (RBF) kernel is studied. The Gaussian RBF kernel is one of the most widely used kernels in kernel-based methods, and its parameter, also called the shape parameter, plays an essential role in model fitting. Most of the existing literature selects the shape parameter by cross-validation combined with grid search, which is computationally expensive but still feasible, and performs well, when the shape parameter is a scalar. However, grid search is clearly infeasible when the shape parameter is a vector. This dissertation proposes a method for selecting the shape parameter of the general Gaussian RBF kernel, obtained by vectorizing the scalar shape parameter of the Gaussian RBF kernel. The method simultaneously serves variable selection and regression function estimation: for the former, asymptotic consistency is established; for the latter, the estimator is as efficient as if the true or optimal shape parameter were known. In addition, numerical simulations and real data analysis demonstrate the superiority of the method over other popular methods for variable selection and model estimation.

2. The problem of variable selection for censored data is investigated. The analysis of survival data is often hampered by problems such as an excessive number of covariates and incomplete observations. Most existing variable selection methods for censored data are based on specific model structure assumptions, and their effectiveness hinges on the validity of the assumed model. This dissertation considers variable selection when the response is subject to random (right) censoring. We introduce a model-free variable selection procedure that learns the gradients of quantile regression functions under two popular censoring weighting schemes. The key advantage of the proposed approach is that it requires no explicit model assumptions and, owing to the smoothness of the reproducing kernel, significantly improves the computational efficiency of estimating the quantile function and its gradient. The estimation efficiency of the quantile regression function and its gradient, as well as the asymptotic consistency of the variable selection, are also established in this dissertation, and the finite-sample performance of the method in various settings is demonstrated through numerical experiments.

3. The problem of function estimation in the projection pursuit regression (PPR) model is investigated. The PPR model has played an important role in statistical modeling: it can be used both as a data model for statistical interpretation and as an algorithmic model for approximating general nonparametric regression functions. Existing estimation methods for the PPR model usually involve complicated minimization in order to achieve the desired efficiency in general settings. This dissertation proposes an alternating linearized estimation algorithm that simplifies the optimization by applying a "nearly linear" basis function, so that the loss function can be represented through matrix operations. In addition, asymptotic theory for the projection vector and the estimation efficiency of its regression function are developed both when the PPR model is used as a statistical model, with fixed and diverging dimensions, and when it is used as an algorithmic model. Finally, the numerical performance of the proposed method in model estimation and model interpretation is demonstrated through simulations and real data analysis.
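The general Gaussian RBF kernel with a vectorized shape parameter, as described in item 1, can be sketched as follows. This is a minimal illustration under assumed notation (the function name and parameterization are not the dissertation's implementation): each coordinate receives its own shape parameter, and a coordinate whose parameter is shrunk to zero drops out of the kernel entirely, which is the mechanism that allows kernel fitting and variable selection to be performed simultaneously.

```python
import numpy as np

def gaussian_rbf(X, Y, theta):
    """Anisotropic Gaussian RBF kernel:
    K(x, y) = exp(-sum_j theta_j * (x_j - y_j)^2),
    where theta is a nonnegative vector of per-coordinate shape parameters."""
    diff = X[:, None, :] - Y[None, :, :]              # (n, m, d) pairwise differences
    return np.exp(-np.einsum('nmd,d->nm', diff ** 2, theta))

# Coordinates with theta_j = 0 do not influence the kernel at all,
# so a sparse theta implicitly selects the relevant variables.
X = np.array([[0.0, 0.0], [1.0, 2.0]])
K = gaussian_rbf(X, X, np.array([1.0, 0.0]))          # second coordinate is switched off
```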
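For reference, the PPR model discussed in item 3 has the standard form (a sketch of the generic model only; the "nearly linear" basis functions and the alternating linearized estimation scheme are as developed in the dissertation):

```latex
f(x) = \sum_{m=1}^{M} g_m\left(\beta_m^{\top} x\right),
```

where the $\beta_m$ are projection vectors and the $g_m$ are univariate ridge functions; estimation alternates between updating the projection directions and the ridge functions.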