Kernel Machine Learning Method And Its Application Of High-dimensional Nonadditive Logistic Semiparametric Model

Posted on: 2023-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: X Zheng
Full Text: PDF
GTID: 2557307100477764
Subject: Statistics
Abstract/Summary:
The logistic regression model is a commonly used classification and prediction model for binary outcomes and is widely applied in genetics, biology, and economics. With the rapid development of big data technology, real data increasingly exhibit high dimensionality and complex relationships among variables. High-dimensional data often contain a large number of redundant predictors, which interfere with identifying the relationship between the response variable and the relevant predictors in a regression model and reduce the model's prediction accuracy. In addition, it is difficult to describe the complex nonlinear relationship between predictors and the response variable and to automatically model high-order interaction effects between predictors, especially in high-dimensional settings. How to describe complex nonlinear relationships between variables, select variables automatically, and achieve accurate prediction with a logistic semiparametric model is one of the active research topics in statistics. Research on the construction and application of accurate logistic semiparametric models for high-dimensional data can therefore enrich and develop semiparametric modeling theory and methods, and it also has great practical value for accurately predicting disease risk and formulating precise treatment plans for patients.

In this thesis, we use the “garrotized” kernel machine method to approximate the nonparametric function that describes the complex relationship between predictors and a binary response variable, combine it with the LASSO method to automatically select important predictors, and construct a nonadditive logistic semiparametric model based on the kernel machine. We then propose a Penalized Logistic Garrotized Kernel Machine (PLGKM) method and develop an efficient “one-group-at-a-time” cyclic coordinate descent algorithm for fast computation in high-dimensional models. The advantage of the PLGKM method is that it can flexibly describe the complex relationship between predictors and the response variable, automatically model possible high-order interaction effects between predictors, and eliminate irrelevant predictors in both the parametric and nonparametric components, thereby improving the prediction accuracy of the model. In addition, we study the PLGKM method for generalized semiparametric models, taking the Poisson distribution as an example to verify the effectiveness of the penalized garrotized kernel machine method for Poisson semiparametric models.

We evaluate the finite-sample performance of the proposed PLGKM method using simulation studies. The simulation results show that the PLGKM method has better prediction performance than existing representative methods, and that the penalized garrotized kernel machine method for the Poisson semiparametric model has higher prediction accuracy than the LASSO method in multivariate, high-dimensional, and even ultra-high-dimensional settings. The proposed method is applied to breast cancer risk prediction, and the real data analysis shows that the PLGKM method predicts breast cancer risk more accurately than existing representative methods, which verifies the applicability of the proposed method.
Keywords/Search Tags: logistic semiparametric model, kernel machine, high-dimensional data, variable selection, LASSO
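
The abstract does not state the kernel used, so the following is only a minimal sketch of the “garrotized” kernel idea under an assumption: a Gaussian kernel in which each predictor j carries a non-negative weight `delta[j]`, so that a weight of zero removes that predictor from the nonparametric component. The function name `garrotized_gaussian_kernel` and the example data are hypothetical and are not taken from the thesis.

```python
import numpy as np


def garrotized_gaussian_kernel(Z1, Z2, delta):
    """Assumed form: K[i, k] = exp(-sum_j delta[j] * (Z1[i, j] - Z2[k, j])**2).

    delta[j] >= 0 is the garrote weight of predictor j; delta[j] == 0 means
    predictor j has no influence on the kernel matrix.
    """
    Z1 = np.asarray(Z1, dtype=float)
    Z2 = np.asarray(Z2, dtype=float)
    delta = np.asarray(delta, dtype=float)
    if np.any(delta < 0):
        raise ValueError("garrote weights must be non-negative")
    # Pairwise coordinate differences, shape (n1, n2, p).
    diff = Z1[:, None, :] - Z2[None, :, :]
    # Weighted squared distance summed over predictors, shape (n1, n2).
    weighted_sq_dist = np.einsum("ikj,j->ik", diff ** 2, delta)
    return np.exp(-weighted_sq_dist)


# Hypothetical data: three observations, two predictors.
Z = np.array([[0.0, 1.0], [1.0, 5.0], [2.0, -3.0]])
# The second predictor has weight zero, so it is effectively dropped.
K = garrotized_gaussian_kernel(Z, Z, delta=[0.5, 0.0])
```

In this sketch, an L1-type penalty on `delta` would push some weights to exactly zero, which is one way variable selection in the nonparametric component could be obtained; how the thesis actually penalizes and updates these weights is not specified in the abstract.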