Variable Selection For High Dimensional Survival Data With Unknown Link Function

Posted on: 2020-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: X H Tao
GTID: 2404330599975279
Subject: Statistics
Abstract/Summary:
With growing attention to health, accurately and efficiently predicting patient survival time, cancer recurrence time, and other kinds of survival data has become increasingly important. With the development of genetic engineering in particular, hundreds or even thousands of factors may affect a single disease. Selecting the factors that have a significant impact on the disease from a large number of diagnostic factors has therefore become a focus of drug research. Identifying these predictive factors requires statistical methods that can handle right-censored data with high-dimensional covariates and establish an appropriate survival model that predicts survival time accurately.

The accelerated failure time regression model (AFT model) and the Cox proportional hazards regression model (Cox PH model) are the most common and classic models for survival data. Because the AFT model regresses the response variable directly on the covariates, it is generally considered easier to interpret than the Cox PH model. The AFT model relates the response variable to the regression part through a known link function, but a model with a prespecified link function may not describe the data accurately, which introduces unavoidable error. In principle, estimating the link function from the data should give a more accurate model than fitting the data with a fixed, known link function.

First, the model of this paper is proposed. It improves on the AFT model: unlike the traditional AFT model, which specifies the link function, this paper leaves the link function between the response variable and the covariates unspecified and estimates it from the observed data with a kernel method, so as to obtain the most suitable and accurate model.

Second, to make the estimation more accurate, we adopt Stute's weighted least squares method, which attaches Kaplan-Meier weights to the observations in a linear regression. This method is not very effective when the proportion of censored data is high, because the Kaplan-Meier weight of a censored survival time is 0, so censored survival times are not used directly in the regression. To improve on this, the censoring constraint method proposed by T. Cai, J. Huang and L. Tian in 2008 is introduced: the censored data are added to the objective function as constraints. In addition, the LASSO penalty is used to make the model coefficients sparse and thereby reduce the dimension of the model.

Finally, once the objective function is determined, a new algorithm that combines a new LASSO algorithm with kernel estimation is designed to carry out model selection and coefficient estimation. The simulation results and real data analysis show that the proposed model selects explanatory variables that truly affect the response variable and achieves higher accuracy than the AFT model.
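
The Kaplan-Meier weighting step described above can be illustrated with a minimal sketch. It computes Stute-type Kaplan-Meier weights (the jumps of the Kaplan-Meier estimator at the ordered observations) and plugs them into an ordinary weighted least squares fit. The function names `stute_km_weights` and `weighted_least_squares`, the tie-breaking rule, and the toy data are assumptions made for illustration; the sketch does not include the censoring constraints, the LASSO penalty, or the kernel estimate of the link function, and it is not the algorithm of the thesis.

```python
import numpy as np


def stute_km_weights(time, event):
    """Kaplan-Meier (Stute) weights; event is 1 if observed, 0 if censored."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    n = len(time)
    # Sort by time; on ties put observed events before censored ones
    # (an assumed, commonly used convention).
    order = np.lexsort((-event, time))
    d = event[order]
    w_sorted = np.zeros(n)
    surv = 1.0  # running Kaplan-Meier product over earlier ranks
    for i in range(n):
        at_risk = n - i
        # Weight = jump of the Kaplan-Meier estimator at this rank;
        # it is exactly 0 for a censored observation.
        w_sorted[i] = surv * d[i] / at_risk
        surv *= ((at_risk - 1) / at_risk) ** d[i]
    w = np.empty(n)
    w[order] = w_sorted  # map weights back to the input order
    return w


def weighted_least_squares(X, y, w):
    """Minimise sum_i w_i * (y_i - x_i' beta)^2 via a rescaled lstsq."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sw = np.sqrt(np.asarray(w, dtype=float))
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


if __name__ == "__main__":
    # Toy data (made up): the second observation is censored.
    time = [1.0, 2.0, 3.0, 4.0]
    event = [1, 0, 1, 1]
    w = stute_km_weights(time, event)
    X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])
    print(w)
    print(weighted_least_squares(X, np.log(time), w))
```

In the toy data the censored observation receives weight 0 and its mass is redistributed to the later observed times, which is the limitation under heavy censoring that the censoring constraints are meant to address.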
Keywords/Search Tags: Variable selection, Lasso, censoring constraints, unknown link function, kernel, AFT model