With the rapid development of science and technology, society as a whole has entered an era of comprehensive digitalization, and the volume of data in many industries has grown exponentially, forming massive data resources. For such data, it is often necessary to study the relationship between covariates and a response variable. For example, in Alzheimer's disease research, longitudinal data such as clinical, radiological and genetic measurements are collected at different times to determine which factors have an important impact on the time at which participants convert to Alzheimer's disease. Because the conversion time is known only to lie within a certain interval, the resulting survival data are interval-censored. However, few studies in the current literature address variable screening with longitudinal and interval-censored data. This thesis therefore studies variable screening in the joint model of longitudinal and interval-censored data. The main research contents are as follows.

Firstly, the longitudinal part of the joint model is based on a linear mixed-effects model with time-dependent covariates, and the survival part is based on a relative risk model. The two parts are linked through a shared trajectory function, and the log-likelihood function of the joint model is derived.

Secondly, maximum likelihood estimates of the parameters in the joint model are obtained with the Monte Carlo expectation-maximization (MCEM) algorithm, and the idea of the Principled Sure Independence Screening (PSIS) method is used for variable screening. The observed information matrix is derived first; screening statistics are then constructed from the observed information matrix and the parameter estimates, and screening is carried out. The proposed method is evaluated in simulations by the proportion of replicates in which each single important variable is selected, in which all important variables are selected, and in which all unimportant variables are selected, together with the proportion of important variables correctly selected into the model across all replicates. It is then applied to Alzheimer's disease data to verify its feasibility and effectiveness.

Thirdly, because the log-likelihood function of the joint model above is complex, it is difficult to obtain the maximum likelihood estimates directly by differentiating it. To avoid the singularity of the Hessian matrix, the log-likelihood function is approximated by its second-order Taylor expansion, and, following the idea of the Feature Screening method, a hard-threshold procedure is used to maximize this expansion and thereby carry out variable screening. The feasibility and effectiveness of this method are verified by a simulation study and an analysis of the Alzheimer's disease data. Simulation results show that, when the sample size is adequate, the method screens out the important variables with high probability and maintains a high screening probability under different interval-censoring rates.

Finally, the thesis concludes with a summary and directions for future research.
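The PSIS-style step (standardize each estimate with the observed information, then threshold) can be illustrated with a minimal sketch. It assumes the per-covariate estimates and the corresponding diagonal entries of the inverse observed information are already available (in the thesis they would come from the MCEM fit of the joint model), and it uses a common form of the PSIS threshold, Φ⁻¹(1 − f/(2p)), where f is the tolerated expected number of false positives. The function name `psis_select` and its parameters are hypothetical and do not come from the thesis.

```python
import math
from statistics import NormalDist
from typing import List, Tuple


def psis_select(beta_hat: List[float],
                var_hat: List[float],
                expected_fp: float = 1.0) -> Tuple[List[int], List[float], float]:
    """Hypothetical PSIS-style screening sketch.

    beta_hat[j]: estimated effect of covariate j (assumed given).
    var_hat[j]:  j-th diagonal entry of the inverse observed information (assumed given).
    expected_fp: tolerated expected number of false positives (assumed rule for the threshold).
    """
    p = len(beta_hat)
    # Threshold chosen so that roughly expected_fp null covariates pass (normal approximation).
    threshold = NormalDist().inv_cdf(1.0 - expected_fp / (2.0 * p))
    stats = [abs(b) / math.sqrt(v) for b, v in zip(beta_hat, var_hat)]
    selected = [j for j, z in enumerate(stats) if z >= threshold]
    return selected, stats, threshold


if __name__ == "__main__":
    # Toy numbers for illustration only, not results from the thesis.
    sel, z, thr = psis_select([0.9, 0.05, -0.7, 0.02], [0.01, 0.01, 0.04, 0.01])
    print(sel, [round(s, 2) for s in z], round(thr, 3))
```

Raising `expected_fp` lowers the threshold and keeps more covariates, trading false positives for a higher chance of retaining every important variable.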
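The Taylor-expansion and hard-threshold step can be sketched as iterative hard thresholding, in the spirit of feature screening via sparse maximum likelihood. Near the current estimate β, the log-likelihood is approximated by ℓ(β) + (γ − β)ᵀ∇ℓ(β) − (u/2)‖γ − β‖², where u·I stands in for the possibly singular negative Hessian. Maximizing this surrogate subject to at most k nonzero coefficients amounts to keeping the k largest entries of β + ∇ℓ(β)/u in absolute value. The sketch below assumes a user-supplied gradient, a fixed model size k and a constant u at least as large as the largest eigenvalue of the negative Hessian. `hard_threshold`, `iht_screen` and the separable quadratic `toy_grad` are hypothetical stand-ins, not the thesis's joint-model log-likelihood.

```python
from typing import Callable, List, Tuple


def hard_threshold(v: List[float], k: int) -> List[float]:
    """Keep the k entries of v with the largest absolute value; set the rest to zero."""
    keep = set(sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:k])
    return [x if j in keep else 0.0 for j, x in enumerate(v)]


def iht_screen(grad: Callable[[List[float]], List[float]], p: int, k: int, u: float,
               n_iter: int = 200, tol: float = 1e-8) -> Tuple[List[int], List[float]]:
    """Maximize the second-order surrogate under ||beta||_0 <= k by hard thresholding."""
    beta = [0.0] * p
    for _ in range(n_iter):
        g = grad(beta)
        # Unconstrained maximizer of the surrogate: one gradient step of size 1/u.
        gamma = [b + gj / u for b, gj in zip(beta, g)]
        new_beta = hard_threshold(gamma, k)
        change = max(abs(a - b) for a, b in zip(new_beta, beta))
        beta = new_beta
        if change < tol:
            break
    selected = sorted(j for j, b in enumerate(beta) if b != 0.0)
    return selected, beta


# Toy log-likelihood l(beta) = -0.5 * sum_j d_j (beta_j - b_j)^2 (illustration only).
B_TRUE = [2.0, 0.1, -1.5, 0.05, 0.0]
D = [1.0, 2.0, 1.5, 1.0, 1.0]


def toy_grad(beta: List[float]) -> List[float]:
    return [d * (b - x) for d, b, x in zip(D, B_TRUE, beta)]


if __name__ == "__main__":
    # u = max(D) bounds the negative Hessian of the toy likelihood.
    print(iht_screen(toy_grad, p=5, k=2, u=max(D)))
```

Replacing the exact Hessian by u·I keeps each update well defined even when the Hessian is singular, which is the motivation stated above for using the Taylor approximation.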