| Interval-censored data are a special type of failure time data that arise widely in medical, demographic, and economic studies, among other fields (Sun, 2006). Interval-censored failure time data arise when the failure time of interest is not observed exactly but is known only to lie within some interval. Such data are generally classified into case I and case II interval-censored data. By case I interval-censored data, also known as current status data, we usually mean that each study subject is observed only once, so the failure time of interest is known only to be either smaller or larger than the observation time. By case II interval-censored data, we mean that the event of interest is known only to occur within a finite time interval. In addition, there often exist a large number of risk factors, and one needs to identify the relevant or prognostic predictors among them. In this paper, we investigate regression analysis and variable selection for interval-censored data based on a broad generalized odds rate model, which includes common semi-parametric models such as the proportional hazards model and the proportional odds model. This paper focuses on three main areas of research: variable selection for generalized odds rate mixture cure models with interval-censored failure time data, variable selection for the high-dimensional partly linear additive generalized odds rate model with interval-censored data, and generalized odds rate frailty models for current status data with informative censoring. First, a general approach to variable selection that has recently attracted a large amount of attention is the penalized approach, which involves the use of a penalty function. In failure time studies, there may sometimes exist a so-called cured subgroup, meaning that a portion of the study subjects are not susceptible to the failure event of interest. Standard survival methods or models are not suitable for such situations because they assume that all 
subjects will eventually experience the event of interest. To deal with failure time data with a cured subgroup, we propose in Chapter 2 a generalized odds rate mixture cure model approach to variable selection with interval-censored data. In the estimation process, a sieve approach based on monotone splines is used to estimate the unknown functions in the model. To implement the method, a penalized EM algorithm based on Gamma-Poisson latent variables is developed to obtain the penalized maximum likelihood estimates of the parameters. Furthermore, the oracle property of the regression coefficient estimator is proved theoretically, and the validity of the proposed model and variable selection method is verified by extensive numerical simulations. Finally, the proposed method is applied to real data from the Nigerian Demographic and Health Survey, and some meaningful results are obtained. Next, many authors have studied variable selection for high-dimensional interval-censored data, for example, Wu and Cook (2015), Scolas et al. (2016), Zhao et al. (2020), and Li et al. (2020). They all assume that the covariates are linearly related to the failure time of interest. However, the effects of the covariates may be non-linear, and partly linear models have recently garnered significant attention because they combine the flexibility of non-parametric models with the simplicity and easy interpretability of parametric models. Chapter 3 investigates variable selection for the high-dimensional partly linear additive generalized odds rate model with interval-censored data. In particular, several common penalty functions are considered, including the LASSO, SCAD, SICA, SELO, MCP, and BAR penalties. A Bernstein polynomial-based sieve maximum likelihood approach is proposed to overcome the computational difficulties posed by the non-parametric part of the model. A fast cyclic coordinate descent algorithm is then proposed to alternately estimate the 
parameters of interest. In addition, the asymptotic properties of the estimator are discussed, and the effectiveness and accuracy of the estimator are verified by extensive numerical simulations. Finally, the proposed model and method are applied to real Alzheimer’s disease data, and some reasonable results are obtained. Third, many authors have studied the regression analysis of current status data (Huang, 1996; Rossini and Tsiatis, 1996; Lin et al., 1998), all under the assumption that the failure time of interest and the observation time are independent. In practice, however, the failure time of interest and the observation time may be correlated, and the resulting data are called dependent current status data. Two commonly used methods for characterizing this dependence are the copula model approach and the frailty model approach. However, one limitation of the copula model approach is its assumption that the correlation coefficient in the copula function is known. Therefore, Chapter 4 considers regression analysis of dependent current status data under the generalized odds rate frailty model. To solve this problem, sieve maximum likelihood estimation is used, and an EM algorithm based on Gamma-Poisson latent variables is proposed to obtain the parameter estimators. The asymptotic properties of the estimator, including the oracle property, are then discussed, and the validity of the proposed method is verified by simulation experiments. Finally, the proposed model and method are applied to a set of real data arising from a tumorigenicity experiment. |
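
The statement that the generalized odds rate model contains the proportional hazards and proportional odds models can be illustrated numerically. The sketch below is not taken from the abstract: it assumes one commonly used parameterization, S(t | x) = {1 + r Λ(t) exp(x'β)}^(-1/r) for r > 0 and S(t | x) = exp{-Λ(t) exp(x'β)} for r = 0. The function names `gor_survival` and `current_status_loglik` are hypothetical, and the likelihood shown is the simple current status (case I) form under independent censoring, not the dependent-censoring or cure-model likelihoods studied in the chapters.

```python
import numpy as np


def gor_survival(cum_baseline, linear_predictor, r):
    """Survival probability under the assumed generalized odds rate form.

    cum_baseline     : Lambda(t), the cumulative baseline function (> 0)
    linear_predictor : x'beta
    r                : transformation parameter (r = 0 gives proportional
                       hazards, r = 1 gives proportional odds)
    """
    risk = np.asarray(cum_baseline, dtype=float) * np.exp(linear_predictor)
    if r == 0:
        # Limiting case: proportional hazards model
        return np.exp(-risk)
    return (1.0 + r * risk) ** (-1.0 / r)


def current_status_loglik(delta, cum_baseline, linear_predictor, r):
    """Log-likelihood for current status data, assuming independent censoring.

    delta[i] = 1 if the failure occurred before the single observation
    time of subject i, and 0 otherwise; cum_baseline[i] is Lambda evaluated
    at that observation time.
    """
    surv = gor_survival(cum_baseline, linear_predictor, r)
    delta = np.asarray(delta, dtype=float)
    # Failed subjects contribute log(1 - S); the others contribute log(S)
    return float(np.sum(delta * np.log1p(-surv) + (1.0 - delta) * np.log(surv)))
```

With r = 1 the odds of failure, (1 - S) / S, equal Λ(t) exp(x'β), which is the proportional odds structure; as r approaches 0 the survival function approaches the proportional hazards form. In a sieve approach, Λ(t) would be replaced by a spline or Bernstein polynomial approximation rather than supplied directly as it is here.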