
Semiparametric Analysis Of Interval-Censored Failure Time Data

Posted on: 2021-05-30 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: M Y Du
GTID: 1360330623977304 | Subject: Probability theory and mathematical statistics
Abstract/Summary:
Interval-censored data frequently occur in many scientific fields such as demographic studies, medical studies, sociology and tumorigenicity experiments (Sun, 2006). By interval-censored data, we mean that the failure time of interest $T$ cannot be observed exactly but is only known to fall within some time interval. Interval-censored data mainly include case I, case II and case K interval-censored data; this dissertation focuses on case I and case II interval-censored data. Case I interval-censored data are also referred to as current status data: each individual is observed only once, and one only knows whether the event of interest occurred before or after the observation time. Case II interval-censored data arise when each individual is observed twice, so that the event of interest is known to occur before the first observation, between the two observations, or after the second observation.

Many authors have studied regression analysis of case I interval-censored data; see, for example, Jewell et al. (2003), Sun (2006), Lin et al. (1998) and Liu and Qin (2018). All of them assume that the censoring time and the failure time are conditionally independent. In practice, however, the failure time and the censoring time in current status data are often dependent, which is referred to as dependent or informative censoring. Zhang et al. (2005), Zhao et al. (2015) and Ma et al. (2015) considered semiparametric analysis of case I
interval-censored data with dependent censoring under the additive hazards model and the proportional hazards model. The probit model has recently attracted some attention in regression analysis of failure time data because of the popularity of the normal distribution and its partially linear form. Lin and Wang (2010) and Liu and Qin (2018) considered the semiparametric probit model under the assumption that the failure time and the censoring time are independent. We first consider semiparametric analysis of informative current status data under the semiparametric probit model and its generalization.

Consider a failure time study that involves $n$ independent subjects, each of which is observed only once. For subject $i$, let $T_i$ denote the failure time of interest, $C_i$ the observation time and $Z_i$ a $p$-dimensional vector of covariates, $i = 1, \ldots, n$. Suppose that $T_i$ and $C_i$ may be related, and that there exists another observation or censoring time $\tau_i$, such as an administrative stop time, that is independent of both $T_i$ and $C_i$. Define $\tilde C_i = \min\{C_i, \tau_i\}$, $\rho_i = I(C_i \leq \tau_i)$ and $\delta_i = I(T_i \leq \tilde C_i)$. The observed informative case I interval-censored data then have the form $\{X_i = (\delta_i, \rho_i, \tilde C_i, Z_i),\ i = 1, \ldots, n\}$. The probit model assumes that, given $Z_i$, $T_i$ satisfies model (1), in which the error term $\varepsilon_i$ follows the standard normal distribution and is independent of $Z_i$. For the observation time, we assume that it marginally follows the proportional hazards model (2). Although the probit model (1) is quite useful, one may sometimes prefer a more general and flexible model. For this, in addition to the standard normal distribution for $\varepsilon$, we also consider the situation where $\varepsilon$
follows the Weibull-type distribution (3), where $d$, $e$ and $f$ are constants with $d \geq 0$, $e \geq 0$ and $f > 0$. In this case, we refer to model (1) as the generalized probit model.

We now discuss estimation of the regression parameters in models (1) and (2), for which we present a sieve maximum likelihood estimation procedure. In this procedure, a copula model describes the correlation between the failure time and the censoring time, and the copula function, its association parameter and the distribution of the error term $\varepsilon$ are assumed to be known; more comments on this are given in Chapter 2. The observed-data likelihood function then follows from the copula model. Since the log-likelihood function involves the unknown functions $\alpha(t)$ and $\Lambda_c(c)$, it is difficult to maximize directly. To deal with this, we first approximate $\alpha(t)$ and $\Lambda_c(c)$ by I-spline functions. Define $\hat\theta_n = (\hat\beta_n, \hat\gamma_n, \hat\alpha_n, \hat\Lambda_{cn})$ to be the resulting sieve maximum likelihood estimator of $\theta = (\beta, \gamma, \alpha, \Lambda_c)$. Under some regularity conditions, $\hat\theta_n$ is consistent, and the distributions of $\hat\beta_n$ and $\hat\gamma_n$ can be approximated by normal distributions.

Second, we discuss semiparametric analysis of case II
interval-censored failure time data in the presence of dependent censoring in case-cohort studies under the proportional hazards model. The case-cohort design is widely used to reduce the cost of large cohort studies, especially when the disease rate is low and covariate measurements are expensive. Although many authors have discussed the analysis of case-cohort studies, most existing methods are for right-censored failure time data and cannot handle interval censoring. Furthermore, the failure time of interest and the censoring mechanism are sometimes correlated, and ignoring this correlation may lead to biased or misleading regression results and conclusions. We therefore consider how to handle a failure time of interest that is correlated with the censoring mechanism in case-cohort studies under the proportional hazards model.

Consider a failure time study that consists of $n$ independent subjects. For subject $i$, let $T_i$ denote the failure time and $Z_i$ a $p$-dimensional vector of covariates, $i = 1, \ldots, n$. Suppose that there exist two examination times $U_i$ and $V_i$ with $U_i \leq V_i$, and that one only observes $\delta_{1i} = I(T_i \leq U_i)$ and $\delta_{2i} = I(U_i < T_i \leq V_i)$, indicating whether $T_i$ is left-censored or interval-censored, respectively. Thus only interval-censored data are available on the $T_i$'s. In case-cohort studies, covariate information is available only for subjects who either have experienced the failure event of interest or belong to the subcohort, a random sample of the entire cohort. Define $\xi_i = 1$ if the covariate $Z_i$ is observed and 0 otherwise, $i = 1, \ldots, n$. Following Zhou et al. (2017) and others, the subcohort is selected by independent Bernoulli sampling with selection probability $q \in (0, 1)$. The observed data therefore consist of the examination times $(U_i, V_i)$, the indicators $(\delta_{1i}, \delta_{2i}, \xi_i)$ and the covariates $Z_i$ of the subjects with $\xi_i = 1$. To describe the covariate effects and dependent interval
censoring, define $W_i = V_i - U_i$, $i = 1, \ldots, n$. Following Ma et al. (2016), we focus on the situation where the dependent censoring can be characterized by the correlation between the $T_i$'s and the $W_i$'s. For the covariate effects, we assume that there exists a latent variable $b_i$ with mean one, known distribution and unknown variance $\eta$ such that, given $Z_i$ and $b_i$, the hazard functions of $T_i$ and $W_i$ have the forms (4) and (5), respectively. Given $Z_i$ and $b_i$, $T_i$ and $W_i$ are assumed to be independent. Define $\Delta_i = (\delta_{1i}, \delta_{2i})$ and $\theta = (\beta_t, \beta_w, \Lambda_t, \Lambda_w, \eta)$, where $\Lambda_t(t) = \int_0^t \lambda_t(u)\,du$ and $\Lambda_w(t) = \int_0^t \lambda_w(u)\,du$. To estimate $\theta$, we use an inverse probability weighted log-likelihood function $l_O(\theta)$, which integrates over the latent variables with respect to their density $f(b_i; \eta)$ and involves the conditional likelihood contributions $L_{\Delta_i \mid W_i, U_i, b_i}(\theta)$ and $L_{W_i \mid b_i}(\theta)$ given in Chapter 3. If $f$ is a gamma density, $l_O(\theta)$ has a closed form.

The function $l_O(\theta)$ is difficult to maximize directly because it involves the unknown functions $\Lambda_t(t)$ and $\Lambda_w(t)$. To deal with this, following Ma et al. (2015), Zhou et al. (2017) and others, we approximate both functions by Bernstein polynomials; in the numerical studies in Chapter 3, the Matlab function fmincon is used to compute the proposed estimator $\hat\theta_n$. Let $\theta_0$ denote the true value of $\theta$. The estimators have the following asymptotic properties.

Theorem 1. Suppose that the regularity conditions (A1)-(A4) given in Chapter 3 hold. Then, as $n \to \infty$, $d(\hat\theta_n, \theta_0) \to 0$ almost surely and $d(\hat\theta_n, \theta_0) = O_p(n^{-\min\{(1-v)/2,\ vr/2\}})$, where $v \in (0, 1)$ is defined through $m = o(n^v)$ and $r$ is given in the regularity condition (A3).

Theorem 2. Suppose that the regularity conditions (A1)-(A5) given in Chapter 3 hold. Then, as $n \to \infty$
and if $v > 1/(2r)$, the estimator $\hat\nu_n = (\hat\beta_{tn}, \hat\beta_{wn}, \hat\eta_n)$ of $\nu = (\beta_t, \beta_w, \eta)$ is asymptotically normal. Its asymptotic covariance matrix is expressed through the notation $a^{\otimes 2} = aa'$ for a vector $a$ and through $I(\nu)$ and $l^*(\nu, O)$, given in the Appendix, which denote the information matrix and the efficient score for $\nu$ based on the full cohort data. Since a consistent estimator of the covariance matrix of $\hat\nu_n$ would be difficult to derive, we propose to use the weighted bootstrap procedure of Ma and Kosorok (2005); following Ma and Kosorok (2005), this weighted bootstrap variance estimator can be shown to be consistent.

Finally, we discuss semiparametric analysis of case-cohort studies with case II interval-censored data under the additive hazards model. Many authors have discussed regression analysis of case-cohort studies under the additive hazards model, but all existing methods assume, or are applicable only to, right-censored data. We therefore consider additive hazards regression for case-cohort studies with interval-censored data. Consider a cohort study that consists of $n$ independent subjects, and for subject $i$, let $T_i$ denote the failure time of interest and $Z_i$ a $p$-dimensional vector of covariates that may be related to $T_i$. For the relationship between $T_i$ and $Z_i$, we assume that, given $Z_i$, the hazard function of $T_i$ has the form $\lambda(t \mid Z_i) = \lambda_0(t) + \beta' Z_i$, where $\lambda_0(t)$ is an unknown baseline hazard function; that is, $T_i$ follows the additive hazards model (Lin et al., 1998). For subject $i$, there exist two examination times $U_i$ and $V_i$ with $U_i < V_i$. Define the indicator functions $\delta_{1i} = I(T_i \leq U_i)$, $\delta_{2i} = I(U_i < T_i \leq V_i)$ and $\delta_{3i} = 1 - \delta_{1i} - \delta_{2i}$. In case-cohort studies, covariate information is available only for subjects who are in the subcohort or who have experienced the failure event of interest. Define $\xi_i = 1$ if the covariate $Z_i$ is observed and 0 otherwise, $i = 1, \ldots, n$; under the case-cohort design, the observed data then consist of $(U_i, V_i, \delta_{1i}, \delta_{2i}, \xi_i)$ together with $Z_i$ for the subjects with $\xi_i = 1$. Following Zhou et al. (2017), we focus on independent Bernoulli sampling of the subcohort with selection probability $q \in (0, 1)$. Then the probability that the
covariate $Z_i$ can be observed is $\pi_i = \delta_{1i} + \delta_{2i} + \delta_{3i}\, q$, $i = 1, \ldots, n$. We also assume that, given $Z_i$, $T_i$ is independent of the examination process, that is, of the examination times $U_i$ and $V_i$; in other words, the censoring mechanism is independent (Sun, 2006).

In Chapter 4, we propose an estimating equation-based procedure and a pseudo likelihood-based procedure for the regression parameter $\beta$. Following Zhou et al. (2017) and Wang et al. (2010), we consider the inverse probability weighted estimating function $U_{IPW}(\beta)$ and define the inverse probability weighted estimator $\hat\beta_{IPW}$ of $\beta$ as the solution to $U_{IPW}(\beta) = 0$. The following theorem establishes its asymptotic properties.

Theorem 3. Suppose that the regularity conditions (A1)-(A4) given in Chapter 4 hold. Then $\hat\beta_{IPW}$ is consistent and, as $n \to \infty$, asymptotically normal, with an asymptotic covariance matrix that involves $\Sigma_w = B_1 + B_2$.

For inference about the regression parameters, one needs to estimate the covariance matrix of $\hat\beta_{IPW}$; following Ma and Kosorok (2005), we suggest the nonparametric weighted bootstrap procedure for this. The estimating equation-based approach does not involve estimation of the cumulative baseline hazard function $\Lambda_0(t)$, so it can be relatively stable and robust; on the other hand, it may lose some efficiency. We therefore also present a pseudo likelihood-based approach, in which $\beta$ and $\Lambda_0(t)$ have to be estimated together, which may be difficult. Following Ma et al. (2015) and others, we first approximate $\Lambda_0(t)$ by Bernstein polynomials; more details are given in Chapter 4. The sieve pseudo maximum likelihood estimator $\hat\theta_n = (\hat\beta_n, \hat\Lambda_n)$ of $\theta = (\beta, \Lambda_0)$ is defined to be the value of $\theta$
that maximizes the pseudo log-likelihood function. To compute $\hat\theta_n$, we use the interior-point algorithm of the Matlab function fmincon, and for the covariance matrix of $\hat\beta_n$ we suggest the weighted bootstrap procedure of Ma and Kosorok (2005). The following theorems establish the asymptotic properties of $\hat\theta_n$.

Theorem 4. Suppose that the regularity conditions (A1) and (A3)-(A6) given in Chapter 4 hold. Then, as $n \to \infty$, $d(\hat\theta_n, \theta_0) \to 0$ almost surely and $d(\hat\theta_n, \theta_0) = O_p(n^{-\min\{(1-v)/2,\ vr/2\}})$, where $v \in (0, 1)$ is such that $m = o(n^v)$ and $r$ is defined in the regularity condition (A5).

Theorem 5. Suppose that the regularity conditions (A1) and (A3)-(A6) given in Chapter 4 hold with $r > 2$ in the regularity condition (A5). Then, if $v > 1/(2r)$, as $n \to \infty$ the estimator $\hat\beta_n$ is asymptotically normal, with an asymptotic covariance matrix expressed through $a^{\otimes 2} = aa^\top$ for a vector $a$ and through $I(\beta)$ and $l^*(\beta_0, \Lambda_0; O)$, which denote the information matrix and the efficient score for $\beta$, respectively, based on a single observation.
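In the first part, the dependence between the failure time and the observation time enters the likelihood through a copula. The sketch below is only an illustration under hypothetical choices that are not necessarily the dissertation's: a Clayton copula $C_\phi(u, v) = (u^{-\phi} + v^{-\phi} - 1)^{-1/\phi}$ on the conditional survival functions, a scalar covariate, the probit form $S_T(t \mid z) = 1 - \Phi(\alpha(t) + \beta z)$ (sign convention assumed) and $S_C(c \mid z) = \exp\{-\Lambda_c(c) e^{\gamma z}\}$. Because $P(T > t \mid C = c) = -\partial_c S(t, c) / f_C(c)$ and $S(t, c) = C_\phi(S_T(t), S_C(c))$, the probability that the failure has occurred by the observation time is $1 - \partial C_\phi / \partial v$ evaluated at $(S_T(c), S_C(c))$. All function names are illustrative.

```python
import math
from typing import Callable


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x), computed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clayton_dv(u: float, v: float, phi: float) -> float:
    """Partial derivative dC/dv of the Clayton copula
    C(u, v) = (u**-phi + v**-phi - 1) ** (-1/phi), for phi > 0."""
    a = u ** (-phi) + v ** (-phi) - 1.0
    return v ** (-phi - 1.0) * a ** (-1.0 / phi - 1.0)


def prob_failed_by_exam(
    c: float,
    z: float,
    alpha: Callable[[float], float],
    beta: float,
    cum_hazard_c: Callable[[float], float],
    gamma: float,
    phi: float,
) -> float:
    """P(T <= c | C = c, Z = z) under the hypothetical specification
    S_T(t|z) = 1 - Phi(alpha(t) + beta*z) and
    S_C(c|z) = exp(-Lambda_c(c) * exp(gamma*z)), joined by a Clayton copula."""
    s_t = 1.0 - std_normal_cdf(alpha(c) + beta * z)
    s_c = math.exp(-cum_hazard_c(c) * math.exp(gamma * z))
    # P(T > c | C = c) equals dC/dv evaluated at (S_T(c|z), S_C(c|z)).
    return 1.0 - clayton_dv(s_t, s_c, phi)


# Example: alpha(t) = log(t), Lambda_c(c) = 0.5 c, strong dependence (phi = 2).
p = prob_failed_by_exam(1.5, 0.3, math.log, 0.8, lambda c: 0.5 * c, -0.4, 2.0)
print(round(p, 4))
```

As $\phi \to 0$ the Clayton copula approaches independence, and the probability reduces to $\Phi(\alpha(c) + \beta z)$, which gives a quick sanity check of the derivative.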
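The second and third parts approximate unknown cumulative hazard functions by Bernstein polynomials (the first part uses I-splines instead). A Bernstein polynomial with nondecreasing coefficients is itself nondecreasing, so one common way to keep the sieve monotone is to build the coefficients as cumulative sums of exponentials. The sketch below assumes that reparametrization and hypothetical function names; it is not taken from the dissertation.

```python
import math
from typing import Sequence


def bernstein_basis(k: int, m: int, x: float) -> float:
    """k-th Bernstein basis polynomial of degree m on [0, 1]."""
    return math.comb(m, k) * x ** k * (1.0 - x) ** (m - k)


def monotone_coefficients(free_params: Sequence[float]) -> list[float]:
    """Map unconstrained reals to positive, nondecreasing coefficients
    phi_0 <= phi_1 <= ... via cumulative sums of exp(theta_j)."""
    coefs, total = [], 0.0
    for theta in free_params:
        total += math.exp(theta)
        coefs.append(total)
    return coefs


def bernstein_approx(coefs: Sequence[float], t: float, lower: float, upper: float) -> float:
    """Evaluate sum_k phi_k * b_k((t - lower) / (upper - lower); m), m = len(coefs) - 1."""
    m = len(coefs) - 1
    x = (t - lower) / (upper - lower)
    return sum(c * bernstein_basis(k, m, x) for k, c in enumerate(coefs))


# A nondecreasing sieve approximation of a cumulative hazard on [0, 10] with m = 4.
coefs = monotone_coefficients([-2.0, -1.0, 0.5, -0.3, 0.2])
grid = [bernstein_approx(coefs, t, 0.0, 10.0) for t in range(11)]
```

The reparametrization turns the order constraints into an unconstrained problem; a constrained optimizer could instead impose $\phi_0 \leq \phi_1 \leq \cdots \leq \phi_m$ directly.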
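Under independent Bernoulli sampling of the subcohort, a subject whose failure is observed by $V_i$ always has covariates, while any other subject has covariates only with probability $q$. This gives $\pi_i = \delta_{1i} + \delta_{2i} + \delta_{3i}\, q$ and inverse probability weights $\xi_i / \pi_i$. A minimal sketch of the sampling and the weights, with hypothetical function names:

```python
import random


def selection_probability(delta1: int, delta2: int, q: float) -> float:
    """pi_i = delta1 + delta2 + (1 - delta1 - delta2) * q: subjects whose failure
    is observed by V_i always have covariates; others only via the subcohort."""
    return delta1 + delta2 + (1 - delta1 - delta2) * q


def case_cohort_weights(deltas, q: float, rng: random.Random):
    """Select the subcohort by independent Bernoulli(q) sampling and return a list of
    (xi_i, xi_i / pi_i) pairs for subjects given as (delta1_i, delta2_i) tuples."""
    out = []
    for d1, d2 in deltas:
        in_subcohort = rng.random() < q
        xi = 1 if (d1 + d2 == 1 or in_subcohort) else 0
        out.append((xi, xi / selection_probability(d1, d2, q)))
    return out


rng = random.Random(2021)
example = case_cohort_weights([(1, 0), (0, 1), (0, 0), (0, 0)], q=0.2, rng=rng)
```

Because $E(\xi_i \mid \delta_{1i}, \delta_{2i}) = \pi_i$, each weight has conditional mean one, which is what makes weighted sums over the observed subjects unbiased for the corresponding full-cohort sums.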
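The remark that $l_O(\theta)$ has a closed form when $f$ is a gamma density can be illustrated with the Laplace transform of the gamma distribution. Assuming, for illustration only, that the latent variable acts multiplicatively so that a conditional survival term looks like $\exp(-b_i s)$ with $s$ a cumulative hazard, and that $b_i \sim \text{Gamma}(\text{shape} = 1/\eta, \text{scale} = \eta)$ (mean one, variance $\eta$), integrating over $b_i$ gives $(1 + \eta s)^{-1/\eta}$. The sketch checks this against Monte Carlo integration; function names are hypothetical.

```python
import math
import random


def gamma_frailty_survival(cum_hazard: float, eta: float) -> float:
    """E[exp(-b * cum_hazard)] for b ~ Gamma(shape=1/eta, scale=eta),
    so that E[b] = 1 and Var[b] = eta: equals (1 + eta*cum_hazard) ** (-1/eta)."""
    return (1.0 + eta * cum_hazard) ** (-1.0 / eta)


def monte_carlo_survival(cum_hazard: float, eta: float, n: int, rng: random.Random) -> float:
    """Monte Carlo average of exp(-b * cum_hazard) over simulated frailties."""
    total = 0.0
    for _ in range(n):
        b = rng.gammavariate(1.0 / eta, eta)  # (shape, scale)
        total += math.exp(-b * cum_hazard)
    return total / n


closed = gamma_frailty_survival(1.2, 0.5)
approx = monte_carlo_survival(1.2, 0.5, 100_000, random.Random(7))
```

Products of such terms remain exponential in $b_i$, and terms of the form $b\, e^{-bs}$ integrate to $(1 + \eta s)^{-1/\eta - 1}$, so contributions built from them also stay in closed form.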
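The convergence rate in Theorems 1 and 4 depends on the exponent $v$ through $m = o(n^v)$. The following calculation is an illustration, not a statement from the dissertation: it shows which $v$ balances the two terms and that this choice is compatible with the condition $v > 1/(2r)$ in Theorems 2 and 5.

```latex
\begin{aligned}
&\tfrac{1-v}{2} \text{ decreases and } \tfrac{vr}{2} \text{ increases in } v, \text{ so }
  \min\Bigl\{\tfrac{1-v}{2}, \tfrac{vr}{2}\Bigr\} \text{ is largest when }
  \tfrac{1-v}{2} = \tfrac{vr}{2} \iff v^{*} = \frac{1}{1+r}, \\
&\text{giving } d(\hat\theta_n, \theta_0) = O_p\bigl(n^{-r/\{2(1+r)\}}\bigr); \qquad
  v^{*} > \frac{1}{2r} \iff 2r > 1 + r \iff r > 1, \\
&\text{e.g. } r = 3 \text{ (so } r > 2 \text{ as in Theorem 5)}: \quad
  v^{*} = \tfrac14 > \tfrac16, \quad \text{rate } n^{-3/8}.
\end{aligned}
```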
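The weighted bootstrap of Ma and Kosorok (2005), suggested above for the covariance matrices, re-maximizes the objective function with i.i.d. positive random weights multiplying each subject's contribution and uses the spread of the refitted estimates. The sketch below assumes exponential weights (mean 1, variance 1) and, to stay short, applies the procedure to a toy estimator, a weighted mean, in place of the sieve estimators; all names are hypothetical.

```python
import random
import statistics
from typing import Callable, Sequence


def weighted_bootstrap_variance(
    fit: Callable[[Sequence[float], Sequence[float]], float],
    data: Sequence[float],
    n_boot: int,
    rng: random.Random,
) -> float:
    """Refit the estimator n_boot times with i.i.d. Exp(1) weights (positive,
    mean 1, variance 1) and return the sample variance of the refitted estimates."""
    estimates = []
    for _ in range(n_boot):
        weights = [rng.expovariate(1.0) for _ in data]
        estimates.append(fit(data, weights))
    return statistics.variance(estimates)


def weighted_mean(data: Sequence[float], weights: Sequence[float]) -> float:
    """Toy estimator: the maximizer of a weighted normal log-likelihood for a mean."""
    return sum(w * x for w, x in zip(weights, data)) / sum(weights)


data = [i / 10 for i in range(50)]
boot_var = weighted_bootstrap_variance(weighted_mean, data, 2000, random.Random(11))
```

For the weighted mean, the bootstrap variance should be close to the usual $\hat\sigma^2 / n$. In the dissertation's setting, `fit` would instead maximize the weighted (sieve) log-likelihood, with each subject's term multiplied by its random weight.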
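For the pseudo likelihood-based approach in the last part, one natural form of an inverse probability weighted log-likelihood for case II data under the additive hazards model can be sketched as follows. This is an assumption for illustration, not necessarily the exact pseudo-likelihood of Chapter 4: with time-invariant covariates, $S(t \mid Z) = \exp\{-\Lambda_0(t) - t\beta' Z\}$, and each subject with $\xi_i = 1$ contributes $\delta_{1i}\log\{1 - S(U_i)\} + \delta_{2i}\log\{S(U_i) - S(V_i)\} + \delta_{3i}\log S(V_i)$, weighted by $\xi_i / \pi_i$. Names and the record layout are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

# (U, V, delta1, delta2, xi, z); z is None when the covariate is not observed.
Record = Tuple[float, float, int, int, int, Optional[Sequence[float]]]


def additive_survival(t: float, z: Sequence[float], beta: Sequence[float],
                      cum_baseline: Callable[[float], float]) -> float:
    """S(t | z) = exp(-Lambda_0(t) - t * beta'z) for the additive hazards model
    lambda(t | z) = lambda_0(t) + beta'z with time-invariant covariates."""
    lin = sum(b * zj for b, zj in zip(beta, z))
    return math.exp(-cum_baseline(t) - t * lin)


def ipw_log_likelihood(beta: Sequence[float], cum_baseline: Callable[[float], float],
                       data: Sequence[Record], q: float) -> float:
    """Sum over subjects with xi_i = 1 of (xi_i / pi_i) times
    d1*log(1 - S(U)) + d2*log(S(U) - S(V)) + d3*log S(V)."""
    total = 0.0
    for u, v, d1, d2, xi, z in data:
        if xi == 0:
            continue  # zero weight: covariates unobserved, no contribution
        pi = d1 + d2 + (1 - d1 - d2) * q
        s_u = additive_survival(u, z, beta, cum_baseline)
        s_v = additive_survival(v, z, beta, cum_baseline)
        if d1:
            term = math.log(1.0 - s_u)   # left-censored: T <= U
        elif d2:
            term = math.log(s_u - s_v)   # interval-censored: U < T <= V
        else:
            term = math.log(s_v)         # right-censored: T > V
        total += (xi / pi) * term
    return total


data = [
    (1.0, 2.0, 1, 0, 1, [0.5]),
    (1.0, 2.0, 0, 1, 1, [0.0]),
    (1.0, 2.0, 0, 0, 1, [1.0]),  # subcohort member without an observed failure
    (1.0, 2.0, 0, 0, 0, None),   # not sampled: covariate missing
]
value = ipw_log_likelihood([0.2], lambda t: 0.3 * t, data, q=0.25)
```

In a full fit, `cum_baseline` would be a monotone Bernstein approximation of $\Lambda_0$, and $\beta$ and the Bernstein coefficients would be maximized jointly, subject to $\lambda_0(t) + \beta' Z \geq 0$.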
Keywords/Search Tags: Dependent censoring, Case I interval-censored data, Case II interval-censored data, Case-cohort study, Probit model, Additive hazards model, Copula function, Frailty model