
Statistical Inference For Semiparametric Models With A Class Of Survival Data

Posted on: 2016-08-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Chen
GTID: 1220330467995505
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Time-to-event data, also called survival data, arise in many fields and may involve different types of data structures. Often one is interested in comparing the survival functions of two groups and, when other explanatory variables are available, in assessing covariate effects on survival time.

Firstly, we propose a class of tests of the proportional hazards assumption for left-truncated and right-censored (LTRC) data, based on a pair of estimators of the constant hazard ratio. Let $(T_i, L_i, C_i)$ be a continuous, positive random vector associated with group $i$, representing the survival time of interest, the left-truncation time and the right-censoring time, with distribution functions $F_i$, $G_i$ and $H_i$, respectively. Because of possible truncation and censoring, we observe $X_i = \min(T_i, C_i)$, $\delta_i = I(T_i < C_i)$ and $L_i$ only when $L_i < X_i$, and nothing otherwise. Thus the data from group $i$ consist of $n_i$ i.i.d. replicates $\{(X_{ij}, \delta_{ij}, L_{ij}): j = 1, \dots, n_i\}$ of $(X_i, \delta_i, L_i)$; let $n = n_1 + n_2$. For a cumulative distribution function $K$, denote $a_K = \inf\{u: K(u) > 0\}$ and $b_K = \sup\{u: K(u) < 1\}$. To ensure the identifiability of $F_i$, we rely on Woodroofe (1985), who showed that if $a_{G_i} < \min(a_{F_i}, a_{H_i})$ and $b_{G_i} < \min(b_{F_i}, b_{H_i})$, then $F_i$, $G_i$ and $H_i$ are all identifiable. Without loss of generality, we assume $a_{F_1} = a_{F_2} = a_F$ and $b_{F_1} = b_{F_2} = b_F$.

Define $N_i(t) = \sum_{j=1}^{n_i} I(X_{ij} \le t, \delta_{ij} = 1)$ and $Y_i(t) = \sum_{j=1}^{n_i} I(L_{ij} \le t \le X_{ij})$. $F_i$ can be estimated by the modified product-limit estimator $\hat F_i(t) = 1 - \prod_{s \le t}\{1 - \Delta N_i(s)/Y_i(s)\}$, where $\Delta N_i(s) = N_i(s) - N_i(s-)$, so the survival function $S_i(t) = 1 - F_i(t)$ may be estimated by $\hat S_i(t) = 1 - \hat F_i(t)$. In addition, a Nelson-Aalen-type estimator of the cumulative hazard function $\Lambda_i(t)$ is $\hat\Lambda_i(t) = \int_0^t dN_i(s)/Y_i(s)$. Consider the case where the hazard ratio between the two groups is constant, namely $\lambda_2(t) = \alpha_0\lambda_1(t)$, where $\alpha_0$ is a positive constant. Following the idea of Andersen (1983), $\alpha_0$ can be estimated by
$\hat\alpha_W = \int_0^b W(s)\,d\hat\Lambda_2(s) \Big/ \int_0^b W(s)\,d\hat\Lambda_1(s)$,
where $b < b_F$ is such that $S_i(b) > 0$, $i = 1, 2$, and $W(\cdot)$ is an almost surely bounded predictable weight process satisfying $W(s) = 0$ when $Y_1(s)Y_2(s) = 0$. To establish the asymptotic results, we rewrite $Y_i(\cdot)$ and $W(\cdot)$ as $Y_{in}(\cdot)$
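and $W_n(\cdot)$, and impose the following regularity conditions.
(C1) For $i = 1, 2$, there exist functions $y_i$ that are the limits, as $n \to \infty$, of the suitably normalized at-risk processes $Y_{in}(\cdot)$, with $y_i(\cdot)$ bounded and bounded away from zero on $(a_F, b_F)$.
(C2) For each $n$, $W_n(t) = 0$ if $Y_{1n}(t)Y_{2n}(t) = 0$.
(C3) There exists a bounded function $\omega(\cdot)$ that is the limit of $W_n(\cdot)$ as $n \to \infty$.

Theorem 1. Under conditions (C1)-(C3), (1) $\hat\alpha_{W_n} = \alpha_0 + o_p(1)$; (2) $n^{1/2}(\hat\alpha_{W_n} - \alpha_0) \to N(0, \sigma_W^2)$ in distribution as $n \to \infty$, with asymptotic variance $\sigma_W^2$.

Consider a more general two-sample problem:
$H_0$: $\lambda_2(t) = \alpha\lambda_1(t)$ for some positive constant $\alpha$ and all $t$;
$H_1$: for every positive constant $\alpha$, $\lambda_2(t) \ne \alpha\lambda_1(t)$ for some $t$.
For any two distinct weight functions $W_1(t)$ and $W_2(t)$, the corresponding estimators $\hat\alpha_{W_1}$ and $\hat\alpha_{W_2}$ of $\alpha$ should be close under $H_0$, whereas under $H_1$ their difference $\hat\Delta_{W_1W_2} = \hat\alpha_{W_2} - \hat\alpha_{W_1}$ should depart markedly from zero. Specifically, write $W_{ik} = \int_0^b W_i(t)\,d\hat\Lambda_k(t)$, $i, k = 1, 2$; then $\hat\alpha_{W_i} = W_{i2}/W_{i1}$ and $\hat\Delta_{W_1W_2} = W_{22}/W_{21} - W_{12}/W_{11}$. We need the following further assumptions.
(C2') For $i = 1, 2$, $W_i(t) = 0$ if $Y_1(t)Y_2(t) = 0$.
(C3') For $i = 1, 2$, there exist bounded deterministic functions $w_i$ that are the limits of $W_i(\cdot)$ as $n \to \infty$.

Theorem 2. Under conditions (C1), (C2') and (C3'), $n^{1/2}\hat\Delta_{W_1W_2}$ is asymptotically normal as $n \to \infty$, with mean zero under $H_0$ and asymptotic variance $D_{W_1W_2}$ expressed in terms of $w_{ik} = \int_0^b w_i(t)\,d\Lambda_k(t)$, $i, k = 1, 2$. In addition, $D_{W_1W_2}$ admits a consistent estimator $\hat D_{W_1W_2}$.

In some situations, $\hat D_{W_1W_2}$ may be negative. As an alternative, we provide another class of estimators, summarized in the following corollary.

Corollary 1. For given $\theta_1, \theta_2 > 0$ with $\theta_1 + \theta_2 = 1$, let $\hat\Lambda_0 = \theta_1\hat\Lambda_1 + \theta_2\hat\Lambda_2$, $W_{i0} = \int_0^b W_i\,d\hat\Lambda_0$ and $C_k = W_{2k}/W_{20}$, $k = 1, 2$. Then the resulting estimator $\tilde D_{W_1W_2}$ satisfies $\tilde D_{W_1W_2} = D_{W_1W_2} + o_p(1)$.

Using these results, we can construct a two-sided test: for a nominal level $c$, $H_0$ is rejected if $n^{1/2}|\hat\Delta_{W_1W_2}|/\hat D_{W_1W_2}^{1/2} > z_{c/2}$ (or with $\tilde D_{W_1W_2}$ in place of $\hat D_{W_1W_2}$), where $z_{c/2}$ is the upper $c/2$ percentile of the standard normal distribution.

Secondly, we investigate the empirical likelihood inference approach under a general class of semiparametric hazards regression models for survival data subject to right censoring.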
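Returning to the first part, its two-sample quantities can be illustrated with a short Python sketch: the LTRC risk set $Y(t)$, the Nelson-Aalen-type jumps $dN(t)/Y(t)$, the Andersen-type ratio $\hat\alpha_W$ and the difference $\hat\Delta_{W_1W_2}$. This is a minimal sketch under stated assumptions, not the thesis's implementation: all function names are hypothetical, and the two weights (`w_logrank`, `w_product`) are illustrative choices picked only because they vanish whenever $Y_1(s)Y_2(s) = 0$. The variance estimators and the test statistic are not reproduced.

```python
from typing import Callable, List, Sequence, Tuple

# One group of LTRC data: (truncation times L_j, observed times X_j, event indicators delta_j).
Group = Tuple[Sequence[float], Sequence[float], Sequence[int]]


def at_risk(t: float, group: Group) -> int:
    """Y(t) = #{j : L_j <= t <= X_j}, the size of the LTRC risk set at time t."""
    left, x, _ = group
    return sum(1 for lj, xj in zip(left, x) if lj <= t <= xj)


def nelson_aalen_jumps(group: Group, b: float) -> List[Tuple[float, float]]:
    """Jumps dN(t) / Y(t) of the Nelson-Aalen-type estimator at event times t <= b."""
    _, x, delta = group
    event_times = sorted({xj for xj, d in zip(x, delta) if d == 1 and xj <= b})
    jumps = []
    for t in event_times:
        d_n = sum(1 for xj, d in zip(x, delta) if d == 1 and xj == t)
        # Y(t) >= 1 here because an observed subject satisfies L_j <= X_j.
        jumps.append((t, d_n / at_risk(t, group)))
    return jumps


# Hypothetical weights W(s) = w(Y1(s), Y2(s)); both vanish when Y1(s) * Y2(s) == 0.
def w_logrank(y1: int, y2: int) -> float:
    return y1 * y2 / (y1 + y2) if y1 * y2 > 0 else 0.0


def w_product(y1: int, y2: int) -> float:
    return float(y1 * y2)


def alpha_hat(weight: Callable[[int, int], float], g1: Group, g2: Group, b: float) -> float:
    """Ratio estimator: int_0^b W dLambda2_hat / int_0^b W dLambda1_hat."""
    def w_integral(g: Group) -> float:
        return sum(weight(at_risk(t, g1), at_risk(t, g2)) * jump
                   for t, jump in nelson_aalen_jumps(g, b))
    return w_integral(g2) / w_integral(g1)


def delta_hat(g1: Group, g2: Group, b: float) -> float:
    """Difference alpha_hat_{W2} - alpha_hat_{W1} with W1 = w_logrank, W2 = w_product."""
    return alpha_hat(w_product, g1, g2, b) - alpha_hat(w_logrank, g1, g2, b)


if __name__ == "__main__":
    g1 = ([0.0, 0.2, 0.1, 0.5], [1.0, 2.0, 3.0, 4.0], [1, 1, 0, 1])
    g2 = ([0.0, 0.3, 0.1, 0.4], [0.8, 1.5, 2.5, 3.5], [1, 0, 1, 1])
    print(alpha_hat(w_logrank, g1, g2, b=3.8), delta_hat(g1, g2, b=3.8))
```

Applied to two identical samples, every weight gives $\hat\alpha_W = 1$ and $\hat\Delta_{W_1W_2} = 0$, which is a quick sanity check of the construction.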
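The model, proposed by Chen and Jewell (2001), is defined by $\lambda(t \mid Z) = \lambda_0(t e^{\beta_1'Z})\,e^{\beta_2'Z}$, where $\lambda_0(t)$ is an unspecified baseline hazard function, $Z$ is a $p$-dimensional covariate vector, and $\beta_1$ and $\beta_2$ are $p$-dimensional vectors of regression parameters. Let $T$ and $C$ denote, respectively, the time to the event of interest (such as death) and the censoring time of a subject under study. Under right censoring we observe only $(X, \delta, Z)$, where $X = \min(T, C)$ and $\delta = I(T \le C)$. Suppose the observed data are $n$ independent copies of $(X, \delta, Z)$, namely $\{(X_i, \delta_i, Z_i): i = 1, \dots, n\}$. Define $N_i(t) = I(X_i \le t, \delta_i = 1)$ and $Y_i(t) = I(X_i \ge t)$ for $i = 1, \dots, n$. Furthermore, define
$M_i(t; \beta_0, \Lambda_0) = N_i(t e^{-\beta_{01}'Z_i}) - \int_0^t Y_i(s e^{-\beta_{01}'Z_i})\,e^{(\beta_{02} - \beta_{01})'Z_i}\,d\Lambda_0(s)$,
where $\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$, $\beta_0 = (\beta_{01}', \beta_{02}')'$ is the true value of $\beta = (\beta_1', \beta_2')'$, $\beta_{01}, \beta_{02} \in \mathcal{B}$, and $\mathcal{B}$ is a bounded subset of $\mathbb{R}^p$. The $M_i(t; \beta_0, \Lambda_0)$ are zero-mean martingales with respect to the filtration $\mathcal{F}_t = \sigma\{N_i(t e^{-\beta_{01}'Z_i}), Y_i(t e^{-\beta_{01}'Z_i}), Z_i, i = 1, \dots, n\}$.

Chen and Jewell (2001) proposed to estimate $\beta_0$ through a system of estimating equations in which, for any fixed $\beta = (\beta_1', \beta_2')'$, $\Lambda_0(t; \beta)$ is the solution of an auxiliary equation, and $G(t, Z; \beta)$ is a known bounded $p$-dimensional vector function of $t$, $Z$ and $\beta$ that does not lie in the span of the functions $1$ and $Z$. In fact, $\Lambda_0(t; \beta)$ has an explicit formula. Write $S_Z^{(k)}(t; \beta)$ and $S_G^{(k)}(t; \beta)$, $k = 0, 1, 2$, for the risk-set sums of $Z_i^{\otimes k}$ and $G(t, Z_i; \beta)^{\otimes k}$, where $a^{\otimes 0} = 1$, $a^{\otimes 1} = a$ and $a^{\otimes 2} = aa'$ for a column vector $a$. Furthermore, denote $\bar Z(t; \beta) = S_Z^{(1)}(t; \beta)/S_Z^{(0)}(t; \beta)$, $\bar G(t; \beta) = S_G^{(1)}(t; \beta)/S_G^{(0)}(t; \beta)$, $Z_i^*(t) = (Z_i', G(t, Z_i; \beta)')'$ and $\bar z^*(t; \beta) = (\bar Z(t; \beta)', \bar G(t; \beta)')'$. The estimating equations can be rewritten in this notation, and the resulting estimator $\hat\beta = (\hat\beta_1', \hat\beta_2')'$, defined as a zero-crossing of the estimating function (Tsiatis, 1990; Chen and Jewell, 2001), is strongly consistent for $\beta_0$ and asymptotically normal with limiting variance-covariance matrix $D$. For fixed $c \in (0, 1)$, an asymptotic $1-c$ confidence region for $\beta_0$ based on the normal approximation is
$\{\beta: n(\hat\beta - \beta)'\hat D^{-1}(\hat\beta - \beta) \le \chi^2_{2p}(c)\}$,
where $\hat D$ is a consistent estimator of $D$ and $\chi^2_{2p}(c)$ is the upper $c$-quantile of the chi-squared distribution with $2p$ degrees of freedom.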
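The normal-approximation region can be checked pointwise once $\hat\beta$ and $\hat D$ are available. The sketch below assumes the usual Wald form $n(\hat\beta - \beta)'\hat D^{-1}(\hat\beta - \beta) \le \chi^2_{2p}(c)$ and exploits the fact that the degrees of freedom $2p$ are always even, so the chi-squared tail has the closed form $P(\chi^2_{2m} > x) = e^{-x/2}\sum_{k<m}(x/2)^k/k!$ and the quantile can be found by bisection using only the standard library. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def chi2_upper_quantile_even_df(df: int, c: float, tol: float = 1e-10) -> float:
    """Upper c-quantile of chi-squared with an even number of degrees of freedom.

    For df = 2m, P(chi2 > x) = exp(-x/2) * sum_{k<m} (x/2)^k / k!, which is
    decreasing in x, so bisection finds the x whose tail probability is c.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer (here df = 2p)")
    m = df // 2

    def tail(x: float) -> float:
        h = x / 2.0
        return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(m))

    lo, hi = 0.0, 1.0
    while tail(hi) > c:  # expand the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tail(mid) > c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def in_wald_region(beta: Sequence[float], beta_hat: Sequence[float],
                   d_hat: Sequence[Sequence[float]], n: int, c: float) -> bool:
    """Check n (beta_hat - beta)' D_hat^{-1} (beta_hat - beta) <= chi2_{2p}(c)."""
    diff = [bh - b for bh, b in zip(beta_hat, beta)]
    stat = n * sum(u * v for u, v in zip(diff, solve(d_hat, diff)))
    return stat <= chi2_upper_quantile_even_df(len(beta), c)  # len(beta) = 2p
```

Solving $\hat D x = \hat\beta - \beta$ avoids forming $\hat D^{-1}$ explicitly; the practical difficulty, as the next paragraph explains, lies in obtaining $\hat D$ at all.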
However, $D$ involves the unknown baseline hazard function $\lambda_0(t)$ and its first derivative $\lambda_0^{(1)}(t)$, so estimating $D$ directly by the plug-in method is impractical. By comparison, the empirical likelihood ratio approach is simple. For $i = 1, \dots, n$, auxiliary vectors are defined from the estimating function; at the value $\beta$, the estimated empirical likelihood function is the maximum of $\prod_{i=1}^n p_i$ over $p_i \ge 0$ with $\sum_{i=1}^n p_i = 1$ and the $p_i$-weighted mean of the auxiliary vectors equal to zero. This yields the associated empirical likelihood ratio statistic $l(\beta)$ at the value $\beta$.

To establish our main results, we need several assumptions (Tsiatis, 1990; Ying, 1993; Chen and Jewell, 2001), chiefly:
(i) the $Z_i$ are uniformly bounded, i.e. $\|Z_i\| \le M$ for some positive constant $M$, where $\|\cdot\|$ is the Euclidean norm;
(ii) $C_i$ has a bounded density and $\lambda_0(\cdot)$ has a bounded second derivative;
(iii) $G(t, Z; \beta)$ is bounded and measurable with respect to the filtration $\mathcal{F}_t$ defined above.
Two matrices $\Sigma(\beta_0)$ and $\Gamma(\beta_0)$ are defined from the estimating function, and both are assumed to be positive definite.

Theorem 3. Let $\beta_0$ be the true value of $\beta$. Under conditions (i)-(iii), three asymptotic results hold as $n \to \infty$; moreover, $\Sigma(\beta_0)$ and $\Gamma(\beta_0)$ can each be consistently estimated.

Theorem 4. Let $\beta_0$ be the true value of $\beta$. Under conditions (i)-(iii), as $n \to \infty$, $l(\beta_0) \to \sum_{k=1}^{2p} r_k\chi^2_{1,k}$ in distribution, where the $r_k$ are the eigenvalues of $\Sigma^{-1}(\beta_0)\Gamma(\beta_0)$ and $\chi^2_{1,k}$, $k = 1, \dots, 2p$, are $2p$ independent chi-squared random variables with one degree of freedom.

For fixed $c \in (0, 1)$, a natural asymptotic $1-c$ confidence region for $\beta_0$ based on the empirical likelihood ratio is $\{\beta: l(\beta) \le \chi^2_{2p,c}\}$, where $\chi^2_{2p,c}$ is the upper $c$-quantile of the weighted chi-squared distribution whose weights $r_k$, as in Theorem 4, are the eigenvalues of $\Sigma^{-1}(\beta)\Gamma(\beta)$. Motivated by Rao and Scott (1981), define $\rho(\beta_0) = 2p/\operatorname{tr}\{\Sigma^{-1}(\beta_0)\Gamma(\beta_0)\}$, where $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. This allows us to adopt the adjusted empirical likelihood (AEL) ratio $l_{ad}(\beta) = \rho(\beta)\,l(\beta)$, whose distribution can be approximated by a standard chi-squared distribution with $2p$ degrees of freedom without computing eigenvalues.
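Both calibrations of the empirical likelihood ratio can be computed from estimates of $\Sigma(\beta_0)$ and $\Gamma(\beta_0)$ alone. In the hypothetical helpers below, the Rao-Scott factor $\rho = 2p/\operatorname{tr}(\Sigma^{-1}\Gamma)$ is obtained by solving $\Sigma X = \Gamma$, and the weighted chi-squared quantile is approximated by Monte Carlo through $V'\Sigma^{-1}V$ with $V \sim N(0, \Gamma)$, which has the same law as $\sum_k r_k\chi^2_{1,k}$. The simulation route is an assumption of this sketch, not necessarily the thesis's method, and computing $l(\beta)$ itself (the Lagrange-multiplier step of empirical likelihood) is not shown.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def solve_many(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Return X with a X = b, by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, a[i])) + list(map(float, b[i])) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def rao_scott_factor(sigma: Matrix, gamma: Matrix) -> float:
    """rho = 2p / tr(Sigma^{-1} Gamma) for 2p x 2p matrices Sigma and Gamma."""
    x = solve_many(sigma, gamma)  # x = Sigma^{-1} Gamma; no eigenvalues needed
    return len(sigma) / sum(x[i][i] for i in range(len(sigma)))


def adjusted_el(l_value: float, sigma: Matrix, gamma: Matrix) -> float:
    """Adjusted statistic l_ad = rho * l, compared with chi-squared(2p) quantiles."""
    return rao_scott_factor(sigma, gamma) * l_value


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L' = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def weighted_chi2_quantile(sigma: Matrix, gamma: Matrix, c: float,
                           draws: int = 20000, seed: int = 0) -> float:
    """Monte Carlo upper c-quantile of sum_k r_k chi2_{1,k}, r_k = eig(Sigma^{-1} Gamma).

    V = L z with L L' = Gamma gives V ~ N(0, Gamma), and V' Sigma^{-1} V has
    exactly that weighted chi-squared law.
    """
    rng = random.Random(seed)
    q = len(gamma)
    low = cholesky(gamma)
    sigma_inv = solve_many(sigma, [[float(i == j) for j in range(q)] for i in range(q)])
    sims = []
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]
        v = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(q)]
        sims.append(sum(v[i] * sigma_inv[i][j] * v[j] for i in range(q) for j in range(q)))
    sims.sort()
    return sims[int((1.0 - c) * draws)]
```

When $\Sigma = \Gamma$ all weights $r_k$ equal one, $\rho = 1$, and both calibrations reduce to the ordinary $\chi^2_{2p}$ reference distribution.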
The corresponding confidence region can be constructed as $\{\beta: \hat l_{ad}(\beta) \le \chi^2_{2p}(c)\}$, where $\hat l_{ad}(\beta)$ is obtained by replacing the quantities involved in $\rho(\beta)$ with their respective consistent estimators, and $\chi^2_{2p}(c)$ is the upper $c$-quantile of the standard chi-squared distribution with $2p$ degrees of freedom.

Finally, we consider the semiparametric accelerated failure time partial linear model (AFT-PLM)
$\log T = \beta'Z + h(U) + \varepsilon$,
where $T$ denotes the failure time, $\beta$ is a $p$-vector of regression coefficients to be estimated, $Z$ is a $p$-vector of covariates, $U$ is a univariate covariate, $h(\cdot)$ is an unknown smooth function, and $\varepsilon$ is a random error with mean zero and unknown distribution. Suppose there are $n$ independent subjects under study. For the $i$th subject, $i = 1, \dots, n$, let $T_i$ denote the failure time. Since the failure time is subject to right censoring, we instead observe i.i.d. vectors $(Y_i, \delta_i, Z_i, U_i)$ distributed as $(Y, \delta, Z, U)$, where $Y = \min(T, C)$ and $\delta = I(T \le C)$. We also assume that $T$ and $C$ are independent given $(Z, U)$, and that $(Z, U)$ and $\varepsilon$ are independent. For technical reasons, we assume that the covariates $Z$ and $U$ have bounded supports and, without loss of generality, take the support of $U$ to be the interval $[0, 1]$. In addition, we assume that the smooth function $h(u)$ can be expressed in terms of B-splines, i.e. $h(u) = \sum_{l=-e}^{L}\gamma_l B_l(u)$, where $B_l(u)$, $l = -e, \dots, L$, are the B-spline basis functions of degree $e \ge 1$ associated with a sequence of knots on $[0, 1]$. Let $B(u) = \{B_{-e}(u), \dots, B_L(u)\}'$ and $\gamma = (\gamma_{-e}, \dots, \gamma_L)'$. Then the model can be written as $\log T = \beta'Z + \gamma'B(U) + \varepsilon$. Applying the weighted log-rank estimation method (Tsiatis, 1990; Ying, 1993) with the Gehan weight function gives the estimating function
$\Psi_G(\theta) = \sum_{i=1}^n\sum_{j=1}^n \delta_i (X_i - X_j)\,I\{e_j(\theta) \ge e_i(\theta)\}$,
where $\theta = (\beta', \gamma')'$, $X_i = (Z_i', B(U_i)')'$ and $e_i = e_i(\theta) = \log Y_i - \theta'X_i$; this is often referred to as the Gehan estimating function. Moreover, $\Psi_G(\theta)$ is the gradient of the convex loss function
$f_G(\theta) = \sum_{i=1}^n\sum_{j=1}^n \delta_i\{e_j(\theta) - e_i(\theta)\}^{+}$, with $a^+ = \max(a, 0)$,
which is called the Gehan loss function (Chung et al., 2013).
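The Gehan loss, its estimating function and a smoothed version can all be evaluated directly from the residuals $e_i(\theta)$. The sketch below is illustrative only: the function names are hypothetical, and the piecewise-quadratic $K_\varepsilon$ is a hypothetical $C^1$ polynomial smoothing of $a \mapsto \max(a, 0)$, not the thesis's choice. Diagonal pairs $i = j$ are skipped because they contribute nothing to $f_G$, and covariate rows $X_i$ are passed in directly, for example $(Z_i', B(U_i)')'$.

```python
from typing import List, Sequence

Rows = Sequence[Sequence[float]]


def residuals(theta: Sequence[float], log_y: Sequence[float], x: Rows) -> List[float]:
    """e_i(theta) = log Y_i - theta' X_i."""
    return [ly - sum(t * v for t, v in zip(theta, xi)) for ly, xi in zip(log_y, x)]


def gehan_loss(theta, log_y, x, delta) -> float:
    """f_G(theta) = sum_{i != j} delta_i * max(e_j - e_i, 0)."""
    e = residuals(theta, log_y, x)
    n = len(e)
    return sum(delta[i] * max(e[j] - e[i], 0.0)
               for i in range(n) for j in range(n) if i != j)


def gehan_estimating_function(theta, log_y, x, delta) -> List[float]:
    """Psi_G(theta) = sum_{i,j} delta_i (X_i - X_j) I(e_j >= e_i), a (sub)gradient of f_G."""
    e = residuals(theta, log_y, x)
    psi = [0.0] * len(theta)
    for i in range(len(e)):
        if not delta[i]:
            continue
        for j in range(len(e)):
            if j != i and e[j] >= e[i]:
                psi = [p + a - b for p, a, b in zip(psi, x[i], x[j])]
    return psi


def k_eps(a: float, eps: float) -> float:
    """Hypothetical C^1 piecewise-quadratic smoothing of max(a, 0)."""
    if a <= -eps:
        return 0.0
    if a >= eps:
        return a
    return (a + eps) ** 2 / (4.0 * eps)


def k_eps_prime(a: float, eps: float) -> float:
    """Derivative of k_eps."""
    if a <= -eps:
        return 0.0
    if a >= eps:
        return 1.0
    return (a + eps) / (2.0 * eps)


def smoothed_gehan_loss(theta, log_y, x, delta, eps: float) -> float:
    """f_{G,eps}(theta) = sum_{i != j} delta_i * K_eps(e_j - e_i)."""
    e = residuals(theta, log_y, x)
    n = len(e)
    return sum(delta[i] * k_eps(e[j] - e[i], eps)
               for i in range(n) for j in range(n) if i != j)


def smoothed_gehan_gradient(theta, log_y, x, delta, eps: float) -> List[float]:
    """Gradient: sum_{i != j} delta_i K_eps'(e_j - e_i) (X_i - X_j)."""
    e = residuals(theta, log_y, x)
    grad = [0.0] * len(theta)
    for i in range(len(e)):
        for j in range(len(e)):
            if i != j and delta[i]:
                w = k_eps_prime(e[j] - e[i], eps)
                grad = [g + w * (a - b) for g, a, b in zip(grad, x[i], x[j])]
    return grad
```

Because this $K_\varepsilon$ satisfies $0 \le K_\varepsilon(a) - \max(a, 0) \le \varepsilon/4$, the smoothed loss exceeds the Gehan loss by at most $\varepsilon/4$ per pair. The smoothed loss and its gradient could then be passed to a generic smooth optimizer such as `scipy.optimize.minimize` (supplying the gradient through `jac`).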
Naturally, one can define the Gehan estimator of $\theta$ as the minimizer of the objective function $f_G(\theta)$, denoted by $\hat\theta = (\hat\beta', \hat\gamma')'$. To obtain an easily implemented estimation method for both the regression coefficients $\beta$ and the possibly nonlinear function $h(\cdot)$, define the following smooth approximation to the Gehan loss function:
$f_{G\varepsilon}(\theta) = \sum_{i=1}^n\sum_{j=1}^n \delta_i K_\varepsilon\{e_j(\theta) - e_i(\theta)\}$,
where $K_\varepsilon$ is a sufficiently smooth real-valued function approximating $a \mapsto \max(a, 0)$, indexed by a sufficiently small but strictly positive $\varepsilon$. We define the estimator of $\theta$ as the minimizer of this smooth objective function, $\hat\theta = \arg\min_\theta f_{G\varepsilon}(\theta)$, and estimate $h(\cdot)$ by $\hat h(u) = \sum_{l=-e}^{L}\hat\gamma_l B_l(u)$. Under the conditions above and assumptions A1-A3 of Johnson and Strawderman (2009), it can be shown that $\hat\theta$ is consistent and that $n^{1/2}(\hat\theta - \theta_0)$ converges in distribution to a normal vector with mean zero and covariance matrix $A_G^{-1}(\theta_0)D_G(\theta_0)A_G^{-1}(\theta_0)$, where $\theta_0$ is the true value of $\theta$. This covariance matrix can be consistently estimated by an estimator built from $k_{1,\varepsilon}(v) = \partial K_\varepsilon(v)/\partial v$.

In many studies, there is a monotone relationship between one covariate and the response variable; for example, growth curves and dose-response curves are well known to be monotone. When complete data are available, many researchers have investigated model (1.1.5) with monotonicity constraints. To the best of our knowledge, however, there has been little work on model (1.1.5) when $g(\cdot)$ is monotone and the response variable is subject to right censoring, so it is worthwhile to develop a practical method for this case. Let $T$ and $C$ denote the logarithms of the failure time and the censoring time, respectively, and let $(T_i, C_i, X_i, Z_i)$, $i = 1, \dots, n$, be a random sample of $(T, C, X, Z)$ satisfying the partially linear model $T_i = \beta'X_i + g(Z_i) + \varepsilon_i$. Since the failure time is subject to right censoring, we observe only $(Y, \delta, X, Z)$ with $Y = \min(T, C)$ and $\delta = I(T \le C)$, where $I(\cdot)$ denotes the indicator function. Thus the available data are $(Y_i, \delta_i, X_i, Z_i)$, $i = 1, \dots, n$.
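As is usual, we assume that $(X, Z)$ and $\varepsilon$ are independent and that $\varepsilon$ has mean zero and finite variance. Let $F_n$ be the Kaplan-Meier estimator of the distribution function $F$ of $T$. Following Stute and Wang (1993), $F_n$ can be written as $F_n(y) = \sum_{i=1}^n w_{ni}\,I(Y_{(i)} \le y)$, where the $w_{ni}$ are the Kaplan-Meier weights,
$w_{ni} = \dfrac{\delta_{(i)}}{n-i+1}\prod_{j=1}^{i-1}\left(\dfrac{n-j}{n-j+1}\right)^{\delta_{(j)}}$, $i = 1, \dots, n$.
Here $Y_{(1)} \le \dots \le Y_{(n)}$ are the order statistics of the $Y_i$, and $\delta_{(1)}, \dots, \delta_{(n)}$ and $(X_{(1)}, Z_{(1)}), \dots, (X_{(n)}, Z_{(n)})$ are the accompanying values ordered in the same way. Using the spline functions described above, we represent $g(\cdot)$ by the basis expansion $s(z) = \alpha'B(z)$ with coefficients $\alpha = (\alpha_1, \dots, \alpha_{k_n})'$. To ensure that $s(z)$ is monotone, it suffices to impose nondecreasing constraints on the coefficients $\alpha$. Replacing $g(z)$ by $s(z)$ in the previously defined loss function $L_n(\beta)$, we obtain a spline loss function in $\theta = (\beta, \alpha)$, to be minimized subject to $\alpha_1 \le \dots \le \alpha_{k_n}$; the estimator $\hat\theta = (\hat\beta, \hat\alpha)$ of $\theta$ is defined as this constrained minimizer. The spline estimate of the unknown monotone function $g(\cdot)$ is then $\hat g(z) = \hat\alpha'B(z)$. Under the conditions listed in Subsection 4.1.2, we present some asymptotic results for the spline estimator $(\hat\beta, \hat\alpha)$.

Theorem 5. Let $k_n = O(n^v)$ for $1/(2r+2) < v < 1/(2r)$, and assume that (C1)-(C7) hold. Then $d_2\{(\hat\beta, \hat g), (\beta_0, g_0)\} \to 0$ in probability as $n \to \infty$.

Theorem 6. Let $k_n = O(n^v)$ for $1/(2r+2) < v < 1/(2r)$, and assume that (C1)-(C7) hold. If $\Sigma$ and $\Sigma_0$, defined in the Appendix, are both positive definite, then $n^{1/2}(\hat\beta - \beta_0)$ has a normal limiting distribution as $n \to \infty$.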
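The Kaplan-Meier weights of Stute and Wang (1993) can be computed in a single pass over the ordered sample, carrying the running product $\prod_{j<i}\{(n-j)/(n-j+1)\}^{\delta_{(j)}}$. In the sketch below the function names are hypothetical, and ties in $Y$ are broken by placing uncensored observations first, which is an assumption of the sketch. The spline basis and the constrained minimization are not shown.

```python
from typing import List, Sequence, Tuple


def kaplan_meier_weights(y: Sequence[float],
                         delta: Sequence[int]) -> Tuple[List[int], List[float]]:
    """Stute-Wang Kaplan-Meier weights for the ordered sample.

    Returns (order, w): order[i-1] is the index of Y_(i), and
    w[i-1] = delta_(i) / (n - i + 1) * prod_{j < i} ((n - j) / (n - j + 1)) ** delta_(j).
    Ties in Y are broken by putting uncensored observations first (an assumption).
    """
    n = len(y)
    order = sorted(range(n), key=lambda k: (y[k], -delta[k]))
    weights, prod = [], 1.0
    for i, k in enumerate(order, start=1):
        weights.append(delta[k] / (n - i + 1) * prod)
        if delta[k]:  # the factor for index i enters the weights of later observations
            prod *= (n - i) / (n - i + 1)
    return order, weights


def km_cdf(t: float, y: Sequence[float], delta: Sequence[int]) -> float:
    """F_n(t) = sum_i w_ni I(Y_(i) <= t)."""
    order, weights = kaplan_meier_weights(y, delta)
    return sum(w for k, w in zip(order, weights) if y[k] <= t)
```

Without censoring every weight equals $1/n$; when the largest observation is censored the weights sum to less than one, reflecting the mass that the Kaplan-Meier estimator cannot assign.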
Keywords/Search Tags: proportional hazards assumption, empirical likelihood, survival data, accelerated failure time partial linear model, polynomial-based smoothing