Study On Outlier Detection Method In Survival Analysis

Posted on: 2018-12-18 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: C Shu | Full Text: PDF
GTID: 1314330515983437 | Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: This study constructs parametric survival models that accommodate outliers and uses Bayesian methods for parameter estimation and statistical inference. The proposed models may help to complement and extend outlier detection methods in survival analysis more thoroughly and systematically, make fuller use of survival data, and provide a methodological reference for understanding how diseases develop.

Method:
1. We construct parametric survival outlier models by introducing an n-dimensional shift vector $\gamma$ as an outlier indicator into the traditional exponential regression model and Weibull regression model.
2. Bayesian methods are used for parameter estimation, and MCMC is used for statistical inference. The prior for $\gamma$ is a conditional Laplace distribution, and the point estimate of $\gamma$ is the posterior median. Under the interval criterion, the components of $\gamma$ whose 50% credible interval contains 0 are shrunk to 0; the observations corresponding to the remaining nonzero components of $\gamma$ are regarded as outliers.
3. The proposed models are evaluated theoretically in a simulation study. The accuracy rate, masking effect and swamping effect are measured by R, M and S respectively, and estimation performance is assessed by the mean, standard deviation and mean squared error. In addition, the proposed models are compared with traditional parametric survival models for an overall evaluation.
4. The proposed models and traditional parametric survival models are applied to hepatocellular carcinoma data and breast carcinoma data to evaluate the practical overall performance of the proposed models. Goodness of fit is assessed by residual plots and the DIC criterion. Convergence of the MCMC chains is judged by MC error, sequential trace plots and Gelman-Rubin (GR) trend plots. Using the proposed models, we sought to identify potential outliers in the real data as well as influencing factors.

Results:
1. This study constructed an exponential regression outlier model, which assumes that the
survival times follow an exponential distribution with parameter $\lambda$, expressed as $\lambda = \exp(X'\beta + \gamma)$. The likelihood function is $\prod_{i=1}^{n} \left[\exp(X_i'\beta + \gamma_i)\exp\left(-\exp(X_i'\beta + \gamma_i)\, y_i\right)\right]^{\delta_i} \times \left[\exp\left(-\exp(X_i'\beta + \gamma_i)\, y_i\right)\right]^{1-\delta_i}$. The prior of $\beta$ is an independent flat normal distribution, and the prior of $\gamma$ is a conditional Laplace distribution with hyper-parameters whose priors are an inverse gamma distribution and a gamma distribution respectively. The posterior distribution can then be expressed as $P(\beta, \gamma \mid y, x, \delta) \propto L(\beta, \gamma \mid y, x, \delta) \times \pi(\beta) \times \pi(\gamma)$.
2. This study constructed a Weibull regression outlier model, which assumes that the survival times follow a Weibull distribution with two parameters $\lambda$ and $\omega$. The model can be expressed as $\lambda = \exp(X'\beta + \gamma)$, and the likelihood function is $\prod_{i=1}^{n} \left[\omega \exp(X_i'\beta + \gamma_i)\, y_i^{\omega-1} \exp\left(-\exp(X_i'\beta + \gamma_i)\, y_i^{\omega}\right)\right]^{\delta_i} \times \left[\exp\left(-\exp(X_i'\beta + \gamma_i)\, y_i^{\omega}\right)\right]^{1-\delta_i}$. The prior of $\omega$ is a flat gamma distribution, and the prior of $\beta$ is an independent flat normal distribution. The prior of $\gamma$ is a conditional Laplace distribution with hyper-parameters whose priors are an inverse gamma distribution and a gamma distribution respectively. The posterior distribution can then be expressed as $P(\omega, \beta, \gamma \mid y, x, \delta) \propto L(\omega, \beta, \gamma \mid y, x, \delta) \times \pi(\omega) \times \pi(\beta) \times \pi(\gamma)$.
3. The simulation results show that R for the proposed models exceeds 96%, while M and S fluctuate between 2% and 4%. The proposed models therefore have high accuracy with little masking or swamping. Results across scenarios indicate that the proposed models are not sensitive to the censoring rate of the data, while the proportion of outliers slightly influences their accuracy. The coefficient estimates of the outlier models are close to the true values, and the standard deviations and mean squared errors are very small. The results change little after the identified outliers are deleted. The proposed models are therefore regarded as robust.
4. The analysis of the hepatocellular carcinoma data shows that the exponential regression model's fit is the worst and the exponential regression outlier model's fit is
the best. The results of the outlier model change little after the identified outliers are deleted. According to the outlier model, 10.88% of the observations are outliers. The protective factors are taking the experimental drug as adjuvant therapy (-1.13, 95% CI: -1.371 to -0.886), female sex (-1.17, 95% CI: -1.617 to -0.738) and an intact tumor capsule (-0.70, 95% CI: -1.040 to -0.381). The risk factors are age at surgery (0.04, 95% CI: 0.033 to 0.054), AFP level before surgery (0.10, 95% CI: 0.026 to 0.178), tumor number (0.86, 95% CI: 0.438 to 1.292), tumor size (0.17, 95% CI: 0.010 to 0.332) and pathological grade (0.36, 95% CI: 0.176 to 0.550).
5. The analysis of the breast carcinoma data shows that the Weibull regression model's fit is the worst and the Weibull regression outlier model's fit is the best. The results of the outlier model change little after the identified outliers are deleted. According to the outlier model, 19.01% of the observations are outliers and the shape parameter is 1.32 (95% CI: 1.213 to 1.430). The protective factors are age at diagnosis (-0.28, 95% CI: -0.431 to -0.122), degree of tumor differentiation (-0.77, 95% CI: -0.935 to -0.607), progesterone receptor positivity (-1.48, 95% CI: -1.741 to -1.232) and estrogen receptor positivity (-0.42, 95% CI: -0.668 to -0.169), and the risk factor is the number of lymph node metastases (0.59, 95% CI: 0.447 to 0.735).

Conclusion: Outliers in survival data may contain new, as yet undiscovered information related to disease prognosis. With the proposed outlier models, outlier detection and the analysis of influencing factors can be carried out at the same time.
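
As a rough illustration of two steps described in the abstract, the Python sketch below evaluates the censored log-likelihood of an exponential regression model with a per-subject shift vector (written gamma here) and applies the interval rule that shrinks a shift to 0 when its 50% posterior interval contains 0. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the function names `exp_outlier_loglik` and `flag_outliers` are hypothetical, the interval is assumed to be equal-tailed, and the MCMC sampler that would produce the posterior draws is not shown because the abstract does not specify the software used.

```python
import numpy as np


def exp_outlier_loglik(beta, gamma, X, y, delta):
    """Censored log-likelihood of an exponential model with a shift vector.

    Assumed rate for subject i: lambda_i = exp(X_i' beta + gamma_i).
    delta_i = 1 marks an observed event, delta_i = 0 a right-censored time.
    """
    eta = X @ beta + gamma            # linear predictor plus per-subject shift
    rate = np.exp(eta)
    # Events contribute log f(y) = eta - rate * y;
    # censored times contribute log S(y) = -rate * y.
    return float(np.sum(delta * eta - rate * y))


def flag_outliers(gamma_draws, level=0.50):
    """Apply the interval rule to posterior draws of the shift vector.

    gamma_draws has shape (n_draws, n_subjects). A component is shrunk to 0
    when its equal-tailed interval (an assumption) contains 0; otherwise its
    posterior median is kept and the subject is flagged as an outlier.
    Returns (point_estimates, outlier_mask).
    """
    tail = (1.0 - level) / 2.0
    lower = np.quantile(gamma_draws, tail, axis=0)
    upper = np.quantile(gamma_draws, 1.0 - tail, axis=0)
    median = np.median(gamma_draws, axis=0)
    contains_zero = (lower <= 0.0) & (upper >= 0.0)
    estimate = np.where(contains_zero, 0.0, median)
    return estimate, ~contains_zero


# Toy posterior draws for three subjects (illustrative values, not real data):
# the first shift is centred on 0, the other two are clearly nonzero.
toy_draws = np.column_stack([
    np.linspace(-1.0, 1.0, 101),
    np.linspace(2.0, 3.0, 101),
    np.linspace(-3.0, -2.0, 101),
])
toy_estimate, toy_mask = flag_outliers(toy_draws)
```

In this toy input only the second and third subjects would be flagged, since the interval for the first subject straddles 0. A Weibull version would differ only in the log-likelihood, which adds the shape parameter terms.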
Keywords/Search Tags:Survival analysis, Outlier, Bayesian method, Exponential distribution, Weibull distribution