
Handling Missing Values Of Categorical Variables In Clinical Researches

Posted on: 2021-08-08 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: J L Chu
GTID: 1484306563966939 | Subject: Epidemiology and Health Statistics
Abstract/Summary:
Missing data is a common problem in clinical research, and handling it properly is critical for obtaining reliable results from an analysis. By missing-data mechanism, missingness can be divided into three categories: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). MCAR means that the probability of a value being missing is unrelated to any data in the dataset, observed or unobserved. MCAR has little impact on the analysis results beyond a loss of precision, but it rarely occurs in clinical practice. MNAR means that the probability of missingness is related to the unobserved data. Current methods for handling missing data cannot solve the MNAR problem; in this situation, all we can do is perform sensitivity analyses to evaluate the impact of the missing data on the results. When the mechanism is MAR, statistical methods can be applied to deal with the missing data. MAR means that the probability of missingness is related to the observed data but not to the unobserved data. MAR is the most common type of missingness in clinical research, and it is also a prerequisite for the application and study of many statistical methods for missing data.

There are many causes of MAR in clinical research, one of which is intercurrent events. Intercurrent events are events occurring after enrollment that may affect the estimand, such as additional interventions that affect the assessment of treatment effects, or discontinuation of treatment due to intolerance. The ICH E9(R1) guideline elaborates on intercurrent events and the strategies for handling them, which prompts us to consider the missing-data problem from the perspective of intercurrent events.

Two kinds of methods are generally accepted for dealing with missing data: likelihood-based methods and multiple imputation (MI). MI, which adds a step of drawing values for the missing data from their posterior predictive distribution, can be considered an extension of likelihood-based methods. Unlike single imputation, MI generates multiple complete datasets, analyzes each of them, and combines the results into a final result. In this way, MI avoids treating a single fixed value as though it were the true value of a missing observation, and therefore avoids underestimating the variability of the data. Another advantage of MI is its flexibility. First, MI offers many alternative algorithms, especially the fully conditional specification (FCS) algorithm, which allows a separate imputation model, with its own set of predictors, to be specified for each incomplete variable. Second, because MI produces complete datasets, continuous variables can conveniently be converted into categorical variables after imputation. Therefore, missing values in categorical variables that are derived from continuous variables can be handled by applying MI to the underlying continuous variables. In this research, this approach is called indirect MI, to distinguish it from direct MI, which performs MI on the categorical variables themselves. The aim of this research is to demonstrate the advantages of indirect MI and to establish the conditions under which it can be applied.

The starting point of this research is the problem of missing values in categorical outcome variables caused by intercurrent events. To simulate different longitudinal-data scenarios, we consider several factors, including sample size, number of visits, response rate, and the missing mechanisms caused by common intercurrent events. Under these scenarios, five methods for handling missing data are compared: complete case analysis (CC), last observation carried forward (LOCF), generalized linear mixed model (GLMM), direct MI, and indirect MI. Their statistical properties are evaluated in terms of bias, coverage of the confidence interval (CI), type I error, and power. The simulation results show that indirect MI has the smallest bias and maintains 95% CI coverage in all scenarios. Furthermore, indirect MI controls the type I error at or below 5% and has power comparable to the other methods in both small- and large-sample scenarios. To confirm the conclusions of the simulation study, we apply indirect MI to real clinical trial cases and compare it with the other methods. We find that indirect MI gives more stable and precise results than direct MI across different numbers of imputations, regardless of whether the missing proportion is low or high.

After studying categorical outcome variables, we apply indirect MI in a retrospective study of prognostic factors in female breast cancer patients to explore the conditions under which indirect MI is applicable. In this part of the study, the applicable scenario of indirect MI is extended from ‘a single longitudinal incomplete variable in clinical trials’ to ‘multiple cross-sectional incomplete variables in retrospective studies’. In the process, we propose solutions to several issues in indirect MI, including the choice of imputation model, the assessment of convergence, the MAR assumption, and the approach to model selection. The results of this part show that indirect MI is suitable for handling missing values in multiple variables in cross-sectional data. Furthermore, indirect MI makes full use of the information in the data and provides more precise estimates. Therefore, indirect MI has good statistical properties in different clinical scenarios.

The major achievements of this study are as follows:
(1) Demonstrated, in terms of bias, CI coverage, type I error, and power, that the indirect MI method can make full use of the information in continuous variables to improve the precision of imputation for categorical variables.
(2) Demonstrated that indirect MI is suitable for various kinds of studies and data patterns, and that it can be considered a preferred method for a wide range of clinical research.
Keywords/Search Tags: missing data, multiple measurements, outcome categorical variables, indirect multiple imputation, explanatory categorical variables, prognostic factors
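MI combines the analyses of its completed datasets with Rubin's rules: the pooled estimate is the average of the per-dataset estimates, and the total variance adds the average within-imputation variance to an inflated between-imputation variance. The between-imputation term is exactly what a single imputation leaves out, which is why single imputation understates variability. The following is a generic Python sketch of Rubin's rules, not code from the dissertation; the function name `pool_rubin` is hypothetical, and as a simplification the interval uses a normal quantile rather than Rubin's t reference distribution.

```python
import math
from statistics import NormalDist, fmean, variance


def pool_rubin(estimates, variances, level=0.95):
    """Combine m completed-data results with Rubin's rules.

    estimates: point estimates Q_1..Q_m, one per imputed dataset
    variances: their squared standard errors U_1..U_m
    Returns (pooled estimate, total variance, (lower, upper) interval).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = fmean(estimates)         # pooled point estimate
    u_bar = fmean(variances)         # within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b  # total variance
    # Simplification (assumption): normal quantile instead of Rubin's t distribution
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(total)
    return q_bar, total, (q_bar - half, q_bar + half)
```

For example, `pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` gives a pooled estimate of 2.0 and a total variance of 0.5 + (4/3)(1.0), noticeably larger than the 0.5 a single imputation would report.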
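Two of the comparators in the simulation, complete case analysis (CC) and last observation carried forward (LOCF), are simple enough to state in a few lines. This Python sketch assumes each subject's binary outcome is stored as a list of visit values with `None` for a missed visit, and treats CC as keeping only subjects whose final-visit outcome is observed; the data layout and the helper names `locf`, `complete_cases` and `final_response_rate` are illustrative assumptions, not the dissertation's implementation.

```python
from typing import List, Optional

Outcome = Optional[int]  # 1 = responder, 0 = non-responder, None = missed visit


def locf(visits: List[Outcome]) -> List[Outcome]:
    """Last observation carried forward within one subject's visit sequence."""
    filled: List[Outcome] = []
    last: Outcome = None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)  # stays None until the first observed visit
    return filled


def complete_cases(subjects: List[List[Outcome]]) -> List[List[Outcome]]:
    """Keep only subjects whose final-visit outcome was actually observed."""
    return [s for s in subjects if s[-1] is not None]


def final_response_rate(subjects: List[List[Outcome]]) -> float:
    """Share of responders at the final visit among subjects with a final value."""
    finals = [s[-1] for s in subjects if s[-1] is not None]
    return sum(finals) / len(finals)


subjects = [[1, 0, 1], [1, None, None], [None, 0, 0]]
print(final_response_rate([locf(s) for s in subjects]))  # 0.6666666666666666
print(final_response_rate(complete_cases(subjects)))     # 0.5
```

On these three hypothetical subjects, LOCF carries the second subject's early response to the final visit, while CC drops that subject entirely, so the two methods report different response rates from the same data.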
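To make the direct/indirect distinction concrete, here is a minimal Python sketch of indirect MI for one incomplete continuous variable `y` with one fully observed covariate `x`. It assumes a normal linear imputation model whose parameters are redrawn for each imputation from their posterior under the standard noninformative prior, a hypothetical `cutoff` that turns `y` into a binary response, and the proportion of responders as the quantity of interest. The binary response is derived only after each dataset has been completed on the continuous scale, and the m results are pooled with Rubin's rules. Direct MI would instead impute the binary response itself (for example with a logistic model), which is not shown; the dissertation's actual imputation models (FCS with several predictors, repeated visits, multiple incomplete variables) are not reproduced here, and `draw_regression` and `indirect_mi` are hypothetical names.

```python
import math
import random
from statistics import fmean, variance


def draw_regression(xs, ys, rng):
    """One posterior draw of (b0, b1, sigma) for y = b0 + b1*x + e, e ~ N(0, sigma^2),
    under the noninformative prior p(b0, b1, sigma^2) proportional to 1/sigma^2."""
    n = len(xs)
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b1_hat = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b0_hat = my - b1_hat * mx
    rss = sum((y - b0_hat - b1_hat * x) ** 2 for x, y in zip(xs, ys))
    # sigma^2 = RSS / chi2_{n-2}; a chi-square with k df is Gamma(shape=k/2, scale=2)
    sigma = math.sqrt(rss / rng.gammavariate((n - 2) / 2, 2))
    # Centred form y = a + b1*(x - mx): given sigma, a and b1 are independent normals
    a = rng.gauss(my, sigma / math.sqrt(n))
    b1 = rng.gauss(b1_hat, sigma / math.sqrt(sxx))
    return a - b1 * mx, b1, sigma


def indirect_mi(xs, ys, cutoff, m=20, seed=1):
    """Pooled proportion of y >= cutoff after imputing the continuous y (None = missing).

    Returns (pooled proportion, Rubin total variance)."""
    rng = random.Random(seed)
    obs_x = [x for x, y in zip(xs, ys) if y is not None]
    obs_y = [y for y in ys if y is not None]
    n = len(ys)
    props, within = [], []
    for _ in range(m):
        b0, b1, sigma = draw_regression(obs_x, obs_y, rng)
        # Impute on the continuous scale ...
        completed = [y if y is not None else rng.gauss(b0 + b1 * x, sigma)
                     for x, y in zip(xs, ys)]
        # ... and only then derive the categorical (binary) response
        p = sum(y >= cutoff for y in completed) / n
        props.append(p)
        within.append(p * (1 - p) / n)  # binomial variance of the proportion
    # Rubin's rules: total = mean within-variance + (1 + 1/m) * between-variance
    total = fmean(within) + (1 + 1 / m) * variance(props)
    return fmean(props), total
```

With a fixed seed the result is reproducible. When `ys` has no missing values, every imputation yields the same proportion, so the between-imputation variance is zero and the result reduces to the ordinary binomial estimate.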
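The simulation compares the methods on bias, CI coverage, type I error and power. Given the replicate results for one method in one scenario, these can be summarised as in the Python sketch below; the same rejection rate is read as the type I error when the data were generated under the null hypothesis and as power otherwise. The function `performance` and its argument layout are illustrative assumptions, not the dissertation's code.

```python
from statistics import fmean
from typing import Dict, Sequence


def performance(estimates: Sequence[float], ci_lows: Sequence[float],
                ci_highs: Sequence[float], p_values: Sequence[float],
                true_value: float, alpha: float = 0.05) -> Dict[str, float]:
    """Summarise simulation replicates for one method under one scenario."""
    n = len(estimates)
    # Bias: average estimate minus the value used to generate the data
    bias = fmean(estimates) - true_value
    # Coverage: share of replicates whose CI contains the true value (target about 0.95)
    coverage = sum(lo <= true_value <= hi for lo, hi in zip(ci_lows, ci_highs)) / n
    # Rejection rate at level alpha: type I error under the null, power otherwise
    rejection_rate = sum(p < alpha for p in p_values) / n
    return {"bias": bias, "coverage": coverage, "rejection_rate": rejection_rate}
```

For instance, four hypothetical replicates with estimates `[0.9, 1.1, 1.0, 1.2]` around a true value of 1.0 give a bias of about 0.05; if three of the four intervals contain 1.0, the coverage is 0.75, well short of the nominal 95%.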