
Extension of the Regression Method for Imputation of Data with Monotone Missing Pattern using Multivariate Adaptive Regression Splines (MARS), with Applications to Systematic-Missing-At-Random (SMAR) Study Design

Posted on: 2014-07-28
Degree: Ph.D.
Type: Thesis
University: New York University
Candidate: Lu, Feng
GTID: 2450390008462511
Subject: Biostatistics
Abstract/Summary:
Systematic-Missing-At-Random (SMAR) studies are designed to deal with resource limitations. These designs measure primary endpoints, important covariates, and "inexpensive" variables on the entire study group, and measure the more "expensive" variables on nested random subsamples. Such designs generate monotone missing data, and the imputation method used to restore the complete data is key to the accuracy of the statistical analysis after data collection.

This thesis reviews several generally accepted imputation methods for monotone missing data such as data generated by SMAR designs, including the EM algorithm, the regression method, and the predictive mean nearest neighbor (PMN) method. We discuss their underlying principles and compare their advantages and disadvantages. We then propose a new regression-based imputation method: multivariate adaptive regression splines (MARS) imputation. We compare its performance in logistic regression models with that of three other methods (the EM algorithm, the regression method, and PMN) under four simulated scenarios: (1) highly correlated covariates; (2) nonlinear relationships between covariates; (3) uncorrelated covariates; and (4) non-normal covariates. Performance in these settings is evaluated with three different measures. We show that MARS performs best when the covariates are highly correlated or nonlinearly related, and that it is non-inferior to the EM algorithm in the other two scenarios. In general, both MARS and EM outperform the regression method and PMN. We also examine how sample size, number of variables, outliers, and the percentage of missingness affect the performance of these methods.
MARS imputation is more robust to these factors than the other regression-based methods we evaluate, and it does not underperform the EM algorithm when they are present. An example using biomarker data from the NYU Lung Cancer Early Detection Research Network Center cohort (Grant Number: EDRN - U01CA86137 from the National Cancer Institute) illustrates the application of MARS imputation in practice. Because MARS is relatively simple compared with the EM algorithm, we propose MARS imputation as a better imputation method for monotone missing data arising from SMAR-type study designs, particularly when the imputed variables are highly correlated or nonlinearly related.
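To make the idea of "extending the regression method with MARS" concrete, the following is a minimal NumPy sketch under stated assumptions; it is not the thesis's implementation. The names `make_smar_data`, `is_monotone`, `ols_fit`, `hinge_fit`, and `monotone_impute` are hypothetical, and the three-variable normal design, subsampling fractions, and knot grid are illustrative choices. Each column of a monotone pattern is imputed from the columns before it. The plain regression step uses ordinary least squares, while the MARS-like step uses a simplified additive fit built from mirrored hinge functions. Imputation here is single and deterministic (fitted conditional means), whereas the regression method as used for multiple imputation also draws regression parameters and adds residual noise.

```python
import numpy as np


def make_smar_data(n=1000, frac2=0.5, frac3=0.5, seed=0):
    """Simulate a hypothetical three-stage SMAR-style design.

    Column 0 ("inexpensive") is measured on everyone, column 1 on a random
    subsample, and column 2 only on a random subsample of that subsample,
    which yields a monotone missing pattern. Returns (complete, observed).
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, 0.8, 0.6],
                    [0.8, 1.0, 0.7],
                    [0.6, 0.7, 1.0]])
    full = rng.multivariate_normal(np.zeros(3), cov, size=n)
    has2 = rng.random(n) < frac2            # stage-2 random subsample
    has3 = has2 & (rng.random(n) < frac3)   # nested stage-3 subsample
    data = full.copy()
    data[~has2, 1] = np.nan
    data[~has3, 2] = np.nan
    return full, data


def is_monotone(data):
    """True if a missing value in column j implies column j+1 is missing too."""
    miss = np.isnan(data)
    return bool(np.all(miss[:, :-1] <= miss[:, 1:]))


def ols_fit(X, y):
    """Linear regression with intercept (the plain regression-method step)."""
    B = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(B, y, rcond=None)
    return lambda Xnew: np.column_stack([np.ones(len(Xnew)), Xnew]) @ beta


def _hinge_design(X, terms):
    """Intercept plus hinge columns max(0, s * (x_v - t)) for each (v, t, s)."""
    cols = [np.ones(len(X))]
    cols += [np.maximum(0.0, s * (X[:, v] - t)) for v, t, s in terms]
    return np.column_stack(cols)


def hinge_fit(X, y, max_pairs=3, n_knots=20):
    """Simplified MARS-like fit: greedy forward pass over mirrored hinge pairs.

    Each step adds the (variable, knot) hinge pair that most lowers the
    residual sum of squares, refitting all coefficients by least squares.
    Unlike full MARS, there are no interaction terms and no backward
    pruning by generalized cross-validation.
    """
    terms = []
    for _ in range(max_pairs):
        best_rss, best_terms = np.inf, terms
        for v in range(X.shape[1]):
            for t in np.quantile(X[:, v], np.linspace(0.05, 0.95, n_knots)):
                cand = terms + [(v, t, 1.0), (v, t, -1.0)]
                B = _hinge_design(X, cand)
                beta, *_ = np.linalg.lstsq(B, y, rcond=None)
                rss = float(np.sum((y - B @ beta) ** 2))
                if rss < best_rss:
                    best_rss, best_terms = rss, cand
        terms = best_terms
    beta, *_ = np.linalg.lstsq(_hinge_design(X, terms), y, rcond=None)
    return lambda Xnew: _hinge_design(Xnew, terms) @ beta


def monotone_impute(data, fit=ols_fit):
    """Impute a monotone pattern column by column (column 0 fully observed).

    Column j is modeled on columns 0..j-1 using rows where it is observed;
    those rows are complete in 0..j-1 because the pattern is monotone.
    Missing entries receive the fitted value (single, deterministic).
    """
    out = data.copy()
    for j in range(1, out.shape[1]):
        obs = ~np.isnan(out[:, j])
        predict = fit(out[obs, :j], out[obs, j])
        out[~obs, j] = predict(out[~obs, :j])
    return out
```

For example, `full, data = make_smar_data()` followed by `monotone_impute(data, fit=hinge_fit)` imputes the simulated data with the hinge model. Passing `fit=hinge_fit` instead of `fit=ols_fit` changes only the per-column regression model, which is the sense in which a MARS-based imputation extends the regression method. For real analyses, a complete MARS implementation (for example, the R package `earth`) would take the place of the simplified `hinge_fit`.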
Keywords/Search Tags: SMAR, Data, Monotone missing, MARS, Imputation, Method, EM algorithm, Designs