
Multivariate imputation of coarsened survey data on household wealth

Posted on: 2001-11-18
Degree: Ph.D
Type: Dissertation
University: University of Michigan
Candidate: Heeringa, Steven George
Full Text: PDF
GTID: 1468390014459234
Subject: Statistics
Abstract/Summary:
Sample survey questions that attempt to measure financial variables such as household income, assets and debts are subject to high rates of missing data. To counter these high rates of missing data, survey researchers now use special questionnaire formats designed to collect interval scale or "bracketed" observations whenever a respondent is unable or unwilling to provide an exact response to a financial amount question. These special question formats significantly reduce the rate of missing data but result in a coarsened mixture of actual value responses, bracketed amounts and completely missing data. Multivariate modeling of these data is complicated by both the coarsened measurements and the fact that the distribution of each variable is a semi-continuous mixture of zeroes and continuous, non-zero amounts.

A mixed normal location model is proposed for semi-continuous multivariate data. The Expectation-Maximization (EM) algorithm is developed for estimating the parameters of the mixed normal location model with coarsened data. A Bayesian Gibbs sampler algorithm is also developed for simulating draws from the posterior predictive distribution of coarsened observations from the mixed normal location model and developing multiple imputation inferences for the parameters of this multivariate model.

The Gibbs sampler algorithm for coarsened data from the mixed normal location model is used to multiply impute coarsened asset and liability values in the 1992 Health and Retirement Survey data set. The estimated distribution of household net worth based on these imputations is compared to the distributions estimated by complete case analysis and simple univariate imputation alternatives including the univariate hot deck method used to impute coarsened values in the HRS Wave 1 public use data set.

The performance of the EM and Gibbs sampler algorithms is tested in a simulation study that investigates the relative bias, root mean square error, and confidence interval width and coverage properties of these methods for different degrees of coarsening of the data, ignorable and nonignorable coarsening mechanisms and departures from normality in the multivariate data model. Performance of the EM and the Gibbs sampler algorithms is compared to complete case analysis, mean imputation and univariate hot deck imputation methods.
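To make the idea of a coarsened, semi-continuous variable concrete, the following is a minimal univariate sketch, not the dissertation's multivariate algorithm. It assumes a single asset variable that is zero with probability `p_zero` and otherwise log-normal, and it imputes bracketed responses by drawing from that log-normal truncated to the reported bracket. The function names, the `("exact" | "bracket" | "missing", ...)` tuple encoding, and all parameter values are hypothetical. The model parameters are held fixed here, whereas a full Gibbs sampler would also redraw them from their posterior on each cycle.

```python
import math
import random
from statistics import NormalDist


def draw_in_bracket(log_dist, lo, hi, rng):
    """Draw a positive amount inside [lo, hi] from a log-normal model.

    Uses the inverse-CDF method on the log scale, so the draw follows the
    assumed model distribution truncated to the reported bracket.
    """
    log_lo = math.log(lo) if lo > 0 else -math.inf
    log_hi = math.log(hi)  # math.log(inf) is inf, so open-ended brackets work
    p_lo, p_hi = log_dist.cdf(log_lo), log_dist.cdf(log_hi)
    u = rng.uniform(p_lo, p_hi)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf requires 0 < u < 1
    value = math.exp(log_dist.inv_cdf(u))
    return min(max(value, lo), hi)  # guard against floating-point overshoot


def impute_value(obs, p_zero, log_dist, rng):
    """Return one completed value for a single coarsened observation."""
    kind = obs[0]
    if kind == "exact":
        return float(obs[1])  # exact responses are kept as reported
    if kind == "bracket":
        # Assumption: a bracketed response implies a non-zero amount.
        return draw_in_bracket(log_dist, obs[1], obs[2], rng)
    if kind == "missing":
        # Semi-continuous draw: zero with probability p_zero, else log-normal.
        if rng.random() < p_zero:
            return 0.0
        return math.exp(rng.gauss(log_dist.mean, log_dist.stdev))
    raise ValueError(f"unknown observation kind: {kind!r}")


def multiply_impute(data, p_zero, log_dist, m=5, seed=0):
    """Build m completed copies of the data (parameters held fixed)."""
    rng = random.Random(seed)
    return [
        [impute_value(obs, p_zero, log_dist, rng) for obs in data]
        for _ in range(m)
    ]


# Hypothetical example: two exact values, two brackets, one missing value.
data = [
    ("exact", 0.0),
    ("exact", 12000.0),
    ("bracket", 5000.0, 25000.0),
    ("bracket", 100000.0, math.inf),  # open-ended top bracket
    ("missing",),
]
completed = multiply_impute(
    data, p_zero=0.2, log_dist=NormalDist(mu=10.0, sigma=1.5), m=5, seed=42
)
```

Each of the `m` completed data sets keeps exact responses unchanged and places every bracketed imputation inside its reported interval, which is the property that distinguishes this from mean imputation. The univariate form ignores the correlations between asset and liability variables that the multivariate model is designed to exploit.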
Keywords/Search Tags:Data, Imputation, Coarsened, Survey, Mixed normal location model, Household, Gibbs sampler, Multivariate