
The Application Of Auxiliary Variable Selection In The Imputation Methods For Item Non-response

Posted on: 2014-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: J Cai
GTID: 2250330425989509
Subject: Statistics
Abstract/Summary:
Non-response is widespread in sample surveys. Imputation is one of the most effective ways to handle it, and auxiliary variables play an important role in improving the accuracy of the imputed values: used properly, they can reduce both the bias and the variance of the estimators. However, not every auxiliary variable carries useful information about the target variable, and including only the auxiliary variables that are informative about the target may improve the accuracy of the imputed values. Many studies in China and abroad discuss using auxiliary variables in the imputation model to increase the accuracy of the imputed values, but few have specifically investigated how to select the valuable auxiliary variables for imputation. This thesis investigates a new two-step imputation process: the first step selects the auxiliary variables that are highly correlated with the target variable, and the second step uses them to construct an imputation model that estimates the missing values. The feasibility of the process was examined by simulation, mainly by comparing the accuracy of the imputed values obtained from combinations of different auxiliary-variable selection methods and different imputation models. The thesis also examines how the non-response rate, the sample size, the error term, and the correlation among the auxiliary variables affect imputation with selected auxiliary variables. In the simulations, auxiliary variables were selected either by AIC or by fixing the number of auxiliary variables in advance.
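The two-step process can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's own code: step 1 ranks the auxiliary variables by the absolute Pearson correlation with the target among respondents and keeps a fixed number `k` of them (the "given number of auxiliary variables" option), and step 2 fits ordinary least squares on the respondents and fills each missing target value with its predicted value (deterministic regression imputation). The function names (`select_by_correlation`, `regression_impute`, etc.) and the use of `None` to mark a non-response are hypothetical choices made for the sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w


def ols(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Xty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
    return solve(XtX, Xty)


def select_by_correlation(X, y, k):
    """Step 1: indices of the k auxiliary columns with the largest |r| against y."""
    cols = list(zip(*X))
    ranked = sorted(range(len(cols)), key=lambda j: -abs(pearson(cols[j], y)))
    return sorted(ranked[:k])


def regression_impute(X, y, k):
    """Two-step imputation; entries of y equal to None are treated as non-responses."""
    resp = [i for i, v in enumerate(y) if v is not None]
    Xr, yr = [X[i] for i in resp], [y[i] for i in resp]
    keep = select_by_correlation(Xr, yr, k)                 # step 1: choose auxiliaries
    beta = ols([[row[j] for j in keep] for row in Xr], yr)  # step 2: fit on respondents
    filled = []
    for row, v in zip(X, y):
        if v is None:  # replace the non-response by its regression prediction
            v = beta[0] + sum(b * row[j] for b, j in zip(beta[1:], keep))
        filled.append(v)
    return keep, filled
```

For example, with `X = [[1, 5, 0], [2, 3, 1], [3, 8, 0], [4, 1, 1], [5, 7, 0], [6, 2, 1]]` and `y = [3, 5, None, 9, 11, None]`, where the respondents satisfy y = 1 + 2 * x0 exactly, `regression_impute(X, y, k=1)` keeps only column 0 and fills the two gaps with values very close to 7 and 13. Swapping `select_by_correlation` for a partial-correlation or AIC-based rule changes only step 1.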
Correlation coefficients, partial correlation coefficients, and stepwise variable selection were used to carry out the selection. Three common imputation models were used in the simulations: the regression imputation model, the random regression imputation model, and an imputation model based on the EM algorithm. The results show that the imputed values are most accurate when the imputation model includes all of the important auxiliary variables that are highly correlated with the target variable. Different selection methods in the first step lead to different levels of precision in the imputed values. In general, when both the sample size and the error term are small, selecting auxiliary variables by correlation coefficients gives better imputation results; when the error term is large, selecting them by partial correlation coefficients or by stepwise variable selection gives better imputed values. When the correlations among the auxiliary variables are small, they barely affect the accuracy of the imputed values; when these correlations are large, however, they strongly affect the accuracy of the results, especially with a small sample size and a high non-response rate. The regression imputation model is the better choice for a small sample size with a low non-response rate. For a large sample size with a low non-response rate, both the regression imputation model and the EM-based imputation model may be good choices.
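Stepwise selection and random regression imputation can be combined in a similar sketch, again an assumption-laden illustration rather than the thesis's implementation. Stepwise selection is simplified here to forward selection: starting from the intercept-only model, add the auxiliary variable that lowers AIC the most, computed for a Gaussian linear model as n ln(RSS/n) + 2p with constants dropped, and stop when no addition lowers AIC. Random regression imputation then adds to each prediction a residual drawn at random from the respondents' fitted residuals; this is one common variant (another draws normal noise with the estimated residual variance), chosen because deterministic predictions all lie on the regression line and understate the spread of the target. The EM-based model is not sketched, and all function names and the `seed` parameter are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w


def ols(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Xty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
    return solve(XtX, Xty)


def rss(X, y, beta):
    """Residual sum of squares of the fitted linear model."""
    return sum(
        (t - beta[0] - sum(b * v for b, v in zip(beta[1:], row))) ** 2
        for row, t in zip(X, y)
    )


def aic(X, y):
    """Gaussian linear-model AIC up to a constant: n*ln(RSS/n) + 2*(number of coefficients)."""
    n = len(y)
    beta = ols(X, y)
    return n * math.log(max(rss(X, y, beta), 1e-12) / n) + 2 * len(beta)


def forward_stepwise_aic(X, y):
    """Add, one at a time, the auxiliary column that lowers AIC most; stop when none does."""
    chosen, remaining = [], list(range(len(X[0])))
    best = aic([[] for _ in y], y)  # intercept-only model
    while remaining:
        scores = {j: aic([[row[c] for c in chosen + [j]] for row in X], y) for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break
        best = scores[j_best]
        chosen.append(j_best)
        remaining.remove(j_best)
    return chosen


def random_regression_impute(X, y, seed=0):
    """Stepwise-AIC selection, then prediction plus a randomly drawn observed residual."""
    rng = random.Random(seed)
    resp = [i for i, v in enumerate(y) if v is not None]
    Xr, yr = [X[i] for i in resp], [y[i] for i in resp]
    keep = forward_stepwise_aic(Xr, yr)
    beta = ols([[row[j] for j in keep] for row in Xr], yr)

    def predict(row):
        return beta[0] + sum(b * row[j] for b, j in zip(beta[1:], keep))

    residuals = [t - predict(row) for row, t in zip(Xr, yr)]
    filled = [v if v is not None else predict(row) + rng.choice(residuals)
              for row, v in zip(X, y)]
    return keep, filled
```

With ten rows where y is roughly 1 + 2 * x0 plus small noise and x1 is unrelated to the noise, column 0 should be selected first and the imputed entries should land near the line; results are reproducible for a fixed `seed`.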
Keywords/Search Tags: Item Non-response, Auxiliary Variable, Imputation Methods, Non-response Rate