
Logistic Regression-Close Neighbour Imputation

Posted on: 2014-11-11  Degree: Master  Type: Thesis
Country: China  Candidate: Y Liu  Full Text: PDF
GTID: 2250330425489510  Subject: Statistics
Abstract/Summary:
We often face missing data when analyzing real-world data. People may refuse or forget to answer certain questions in a survey, files are lost, or data are not recorded properly. Non-response makes statistical analysis more difficult, biases the analytical results, and reduces the quality of statistical outcomes. No data collection method used in a real survey is completely accurate, and limited time and cost often rule out a repeat investigation. Experience has shown that preventing non-response in advance is the most effective remedy. However, prevention alone cannot fully solve the problem of missing data, so imputation methods are used more and more widely in dealing with non-response. Many scholars have conducted in-depth theoretical and empirical research on imputation methods. In this paper, I briefly summarized the previous studies of imputation methods. Building on these methods, I tried a different kind of imputation method, the Logistic Regression-Close Neighbour Imputation. This method inherits the high accuracy of the Logistic regression imputation method as well as the nature of the Nearest Neighbour Imputation method. I compared the Logistic Regression-Close Neighbour Imputation method with mean imputation, Nearest neighbour imputation, Logistic regression imputation and regression imputation. In the computational experiments, the non-response rates are 5%, 10%, 20%, 30%, 40% and 50%, and the numbers of regression variables are 2, 3, 4 and 5. For categorical data, simulation results show that the Logistic regression-close neighbour imputation method sometimes performs better than Logistic regression imputation. K-nearest neighbour imputation performs worse than all the other methods.
For continuous data, simulation results show that the Logistic regression-close neighbour imputation method performs better than the other three methods when the variance is large. When the variance is small, the Logistic regression-close neighbour imputation method does not show obvious advantages, and its mean square error tends to rise as the number of variables increases. For the actual data, the results show that the mean square error tends to increase as the missing rate increases, and that the Logistic Regression-Close Neighbour Imputation method has the smallest mean square error and the smallest volatility, so it performs better than the other methods. I verified the superiority of the Logistic regression-close neighbour imputation method by applying it to simulated data and actual data. Given these good characteristics, I hope the method can provide a new approach with reference value for solving practical problems.
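The comparison design described above (impute at several non-response rates, then score each method by the mean square error on the imputed values) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's code: it covers only two of the baseline methods (mean imputation and nearest neighbour imputation), it does not implement the Logistic Regression-Close Neighbour method, whose details the abstract does not give, and the synthetic data model, the sample size, and every function name are hypothetical.

```python
import numpy as np


def mean_impute(y, missing):
    """Fill missing entries of y with the mean of the observed entries."""
    filled = y.copy()
    filled[missing] = y[~missing].mean()
    return filled


def nearest_neighbour_impute(X, y, missing):
    """Fill each missing y with the y of the closest respondent in X (Euclidean distance)."""
    filled = y.copy()
    donors_X, donors_y = X[~missing], y[~missing]
    for i in np.flatnonzero(missing):
        dist = np.linalg.norm(donors_X - X[i], axis=1)
        filled[i] = donors_y[np.argmin(dist)]
    return filled


def imputation_mse(y_true, y_filled, missing):
    """Mean square error between true and imputed values, on missing entries only."""
    return float(np.mean((y_true[missing] - y_filled[missing]) ** 2))


def run_experiment(n=500, n_vars=2,
                   rates=(0.05, 0.10, 0.20, 0.30, 0.40, 0.50), seed=0):
    """Compare the two baselines on synthetic continuous data (assumed data model)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, n_vars))                     # auxiliary variables
    y = X.sum(axis=1) + rng.normal(scale=0.5, size=n)    # assumed linear response
    results = {}
    for rate in rates:
        # Delete responses completely at random at the given non-response rate.
        missing = np.zeros(n, dtype=bool)
        idx = rng.choice(n, size=int(round(rate * n)), replace=False)
        missing[idx] = True
        results[rate] = {
            "mean": imputation_mse(y, mean_impute(y, missing), missing),
            "nearest_neighbour": imputation_mse(
                y, nearest_neighbour_impute(X, y, missing), missing),
        }
    return results


if __name__ == "__main__":
    for rate, mse in run_experiment().items():
        print(f"rate={rate:.0%}  mean={mse['mean']:.3f}  "
              f"nearest_neighbour={mse['nearest_neighbour']:.3f}")
```

Changing `n_vars` to 2, 3, 4 or 5 and the noise `scale` mirrors the two factors the abstract varies (number of regression variables and variance); the missingness mechanism used here is completely at random, which is also an assumption.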
Keywords/Search Tags: Non-response, Nearest neighbour imputation, Logistic regression