| Missing data are widespread in sample surveys and affect subsequent statistical analysis. As data collection technologies and approaches become more widely used, the causes of missing data have diversified, and missing data have become an inevitable part of the sample. Missing data increase the difficulty of statistical analysis: they reduce the amount of effective data and available information, and they lower the accuracy of statistics. Without complete data, statistical inference may be biased or invalid, which ultimately affects statistical decisions. As research deepens, the traditional approach of deleting or ignoring missing values can no longer meet practical needs, so research on missing data has important application value. This paper first introduces the background and significance of the selected topic and briefly reviews the literature on missing data. Chapter 2 introduces the causes of missing data, the missingness mechanisms and models, and then describes in detail four commonly used imputation methods and their theoretical basis: mean imputation, regression imputation, EM algorithm imputation, and multiple imputation. Chapter 3 applies the four imputation methods to univariate and multivariate missingness under different missing rates and different sampling ratios for comparative analysis, and reports the bias and MSE of the imputed values together with boxplots. Chapter 4 is an empirical analysis of the effect of combining a model with multiple imputation: using a satisfaction survey on new rural construction as the background material, it combines logistic regression models with multiple imputation under different missing rates. Chapter 5 summarizes the paper and discusses prospects for further research on methods for handling missing data. The results show that as the missing rate increases, the proportion of available sample data decreases and the bias of the imputed values of all four methods gradually increases. EM imputation and multiple imputation are relatively stable across missing rates. Multiple imputation has a clear advantage at high missing rates, and combining logistic regression models with multiple imputation performs well. |
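
The kind of comparison described for Chapter 3 can be illustrated with a minimal sketch. It covers only two of the four methods (mean and regression imputation) and rests on assumptions that are not taken from the paper: the data are simulated from a simple linear model, values are missing completely at random, and the function names (`simulate`, `mean_impute`, `regression_impute`, `compare`) are hypothetical. It shows how the bias and MSE of imputed values can be computed at several missing rates, and it does not reproduce the paper's results.

```python
import numpy as np


def simulate(n, rng):
    """Simulated data (an assumption): y depends linearly on x plus noise."""
    x = rng.normal(0.0, 1.0, n)
    y = 2.0 + 1.5 * x + rng.normal(0.0, 1.0, n)
    return x, y


def mean_impute(x, y_obs, miss):
    """Replace each missing y with the mean of the observed y values."""
    filled = y_obs.copy()
    filled[miss] = y_obs[~miss].mean()
    return filled


def regression_impute(x, y_obs, miss):
    """Replace each missing y with the prediction of a line fitted on observed pairs."""
    slope, intercept = np.polyfit(x[~miss], y_obs[~miss], 1)
    filled = y_obs.copy()
    filled[miss] = intercept + slope * x[miss]
    return filled


def compare(n=2000, rates=(0.1, 0.3, 0.5), seed=0):
    """Return {missing rate: {method: (bias, mse)}} for the imputed values only."""
    rng = np.random.default_rng(seed)
    x, y = simulate(n, rng)
    methods = {"mean": mean_impute, "regression": regression_impute}
    results = {}
    for rate in rates:
        # Missing completely at random: each y is dropped with probability `rate`.
        miss = rng.random(n) < rate
        y_obs = np.where(miss, np.nan, y)
        row = {}
        for name, method in methods.items():
            filled = method(x, y_obs, miss)
            err = filled[miss] - y[miss]  # error against the known true values
            row[name] = (float(err.mean()), float((err ** 2).mean()))
        results[rate] = row
    return results


if __name__ == "__main__":
    for rate, row in compare().items():
        for name, (bias, mse) in row.items():
            print(f"rate={rate:.1f} {name:<10} bias={bias:+.3f} mse={mse:.3f}")
```

Because the true values are known in a simulation, the error of each imputed value can be measured directly. EM imputation and multiple imputation would need an iterative or repeated-draw procedure and are left out of this sketch.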