Missing Data Filling Method And Empirical Analysis

Posted on: 2011-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Deng
Full Text: PDF
GTID: 2190360305959606
Subject: Applied Mathematics
Abstract/Summary:
Missing data is a relatively difficult problem in trial studies. It not only reduces estimation accuracy but also affects the follow-up work of statisticians. Over the past 20 years, methods for handling missing data have remained a hot topic. Extensive research has been carried out abroad, whereas domestic institutions often rely on routine treatments such as deleting the tuples that contain missing values, or on other simple filling methods. As studies become deeper and more complex, these conventional methods can no longer meet decision-making needs.

To address this problem, this paper introduces the principles of several common treatments for missing data, which fall into three categories: deleting the tuples with missing values, imputation, and no treatment. The paper focuses on the theory of imputation methods (mean imputation, imputation at random, regression imputation, the EM algorithm, and multiple imputation) and discusses the relevant iterative formulas for the estimated parameters.

Data from diabetic patients (blood glucose, serum total cholesterol, triglyceride, insulin, and glycosylated hemoglobin) are used in the empirical analysis to compare the filling effect and applicable conditions of the different methods. First, the paper constructs a set of data sets with different missing levels from the original complete data set. Second, the methods introduced above are applied to these data sets, and their effectiveness and applicability are compared in terms of standard error, standard deviation, the deviation between the estimated value and the true one, and the difference from the sample distribution. In addition, to illustrate the broad applicability of multiple imputation, a data set with a particularly high missing rate (from the Mathematical Contest in Modeling) is used to examine these conclusions, which also provides a guideline for future decision-making.

The results indicate that the expectation maximization (EM) algorithm and the regression algorithm perform relatively better than the other methods across the different missing rates, while multiple imputation works better at medium and high missing rates and has more room for development. Although it does not reach the desired effect, its results are acceptable.
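The comparison procedure described above (build data sets with different missing levels from a complete one, fill them with several methods, and measure the deviation from the true values) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's own code: the data are synthetic rather than the diabetes measurements, values are removed completely at random, only mean, random, and regression imputation are shown (EM and multiple imputation are omitted), and all function names and the RMSE criterion are assumptions made for the example.

```python
import numpy as np

def make_missing(y, rate, rng):
    """Copy y and set a fraction `rate` of its entries to NaN, chosen at random."""
    y_miss = y.copy()
    n_missing = int(round(rate * y.size))
    idx = rng.choice(y.size, size=n_missing, replace=False)
    y_miss[idx] = np.nan
    return y_miss

def mean_impute(y_miss):
    """Fill every missing entry with the mean of the observed entries."""
    filled = y_miss.copy()
    filled[np.isnan(filled)] = np.nanmean(y_miss)
    return filled

def random_impute(y_miss, rng):
    """Fill each missing entry with a randomly drawn observed value."""
    filled = y_miss.copy()
    mask = np.isnan(filled)
    filled[mask] = rng.choice(y_miss[~mask], size=mask.sum(), replace=True)
    return filled

def regression_impute(x, y_miss):
    """Fit y = intercept + slope * x on observed rows, then predict the missing y."""
    filled = y_miss.copy()
    mask = np.isnan(filled)
    slope, intercept = np.polyfit(x[~mask], y_miss[~mask], deg=1)
    filled[mask] = intercept + slope * x[mask]
    return filled

def rmse(filled, truth, mask):
    """Root mean squared deviation between imputed and true values."""
    return float(np.sqrt(np.mean((filled[mask] - truth[mask]) ** 2)))

def compare(rates=(0.1, 0.3, 0.5), n=500, seed=0):
    """Compare the three methods at several missing rates on synthetic data."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                              # fully observed covariate
    y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=n)   # hypothetical complete variable
    results = {}
    for rate in rates:
        y_miss = make_missing(y, rate, rng)
        mask = np.isnan(y_miss)
        results[rate] = {
            "mean": rmse(mean_impute(y_miss), y, mask),
            "random": rmse(random_impute(y_miss, rng), y, mask),
            "regression": rmse(regression_impute(x, y_miss), y, mask),
        }
    return results

if __name__ == "__main__":
    for rate, scores in compare().items():
        print(rate, scores)
```

Because the synthetic variable is strongly related to the covariate, regression imputation should recover the removed values more closely than mean imputation here; on real data the ranking depends on how well the auxiliary variables predict the missing one.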
Keywords/Search Tags: Means filling, Imputation at random, EM algorithm, Regression model, Multiple imputation