
Simulated Comparison Of Different Filling Methods In Missing Values

Posted on: 2013-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: L L Hua
GTID: 2234330371475841
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: Missing values are a common problem in traditional Chinese medicine research on HIV/AIDS. They increase the complexity of the analysis and can bias the results, so they need to be handled before statistical analysis. This study compares the performance of different methods on simulated data with missing values, determines the most appropriate number of imputations for multiple imputation (MI), and explores the most accurate, effective and convenient methods under different missing mechanisms and missing patterns.

Methods: SAS 9.1 was used to simulate data and to impute missing values by different methods. For continuous variables with missing values, the expectation maximization (EM) method, the regression method, mean imputation, the deletion-in-groups method and multiple imputation (MI) were applied, and the results were compared in terms of accuracy, precision and mean. For binary variables, the deletion-in-groups method and the logistic regression method in MI were applied, and the results were compared by regression coefficient and standard error.

Results:
1. Continuous variable data: The continuous data had an arbitrary missing pattern. The more imputations were performed, the stronger the imputation effect. With 10 imputations, the effect reached 0.95 and the precision was best. When the missing rate was no more than 20%, accuracy was better with 3 or 5 imputations; when the missing rate was between 30% and 40%, 10 imputations were needed to obtain better accuracy. When the missing rate was above 50%, accuracy was poor.
2. Missing Completely at Random: When the missing rate was no more than 10%, the five methods performed similarly, but MI had better precision and accuracy. When the missing rate was above 20%, the deletion-in-groups method and MI were better than the others; MI had the best precision, while the deletion-in-groups method had the best accuracy.
3. Missing at Random: When the missing rate was between 10% and 20%, MI had the best accuracy and precision. When the missing rate was 30%, the deletion-in-groups method had the best accuracy. When the missing rate was above 40%, all methods performed poorly.
4. Binary variable data: When the missing rate was no more than 40%, the deletion-in-groups method gave regression coefficients and standard errors closer to those of the complete data. When the missing rate was between 40% and 50%, the logistic regression method in MI was better, and the most appropriate number of imputations was 2 for this study dataset. When the missing rate was above 60%, both methods performed poorly.

Conclusions: For large samples, continuous variables can be treated as normally distributed, and the allowable missing rate is below 30%. Some traditional methods, such as mean imputation and the deletion-in-groups method, have certain advantages in handling missing values because they are easier and more convenient. Compared with traditional methods, MI can handle most problems in data sets with missing values, and it is more convenient and powerful than the other methods.
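
The kind of comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's SAS 9.1 program: the data-generating model (a normal outcome y that depends linearly on a fully observed covariate x), the 30% missing rate, the sample size and every function name are assumptions made for illustration. The sketch treats "deletion" as an analysis of complete cases only, and its multiple imputation is a simplified stochastic regression version that does not redraw the regression parameters for each imputation, so it is not a full MI procedure. The pooled standard error uses Rubin's rules (within-imputation variance plus (1 + 1/m) times between-imputation variance).

```python
import random
import statistics

def simulate(n, seed=0):
    """Complete data: y depends linearly on a fully observed covariate x."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [2.0 + 1.5 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

def delete_mcar(ys, rate, rng):
    """MCAR: every y is dropped with the same probability."""
    return [None if rng.random() < rate else y for y in ys]

def delete_mar(xs, ys, rate, rng):
    """MAR: only cases with x > 0 can lose y, so missingness depends on observed x."""
    p = min(1.0, 2.0 * rate)  # about half of the cases have x > 0
    return [None if (x > 0 and rng.random() < p) else y for x, y in zip(xs, ys)]

def complete_case(ys_missing):
    """Deletion: keep only the observed values."""
    return [y for y in ys_missing if y is not None]

def mean_impute(ys_missing):
    """Fill every missing value with the observed mean."""
    m = statistics.fmean(complete_case(ys_missing))
    return [m if y is None else y for y in ys_missing]

def fit_line(xs, ys_missing):
    """Least-squares fit of y on x from complete cases: (intercept, slope, residual sd)."""
    pairs = [(x, y) for x, y in zip(xs, ys_missing) if y is not None]
    mx = statistics.fmean(x for x, _ in pairs)
    my = statistics.fmean(y for _, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in pairs)
    return a, b, (rss / (len(pairs) - 2)) ** 0.5

def regression_impute(xs, ys_missing, rng=None):
    """Fill with the fitted value; if rng is given, add residual noise (stochastic)."""
    a, b, sd = fit_line(xs, ys_missing)
    noise = (lambda: rng.gauss(0.0, sd)) if rng else (lambda: 0.0)
    return [a + b * x + noise() if y is None else y for x, y in zip(xs, ys_missing)]

def multiple_impute_mean(xs, ys_missing, m, rng):
    """Pool the mean of y over m stochastic imputations with Rubin's rules."""
    n = len(ys_missing)
    means, within = [], []
    for _ in range(m):
        filled = regression_impute(xs, ys_missing, rng)
        means.append(statistics.fmean(filled))
        within.append(statistics.variance(filled) / n)  # variance of the mean
    pooled = statistics.fmean(means)
    between = statistics.variance(means) if m > 1 else 0.0
    total_var = statistics.fmean(within) + (1 + 1 / m) * between
    return pooled, total_var ** 0.5

xs, ys = simulate(2000, seed=1)
rng = random.Random(2)
full_mean = statistics.fmean(ys)
scenarios = [("MCAR", delete_mcar(ys, 0.3, rng)), ("MAR", delete_mar(xs, ys, 0.3, rng))]
results = {}
for name, ys_missing in scenarios:
    results[name] = {
        "deletion": statistics.fmean(complete_case(ys_missing)),
        "mean": statistics.fmean(mean_impute(ys_missing)),
        "regression": statistics.fmean(regression_impute(xs, ys_missing)),
        "MI (m=5)": multiple_impute_mean(xs, ys_missing, 5, rng)[0],
    }
for name, estimates in results.items():
    for method, value in estimates.items():
        print(f"{name:4s} {method:10s} bias in mean = {value - full_mean:+.3f}")
```

In this toy setup, deletion and mean imputation give the same estimate of the mean (mean imputation only shrinks the variance), and both are biased under the MAR mechanism because the cases that lose y have systematically larger x. The regression-based fills use x and so recover the mean more closely. These are properties of the assumed toy model and need not match the rankings reported in the abstract.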
Keywords/Search Tags: Missing values, Simulation, Imputation methods, Missing Completely at Random, Missing at Random