The Comparison Of Nine Common Imputation Methods For Missing Values

Posted on: 2018-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: X C Liao
Full Text: PDF
GTID: 2310330533965251
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Missing values are a common and tricky problem in data. They make analysis more complex and difficult, can lead to unreasonable results, and can even reduce the efficiency of the whole statistical work. The simplest and most effective approach is to prevent missing values in advance, but for various reasons and conditions this problem is hard to solve perfectly. Handling missing values after the data have been collected is therefore the most important task and receives the most attention. Deletion and imputation are the common ways to handle missing values; because deletion causes some loss of information, imputation is chosen in this paper. Firstly, the theory of the common imputation methods is introduced: mean imputation, random imputation, regression imputation, multiple imputation, k-nearest-neighbor imputation, decision tree imputation, support vector machine imputation and neural network imputation. Then the data sets salary, iris and Airfoil are chosen, from smaller to larger, and from each complete data set two types of missing data are generated in R: one variable missing at random and multiple variables missing at random (here, "missing at random" means that 10% of the data are replaced by missing values at random). Next, the nine methods are used to impute these two types of missing data. To evaluate and compare the imputation methods, we compare them from two aspects: (1) In terms of imputation error, the true values are compared with the corresponding imputed values, the mean absolute error (MAE) and mean square error (MSE) are calculated, and the pros and cons of the imputation methods are evaluated from the values of MAE and MSE. (2) In terms of modeling, multiple linear models are built on the original data and on the data imputed by each method, the regression coefficient vectors are obtained from these models, and the coefficient of determination is calculated; the imputation methods are then compared and evaluated. Finally, we point out the similarities and differences among the imputation methods, summarize the findings of the paper, and outline the improvements needed in future work.
Keywords/Search Tags: two types of missing values, data missing at random, nine imputation methods, comparison of imputation error, comparison of modeling
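
The error-based evaluation described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: it uses Python rather than the R used in the thesis, a made-up toy variable instead of the salary, iris or Airfoil data, and mean imputation as a stand-in for any of the methods compared. The function names `make_missing`, `mean_impute` and `imputation_errors` are hypothetical.

```python
import random


def make_missing(values, frac=0.10, seed=0):
    """Replace a fraction of the entries with None, chosen uniformly at random."""
    rng = random.Random(seed)
    n_missing = round(frac * len(values))
    idx = sorted(rng.sample(range(len(values)), n_missing))
    masked = list(values)
    for i in idx:
        masked[i] = None
    return masked, idx


def mean_impute(masked):
    """Fill every missing entry with the mean of the observed entries."""
    observed = [v for v in masked if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in masked]


def imputation_errors(true_values, imputed, idx):
    """MAE and MSE between true and imputed values at the masked positions only."""
    diffs = [imputed[i] - true_values[i] for i in idx]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = sum(d * d for d in diffs) / len(diffs)
    return mae, mse


# Toy complete variable (an assumption for illustration, not thesis data).
complete = [float(x) for x in range(1, 21)]

masked, idx = make_missing(complete, frac=0.10, seed=0)
imputed = mean_impute(masked)
mae, mse = imputation_errors(complete, imputed, idx)
print(f"masked positions: {idx}, MAE={mae:.3f}, MSE={mse:.3f}")
```

Because the true values are known before masking, any other imputation method can be swapped in for `mean_impute` and scored with the same `imputation_errors` call, which is what makes the MAE and MSE comparison across methods possible.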