With the arrival of the big data era, obtaining information through data mining has become routine. However, most of the data we obtain are incomplete, whereas many statistical methods require complete data for analysis. Missing data occur in virtually every research field, and handling them can no longer be limited to ignoring or directly deleting incomplete records; data imputation is therefore attracting increasing attention. This paper discusses the imputation performance of various imputation methods on categorical missing variables and continuous missing variables. For categorical missing variables, we first draw the receiver operating characteristic (ROC) curve and then calculate the area under the curve (AUC). Generally, the higher the AUC, the better the performance of the classifier and the better the imputation effect. Moreover, we found that, within a certain range of missingness, the ranking of the categorical imputation methods changes little as the missing rate varies. For continuous missing variables, we compare the methods in two ways. First, we use indices that reflect the numerical error between the true values and the imputed values: mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE). Second, we compare model error by calculating the angle and the squared error between the model coefficients estimated from the imputed complete data and those estimated from the real data, so as to determine how strongly each imputation method influences the model. In addition, we found that the effect of each imputation method changes with the missing rate. When the missing rate is low, decision tree imputation performs well. However, as the missing rate increases, its imputation effect declines in relative terms, while that of multiple imputation and random forest imputation improves in relative terms.
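The comparison indices named above can be sketched in a few lines of Python. This is a minimal illustration and not the paper's own code: the function names are hypothetical, the AUC is computed from its pairwise-ranking interpretation instead of by integrating a plotted ROC curve, and the abstract does not define the "angle" and "square error" between coefficient vectors, so they are assumed here to be the arccosine of the cosine similarity and the sum of squared coefficient differences.

```python
import math


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mae(true_vals, imputed_vals):
    """Mean absolute error between true and imputed values."""
    return sum(abs(t - p) for t, p in zip(true_vals, imputed_vals)) / len(true_vals)


def mse(true_vals, imputed_vals):
    """Mean squared error between true and imputed values."""
    return sum((t - p) ** 2 for t, p in zip(true_vals, imputed_vals)) / len(true_vals)


def rmse(true_vals, imputed_vals):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(true_vals, imputed_vals))


def coef_angle(beta_real, beta_imputed):
    """Angle in radians between two coefficient vectors (assumed definition)."""
    dot = sum(a * b for a, b in zip(beta_real, beta_imputed))
    norm_real = math.sqrt(sum(a * a for a in beta_real))
    norm_imputed = math.sqrt(sum(b * b for b in beta_imputed))
    cosine = dot / (norm_real * norm_imputed)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cosine)))


def coef_squared_error(beta_real, beta_imputed):
    """Sum of squared differences between coefficient vectors (assumed definition)."""
    return sum((a - b) ** 2 for a, b in zip(beta_real, beta_imputed))
```

For example, with true values `[1.0, 2.0, 3.0, 4.0]` and imputed values `[1.5, 2.0, 2.0, 4.5]`, `mae` gives 0.5 and `mse` gives 0.375. A smaller angle and a smaller squared error indicate that the model fitted on the imputed data stays closer to the model fitted on the real data.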