
Comparison And Empirical Analysis Of Imputation Methods For Missing Data

Posted on: 2018-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: T B Shen
GTID: 2430330515953941
Subject: Mathematics
Abstract/Summary:
During data collection, lost records, human error, and uncontrollable factors often leave data incomplete. For example, oil field drilling risk assessment requires a large amount of field data, and missing data is a major problem in that assessment. Missing values not only complicate statistical analysis but can also distort the results of an investigation or study, biasing them or even leading to wrong conclusions. Many studies in China and abroad have addressed how to handle missing data, and it remains an active topic in statistical research. Most standard analyses assume complete data, so a data set with missing values cannot be analyzed directly; the missing values must first be filled in.

This thesis first introduces the mechanisms and patterns of missingness and summarizes the three common ways of handling missing data: deletion, imputation, and leaving the gaps untreated. It then describes the mathematical principles of five commonly used imputation methods: mean imputation, median imputation, regression imputation, EM imputation, and multiple imputation. Using three groups of simulated single-variable data sets, each with five different missing rates, the missing values are filled in with four of these methods, and the thesis compares how the number of imputations in multiple imputation and the choice of imputation method affect efficiency.

The empirical part applies imputation to drilling field data from an oil field. Values of a structural variable are deleted at random at rates from 5% to 40%, and the gaps are filled with multivariate regression imputation based on principal component analysis and with the other imputation methods. Imputation performance is compared by the mean error, the mean squared error, the regression coefficients, and the deviation angle of the regression coefficients. At low missing rates, mean imputation, multiple imputation, and multiple linear regression imputation give smaller mean errors and mean squared errors; at higher missing rates, multiple regression imputation performs better.

Building on the existing methods, two improved imputation methods are proposed, the RED imputation method and the DA-REG imputation method, and both are applied to the real data. Comparing how well the imputed values fit the true values across a range of missing rates shows that mean and median imputation have small errors, but their mean squared error grows as the missing rate increases, and filling every gap with a single value is a drawback. At low missing rates, the imputed values from the other methods agree closely with the true values, with regression imputation and DA-REG imputation performing best. As the missing rate increases, the fit deteriorates. Taken together, multiple imputation and DA-REG imputation perform better.
Keywords: missing data, multiple imputation, regression imputation, RED imputation, DA-REG imputation
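The abstract compares mean, median, and regression imputation by mean error, mean squared error, and how far the refitted regression coefficients deviate. The following is a minimal sketch of that kind of comparison on simulated data; it is not the thesis's code or data. Assumptions: a hypothetical linear model y = 3 + 1.5x + noise with x fully observed, y deleted completely at random (MCAR) at rates of 5% to 40%, "mean error" read as the mean absolute error over the imputed cells, and "deviation angle" read as the angle between the coefficient vector fitted on the true data and the one fitted on the imputed data. All function names are illustrative.

```python
import numpy as np

def simulate(n=500, missing_rate=0.2, seed=0):
    """Hypothetical data: y depends linearly on a fully observed covariate x;
    each y is marked missing completely at random with probability missing_rate."""
    rng = np.random.default_rng(seed)
    x = rng.normal(10.0, 2.0, n)
    y = 3.0 + 1.5 * x + rng.normal(0.0, 1.0, n)
    miss = rng.random(n) < missing_rate
    return x, y, miss

def ols(x, y):
    """Least-squares coefficients [intercept, slope] of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def impute_mean(y, miss):
    out = y.copy()
    out[miss] = y[~miss].mean()      # the same constant fills every gap
    return out

def impute_median(y, miss):
    out = y.copy()
    out[miss] = np.median(y[~miss])  # the same constant fills every gap
    return out

def impute_regression(x, y, miss):
    beta = ols(x[~miss], y[~miss])   # fit on complete cases only
    out = y.copy()
    out[miss] = beta[0] + beta[1] * x[miss]
    return out

def coef_angle_deg(b_ref, b_est):
    """Angle in degrees between two coefficient vectors (0 means same direction)."""
    cos = b_ref @ b_est / (np.linalg.norm(b_ref) * np.linalg.norm(b_est))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def evaluate(x, y_true, y_imp, miss):
    diff = y_imp[miss] - y_true[miss]
    return {
        "mean_abs_err": float(np.abs(diff).mean()),
        "mse": float((diff ** 2).mean()),
        "angle_deg": coef_angle_deg(ols(x, y_true), ols(x, y_imp)),
    }

def compare(missing_rate, seed=1):
    x, y, miss = simulate(missing_rate=missing_rate, seed=seed)
    # y still holds the true values; the imputers only read y[~miss],
    # so the "lost" values remain available for scoring.
    methods = {
        "mean": impute_mean(y, miss),
        "median": impute_median(y, miss),
        "regression": impute_regression(x, y, miss),
    }
    return {name: evaluate(x, y, y_imp, miss) for name, y_imp in methods.items()}

for rate in (0.05, 0.10, 0.20, 0.30, 0.40):
    for name, m in compare(rate).items():
        print(f"rate={rate:.2f} {name:<10} MAE={m['mean_abs_err']:.3f} "
              f"MSE={m['mse']:.3f} angle={m['angle_deg']:.3f}")
```

Because y depends strongly on x in this made-up model, regression imputation should show a much smaller MSE than mean or median imputation; mean imputation also pulls the refitted slope toward zero, which appears as a larger coefficient angle. Real data with weaker predictors may rank the methods differently.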
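EM imputation, one of the five methods described, alternates between computing the expected values of the missing entries under the current parameter estimates (E-step) and re-estimating the parameters from those expectations (M-step). A minimal sketch, assuming a bivariate normal model for (x, y) in which x is complete and missing y values are stored as `np.nan`; the abstract does not give the thesis's exact EM setup, and `em_bivariate_normal` is a hypothetical name.

```python
import numpy as np

def em_bivariate_normal(x, y, n_iter=500, tol=1e-10):
    """EM estimates of the mean and covariance of (x, y) under a bivariate
    normal model, where x is fully observed and missing y values are np.nan.
    Returns (mu, cov, y_imp) with each missing y replaced by E[y | x]."""
    miss = np.isnan(y)
    ey = np.where(miss, np.nanmean(y), y)   # start from mean imputation
    mu = np.array([x.mean(), ey.mean()])
    cov = np.cov(x, ey, bias=True)
    for _ in range(n_iter):
        # E-step: conditional mean and variance of each missing y given its x.
        slope = cov[0, 1] / cov[0, 0]
        cond_var = cov[1, 1] - slope * cov[0, 1]
        ey = np.where(miss, mu[1] + slope * (x - mu[0]), y)
        ey2 = ey ** 2 + np.where(miss, cond_var, 0.0)   # E[y^2 | x]
        # M-step: re-estimate parameters from the expected sufficient statistics.
        mu_new = np.array([x.mean(), ey.mean()])
        sxx = np.mean(x * x) - mu_new[0] ** 2
        sxy = np.mean(x * ey) - mu_new[0] * mu_new[1]
        syy = np.mean(ey2) - mu_new[1] ** 2
        cov_new = np.array([[sxx, sxy], [sxy, syy]])
        shift = max(np.abs(mu_new - mu).max(), np.abs(cov_new - cov).max())
        mu, cov = mu_new, cov_new
        if shift < tol:
            break
    # Impute with the conditional mean under the final parameters.
    slope = cov[0, 1] / cov[0, 0]
    y_imp = np.where(miss, mu[1] + slope * (x - mu[0]), y)
    return mu, cov, y_imp

# Usage on hypothetical simulated data: about 20% of y deleted at random.
rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.0, 300)
y = 3.0 + 1.5 * x + rng.normal(0.0, 1.0, 300)
y[rng.random(300) < 0.2] = np.nan
mu, cov, y_imp = em_bivariate_normal(x, y)
print("EM mean vector:", mu)
print("EM covariance:\n", cov)
```

With only y missing and x fully observed, the converged EM imputations coincide with regression imputation fitted on the complete cases; EM adds more when several variables have gaps in different rows.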
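Multiple imputation fills each gap m times with random draws, analyzes each completed data set, and pools the results, so the pooled standard error reflects the extra uncertainty caused by the missing values; the abstract also examines how the number of imputations m matters. A minimal sketch, assuming bootstrap-based stochastic regression imputation (one of several ways to generate multiple imputations; the thesis's own procedure is not specified) and Rubin's rules for pooling: the pooled estimate is the average of the m estimates, and the total variance is T = W + (1 + 1/m) B, where W is the average within-imputation variance and B the between-imputation variance. The pooled quantity (the mean of y) and all function names are illustrative.

```python
import numpy as np

def multiple_impute(x, y, m=20, seed=0):
    """Return m completed copies of y (missing entries are np.nan), each filled
    by stochastic regression imputation on a bootstrap refit of the model."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(y)
    obs_idx = np.flatnonzero(~miss)
    completed = []
    for _ in range(m):
        # Parameter uncertainty: refit y = a + b*x on a bootstrap resample.
        boot = rng.choice(obs_idx, size=obs_idx.size, replace=True)
        slope, intercept = np.polyfit(x[boot], y[boot], 1)
        resid = y[boot] - (intercept + slope * x[boot])
        sigma = np.sqrt(resid @ resid / (boot.size - 2))
        # Imputation noise: add a random residual to each prediction.
        y_c = y.copy()
        y_c[miss] = intercept + slope * x[miss] + rng.normal(0.0, sigma, miss.sum())
        completed.append(y_c)
    return completed

def rubin_pool(estimates, variances):
    """Combine m point estimates and their squared standard errors (Rubin's rules)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                  # pooled point estimate
    w = u.mean()                      # average within-imputation variance
    b = q.var(ddof=1)                 # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b       # total variance
    return q_bar, t

# Usage on hypothetical simulated data: pooled estimate of the mean of y.
rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.0, 300)
y = 3.0 + 1.5 * x + rng.normal(0.0, 1.0, 300)
y[rng.random(300) < 0.3] = np.nan
data = multiple_impute(x, y, m=20)
est = [d.mean() for d in data]
var = [d.var(ddof=1) / d.size for d in data]
q_bar, t = rubin_pool(est, var)
print(f"pooled mean = {q_bar:.3f}, standard error = {np.sqrt(t):.3f}")
```

Rerunning with different values of m (for example 5, 20, and 50) is one way to see how the number of imputations affects the pooled standard error.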