
Treatment Of Missing Values In Agricultural Economic Research Data: Model, Method And Application

Posted on: 2018-08-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C K Pan
GTID: 1319330515485825
Subject: Agricultural Economics and Management
Abstract/Summary:
China is a large agricultural country. Although agriculture's share of GDP is not high, agricultural employment accounts for 28.3% of total employment, and agriculture remains the foundation of national economic and social development. Research on the agricultural economy and agricultural management is therefore still necessary and important, and most of it relies on data collected through agricultural economic surveys. Missing values are a problem in most of these data sets: nonresponse by farmers and negligence by researchers both produce them. Compared with other kinds of research, such as market research or political polling, agricultural economic research has several distinctive characteristics. For example, agricultural economic surveys still rely on traditional face-to-face interviews, which yield more auxiliary information; sampling is less random, but farmers tend to be more conscientious respondents.

Based on these characteristics and on the causes of missing values, this paper constructs a systematic set of models and methods for handling missing values in agricultural economic research data. The models and methods fall into three parts: deletion and weighting adjustment, single imputation models, and multiple imputation models. The logic of the study is to establish a model, find a method, analyze its advantages and limitations, and then find another method that compensates for the limitations. The assumptions of the study are: the data are sampled from a normally distributed population; the data are randomly sampled; the missing-data pattern is either general or single-variable; the missing-data mechanism is missing completely at random (MCAR) or missing at random (MAR); and most variables are quantitative.

In most cases, agricultural economic researchers ignore missing values. Missing values are often deleted as invalid data, and most data analysis software also deletes them by default. With listwise deletion, every case that contains a missing value is deleted, leaving a complete data set. When the proportion of missing data is very low, deletion does little harm, but when it is not low, a large number of cases are deleted. A simulation in this paper shows that as the number of variables grows, even a low missing rate causes a large share of the data to be deleted. An alternative is pairwise deletion, which uses all available cases to estimate each parameter and so deletes as little data as possible. However, pairwise deletion bases different estimates on samples of different sizes, which causes estimation problems, and a simulation in this study shows that it has no advantage over listwise deletion when estimating correlations. When the data are not missing completely at random (MCAR), both listwise and pairwise deletion produce biased estimates. A weighting adjustment can correct the bias caused by listwise deletion, and the simulation analysis supports this view.

Imputation may be a better way of handling missing values than deletion, because it does not discard useful information. There are two kinds of imputation: single imputation and multiple imputation. The former imputes a single value for each missing value; the latter imputes more than one value for each missing value. The basic idea of imputation rests on the posterior distribution of the data. Some single imputation models are based on an explicit posterior distribution, such as mean imputation and regression imputation, while others are not, such as hot deck imputation.

There are three kinds of mean imputation: simple mean imputation, random mean imputation, and stratified mean imputation. Simple mean imputation replaces missing values with the mean of the observed data. The imputed values are all concentrated at the center of the data, which greatly underestimates the population variance. The values imputed by random mean imputation are more dispersed. However, when the data are not missing completely at random (MCAR), mean imputation produces biased estimators. Stratified mean imputation can correct this problem, but although its estimators are unbiased, the imputed values are still too concentrated.

Regression imputation is a more effective method of imputing missing values. Simple regression imputation imputes each missing value with the prediction from a regression model. Random regression imputation adds a random residual to the regression prediction. The simulation results show that regression imputation, especially random regression imputation, performs better than mean imputation, and that simple mean imputation is the least recommended method.

If the missing data have no obvious posterior distribution, hot deck imputation methods are the better choice. Hot deck methods generate imputed values from the observed part of the data. The simplest version draws imputed values by simple random sampling from the observed data. A better approach uses auxiliary information to stratify the observed data and draws random values within strata to impute the missing values.

After single imputation of agricultural economic survey data, a "complete" data set is obtained and traditional methods can be used for analysis. However, standard errors are always underestimated when parameters are estimated from singly imputed data. Multiple imputation is the most effective way to solve this problem: because it produces more than one imputed value for each missing value, it can use the variation among the imputed values to compensate for the underestimation of the standard error. The basic idea of multiple imputation is to generate m different imputed values for each missing value, resulting in m "complete" data sets, to estimate the parameters on each "complete" data set, and finally to pool the results.

There are two kinds of multiple imputation models: the univariate normal model and the multivariate normal model. The univariate normal model still uses regression to generate imputed values, but the regression parameters are drawn at random. One approach draws the parameters from their posterior distribution; this is the Bayesian method. The other uses bootstrap samples to generate the model parameters; this is the Bootstrap method. The simulation results show that both the Bayesian method and the Bootstrap method give correct estimates.

The multivariate normal model for imputing missing agricultural research data is based on the general missing pattern. Joint modeling and fully conditional specification are the most widely used multivariate imputation methods. Joint modeling assumes that the data can be described by a multivariate distribution and creates imputations as draws from the fitted distribution. Fully conditional specification imputes multivariate missing data on a variable-by-variable basis: it requires an imputation model for each incomplete variable and creates imputations for each variable iteratively. The simulation results show that both joint modeling and fully conditional specification give good estimation and testing results.

In processing actual agricultural data with missing values, regression imputation methods perform better when the data are consistent with the model assumptions. When the data do not conform to the assumptions, for example when they contain extreme values, imputation methods based on the hot deck give more robust imputations.

Based on this study, some suggestions are offered to agricultural economic researchers who have to deal with missing data. Before missing data arise: good questionnaire design leads to fewer missing values, and good communication with farmers leads to less missing data. When dealing with missing data: face the missing-data problem; do not delete missing values; use categorical variables to impute missing values; compute descriptive statistics before imputing; use regression imputation; and use multiple imputation when the missing pattern is general.

The possible innovations of this study are:
(1) It may be the first systematic, exploratory study of the treatment of missing values in China's agricultural economic research data.
(2) The simulation analysis is distinctive and original. The paper provides specific models and methods for dealing with missing values in agricultural economic research data, and through theoretical and simulation analysis it systematically analyzes and compares the conditions and advantages of these methods. Most of the simulation designs are the author's own.
(3) The study provides R code for the practical handling of missing values in agricultural economic research data.
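
Three points made in the abstract can be illustrated with a minimal Python sketch: how quickly listwise deletion discards cases as variables are added, how simple mean imputation shrinks the variance, and how multiple-imputation estimates are pooled. This is not the R code supplied with the dissertation. The function names are hypothetical, missingness is assumed to be independent across variables, and Rubin's rules are assumed for pooling because the abstract does not name the pooling rule.

```python
import math
import statistics


def complete_case_share(p_missing, n_vars):
    """Expected share of cases kept by listwise deletion.

    Assumes each of n_vars variables is missing independently with
    probability p_missing (an MCAR-style simplification).
    """
    return (1.0 - p_missing) ** n_vars


def mean_impute(values):
    """Simple mean imputation: replace None with the mean of observed values."""
    observed = [v for v in values if v is not None]
    center = statistics.fmean(observed)
    return [center if v is None else v for v in values]


def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors (Rubin's rules).

    Returns the pooled point estimate and the pooled standard error.
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)


if __name__ == "__main__":
    # Share of cases that survive listwise deletion at a 5% missing rate.
    for k in (5, 10, 20):
        print(k, round(complete_case_share(0.05, k), 3))

    # Mean imputation keeps the mean but compresses the spread.
    raw = [1.0, None, 3.0, None, 5.0]
    filled = mean_impute(raw)
    print(statistics.variance([v for v in raw if v is not None]),
          statistics.variance(filled))

    # Pooled standard error exceeds the within-imputation standard error alone.
    print(pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))
```

Under these assumptions, a 5% missing rate on each of 20 variables leaves only about 36% of cases complete, which is the effect the simulation in the abstract describes. The between-imputation term in `pool_rubin` is what single imputation lacks, and it is why single imputation understates standard errors.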
Keywords/Search Tags: Agricultural Economic Research Data, Missing Value, Missing Pattern, Missing Mechanism, Listwise Deletion Method, Single Imputation Model, Multiple Imputation Model