Comparing six missing data methods within the discriminant analysis context: A Monte Carlo study

Posted on:2001-10-14

Degree:Ph.D.

Type:Dissertation

University:The Ohio State University

Candidate:Viragoontavan, Sunanta

GTID:1460390014454995

Subject:Curriculum development

Abstract/Summary:

The purpose of this study was to compare the relative effectiveness of six missing data methods for discriminant analysis. The six treatments were listwise deletion, group mean substitution, regression-based imputation, hot-deck imputation, multiple imputation using SOLAS(TM) (commercial computer software for missing data), and multiple imputation using NORM, the software developed by Schafer.

The methods were compared across three simulated factors: correlation structure (low/moderate and high), sample size (100, 200, and 500), and proportion of missing data (.05, .10, and .20). Values were randomly deleted from a complete data matrix at each of the three missing-data proportions. The resulting incomplete data matrices were then treated by the six missing data methods, and the treated data, as well as the complete data, were subjected to linear discriminant analysis. The relative effectiveness of the six methods was assessed by the deviations, relative to the complete-data results, of the hit rate and of the discriminating power of the first discriminant function. The results revealed that the two multiple imputation procedures, one employing SOLAS(TM) and the other the approach developed by Schafer, were uniformly the most effective. In general, the third most effective method was the hot-deck procedure. The group mean and regression-based procedures performed reasonably well in estimating the discriminating power of the first discriminant function, but they were less effective than the multiple imputation and hot-deck methods in estimating the hit rate. Listwise deletion was found to be the least effective approach.

Finally, all methods provided more accurate estimates with slightly or moderately correlated data than with highly correlated data. The accuracy in estimating the hit rate and the discriminating power increased with sample size and decreased as the proportion of missing data grew.
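The simulation design described above can be illustrated with a minimal sketch of a single replication. This is not the study's code: the abstract does not give the number of groups, the number of predictors, the group separation, or how the hit rate was computed, so the two-group, two-predictor setup, the mean shift of 1, the equal-prior cutoff, the resubstitution hit rate, and every function name below are assumptions made for illustration. Only two of the six methods (listwise deletion and group mean substitution) are shown, and only the hit-rate deviation is computed.

```python
import random
import statistics


def simulate(n_per_group, rho, rng):
    """Complete data: two groups, two predictors with correlation rho (assumed design)."""
    rows = []
    for g in (0, 1):
        for _ in range(n_per_group):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            x1 = z1 + g  # group 1 is shifted by 1 on both predictors
            x2 = rho * z1 + (1 - rho ** 2) ** 0.5 * z2 + g
            rows.append((g, [x1, x2]))
    return rows


def delete_at_random(rows, p, rng):
    """Replace each value by None independently with probability p."""
    return [(g, [None if rng.random() < p else v for v in x]) for g, x in rows]


def listwise(rows):
    """Listwise deletion: drop every case that has at least one missing value."""
    return [(g, x) for g, x in rows if None not in x]


def group_mean_fill(rows):
    """Group mean substitution: fill a missing value with its own group's mean."""
    means = {}
    for g in (0, 1):
        means[g] = [
            statistics.fmean([x[j] for gg, x in rows if gg == g and x[j] is not None])
            for j in range(2)
        ]
    return [(g, [means[g][j] if v is None else v for j, v in enumerate(x)])
            for g, x in rows]


def hit_rate(rows):
    """Fit a two-group linear discriminant and return its resubstitution hit rate."""
    groups = {g: [x for gg, x in rows if gg == g] for g in (0, 1)}
    m = {g: [statistics.fmean([x[j] for x in xs]) for j in range(2)]
         for g, xs in groups.items()}
    # Pooled within-group covariance matrix (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for g, xs in groups.items():
        for x in xs:
            d = [x[0] - m[g][0], x[1] - m[g][1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    dof = len(rows) - 2
    s = [[v / dof for v in row] for row in s]
    # Discriminant weights w = S^{-1} (m1 - m0), using the explicit 2 x 2 inverse.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = [m[1][0] - m[0][0], m[1][1] - m[0][1]]
    w = [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det]
    # Equal-prior cutoff: midpoint of the two projected group means (assumption).
    cut = sum(w[j] * (m[0][j] + m[1][j]) / 2 for j in range(2))
    hits = sum((w[0] * x[0] + w[1] * x[1] > cut) == (g == 1) for g, x in rows)
    return hits / len(rows)


rng = random.Random(0)
complete = simulate(100, 0.3, rng)              # total n = 200, low/moderate correlation
incomplete = delete_at_random(complete, 0.10, rng)  # .10 proportion missing
base = hit_rate(complete)
for name, treat in (("listwise deletion", listwise),
                    ("group mean substitution", group_mean_fill)):
    deviation = hit_rate(treat(incomplete)) - base
    print(f"{name}: hit-rate deviation = {deviation:+.3f}")
```

A full Monte Carlo comparison would repeat this over many replications for each combination of correlation structure, sample size, and missing-data proportion, and would summarize the deviations per method; a single replication like the one above says little on its own.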

Keywords/Search Tags:

Missing data, Discriminant analysis, Multiple imputation, Effective

Related items

1	Comparison And Empirical Analysis Of Imputation Methods For Missing Data
2	The impact of missing data treatments in a multiple regression analysis: A Monte Carlo comparison of deterministic imputation, stochastic imputation, multiple imputation, and the deletion procedure
3	The Application Of Multiple Imputation In Compositional Data Analysis
4	Maximum likelihood estimation and multiple imputation: A Monte Carlo comparison of modern missing data techniques for multilevel data
5	Incomplete Data Filled
6	Imputation Methods Of Missing Values For Compositional Data
7	Impute Missing Values For Mixed Data
8	Missing Data Filling Method And Empirical Analysis
9	Multiple Imputation Of Competing Risks Data With Missing Cause Of Failure In Survival Quantile Regressions
10	Comparison Of Imputation Methods Based On Value Prediction