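Comparison of imputation methods with respect to the separation power of the logistic regression model (original title: Comparacion de los metodos de imputacion con respecto al poder de separacion del modelo de regresion logistica; Spanish text)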

Posted on: 2007-08-23
Degree: M.S.
Type: Thesis
University: University of Puerto Rico, Mayaguez (Puerto Rico)
Candidate: Lopez Vazquez, Victor
Full Text: PDF
GTID: 2440390005960254
Subject: Statistics
Abstract/Summary:
An MCAR (Missing Completely at Random) mechanism was applied iteratively, at several missing-data proportions, to generate missing values in data sets obtained from the Machine Learning Database Repository at the University of California, Irvine, in order to compare the efficiency of single, hot-deck, and multiple imputation techniques in a logistic regression model. The quantity of interest in these comparisons is the separation power of the logistic regression model, measured by the area under the Receiver Operating Characteristic (ROC) curve. The single imputation methods implemented were the unconditional and conditional mean, median, and mode (IMEAN, ICMEAN, IMED, ICMED, IMOD, ICMOD). For hot-deck imputation, we used unconditional and conditional random sampling of the observed values (IRS, ICRS) and k-nearest-neighbor imputation (KNN). The multiple imputation method was the FRITZ (Federal Reserve Imputation Technique Zeta) algorithm implemented by [Kennickell, 1991] on the SCF (Survey of Consumer Finances). In each iteration, missing data were generated at a given proportion, the missing values were filled in by one of the imputation methods, and the separation power was recomputed. The average bias between the real separation power and the separation power across all iterations was calculated for every imputation method and for several missing-data proportions. These estimated biases were tested using non-parametric comparison procedures. From these tests we found that the ICRS technique produces the smallest bias in the area under the ROC curve. We also found that, under an MCAR mechanism, some imputation methods perform well at missing-data proportions higher than 15%.
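The simulation loop described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis code: the data are synthetic rather than taken from the UCI repository, "conditional" is assumed to mean conditional on the class label, the separation power is assumed to be the in-sample area under the ROC curve, logistic regression is fitted by plain gradient descent as a stand-in for a standard fitting routine, and every function name (make_mcar, impute_mean, impute_crs, average_bias, and so on) is hypothetical. Only two of the methods, in the style of IMEAN and ICRS, are shown.

```python
import math
import random


def make_mcar(rows, col, proportion, rng):
    """Copy rows and blank out `proportion` of column `col` completely at random."""
    out = [list(r) for r in rows]
    for i in rng.sample(range(len(out)), round(proportion * len(out))):
        out[i][col] = None
    return out


def impute_mean(rows, labels, col, rng):
    """IMEAN-style fill: unconditional mean of the observed values."""
    observed = [r[col] for r in rows if r[col] is not None]
    mean = sum(observed) / len(observed)
    out = [list(r) for r in rows]
    for r in out:
        if r[col] is None:
            r[col] = mean
    return out


def impute_crs(rows, labels, col, rng):
    """ICRS-style fill (assumed): random draw from observed values of the same class."""
    pools = {}
    for r, y in zip(rows, labels):
        if r[col] is not None:
            pools.setdefault(y, []).append(r[col])
    out = [list(r) for r in rows]
    for r, y in zip(out, labels):
        if r[col] is None:
            r[col] = rng.choice(pools[y])
    return out


def linear_score(w, x):
    """Linear predictor; the last weight is the intercept."""
    return w[-1] + sum(wj * xj for wj, xj in zip(w, x))


def fit_logistic(rows, labels, steps=200, lr=0.1):
    """Logistic regression by batch gradient descent (a simple stand-in fitter)."""
    p, n = len(rows[0]), len(rows)
    w = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for x, y in zip(rows, labels):
            err = 1.0 / (1.0 + math.exp(-linear_score(w, x))) - y
            for j in range(p):
                grad[j] += err * x[j]
            grad[p] += err
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def auc(scores, labels):
    """Area under the ROC curve: chance that a positive case outranks a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def separation_power(rows, labels):
    """In-sample AUC of the fitted model (an assumption about the evaluation)."""
    w = fit_logistic(rows, labels)
    return auc([linear_score(w, x) for x in rows], labels)


def average_bias(rows, labels, impute, proportion, iterations=10, seed=0):
    """Mean difference between the AUC after imputation and the complete-data AUC."""
    rng = random.Random(seed)
    true_auc = separation_power(rows, labels)
    diffs = []
    for _ in range(iterations):
        holed = make_mcar(rows, 0, proportion, rng)
        filled = impute(holed, labels, 0, rng)
        diffs.append(separation_power(filled, labels) - true_auc)
    return sum(diffs) / len(diffs)


# Synthetic two-class data standing in for a UCI data set.
data_rng = random.Random(1)
labels = [i % 2 for i in range(100)]
rows = [[data_rng.gauss(y, 1.0), data_rng.gauss(0.5 * y, 1.0)] for y in labels]
for name, method in [("IMEAN", impute_mean), ("ICRS", impute_crs)]:
    print(name, round(average_bias(rows, labels, method, proportion=0.2), 4))
```

Running the script prints one average bias per method at a 20% missing-data proportion. Repeating the call over several proportions and all imputation methods would give the table of biases that the thesis then compares with non-parametric procedures.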
Keywords/Search Tags: Missing data, Imputation, Proportions, Separation power