
Comparing Poisson, Hurdle, and ZIP model fit under varying degrees of skew and zero-inflation

Posted on: 2008-09-06
Degree: Ph.D.
Type: Dissertation
University: University of Florida
Candidate: Miller, Jeffrey Monroe
Full Text: PDF
GTID: 1440390005968022
Subject: Education
Abstract/Summary:
Many datasets are characterized as count data with a preponderance of zeros. Such data are often analyzed by ignoring the zero-inflation and assuming a Poisson distribution. The Hurdle model is more sophisticated in that it considers the zeros to be completely separate from the nonzeros. The zero-inflated Poisson (ZIP) model is similar to the Hurdle model; however, it permits some of the zeros to be analyzed along with the nonzeros. Both models, as well as the Poisson, have negative binomial formulations for use when the Poisson assumption of an equal mean and variance is violated.

The choice between the models should be guided by the researcher's beliefs about the source of the zeros. Beyond this substantive concern, the choice should be based on the model providing the closest fit between the observed and predicted values. Unfortunately, the literature presents anomalous findings in terms of model superiority.

Datasets with zero-inflation may vary in terms of the proportion of zeros. They may also vary in terms of the distribution for the nonzeros. Our study used a Monte Carlo design to sample 1,000 cases from positively skewed, normal, and negatively skewed distributions with proportions of zeros of .10, .25, .50, .75, and .90. The data were analyzed with each model over 2,000 simulations. The deviance statistic and Akaike's Information Criterion (AIC) value were used to compare the fit between models.

The results suggest that the literature is not entirely anomalous; however, the accuracy of the findings depends on the proportion of zeros and the distribution for the nonzeros. Although the Hurdle model tends to be the superior model, there are situations when others, including the negative binomial Poisson model, are superior. The findings suggest that the researcher should consider the proportion of zeros and the distribution for the nonzeros when selecting a model to accommodate zero-inflated data.
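The kind of comparison described above can be illustrated with a minimal sketch. It is not the dissertation's simulation code, and it rests on several assumptions. The models are intercept-only (no covariates). The nonzero counts are drawn from a zero-truncated Poisson rather than the skewed and normal distributions used in the study. All function names (`simulate`, `fit_poisson`, `fit_hurdle`, `aic`) are hypothetical. ZIP is left out because, in the intercept-only case with excess zeros, it is a reparameterization of the Hurdle model and reaches the same maximized likelihood.

```python
import math
import random


def poisson_logpmf(y, lam):
    """Log of the Poisson probability mass at count y with mean lam."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def sample_poisson(rng, lam):
    """Draw one Poisson variate (Knuth's multiplication method; fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(n, p_zero, lam, seed=0):
    """Assumed data generator: zero with probability p_zero,
    otherwise a zero-truncated Poisson(lam) count."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        if rng.random() < p_zero:
            data.append(0)
        else:
            y = 0
            while y == 0:  # reject zeros to truncate the Poisson
                y = sample_poisson(rng, lam)
            data.append(y)
    return data


def truncated_lambda(mean_pos):
    """Solve lam / (1 - exp(-lam)) = mean_pos by bisection.
    Requires mean_pos > 1 (the truncated mean always exceeds 1)."""
    lo, hi = 1e-9, mean_pos
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid / (1 - math.exp(-mid)) < mean_pos:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def fit_poisson(data):
    """Intercept-only Poisson: the MLE of the mean is the sample mean."""
    lam = sum(data) / len(data)
    loglik = sum(poisson_logpmf(y, lam) for y in data)
    return loglik, 1  # (log-likelihood, number of parameters)


def fit_hurdle(data):
    """Intercept-only Hurdle: a binomial part for zero vs. nonzero,
    plus a zero-truncated Poisson for the nonzero counts.
    Assumes the data contain both zeros and nonzeros."""
    n = len(data)
    pos = [y for y in data if y > 0]
    n_zero = n - len(pos)
    p0 = n_zero / n
    lam = truncated_lambda(sum(pos) / len(pos))
    loglik = n_zero * math.log(p0) + len(pos) * math.log(1 - p0)
    loglik += sum(
        poisson_logpmf(y, lam) - math.log(1 - math.exp(-lam)) for y in pos
    )
    return loglik, 2


def aic(loglik, n_params):
    """Akaike's Information Criterion; smaller values indicate better fit."""
    return 2 * n_params - 2 * loglik


# Demo: 1,000 cases with half the observations set to zero.
data = simulate(n=1000, p_zero=0.50, lam=3.0, seed=1)
for name, fit in (("Poisson", fit_poisson), ("Hurdle", fit_hurdle)):
    loglik, k = fit(data)
    print(f"{name:8s} logLik={loglik:10.2f}  AIC={aic(loglik, k):10.2f}")
```

Rerunning the demo with the other proportions of zeros (.10, .25, .75, .90) and different values of `lam` gives a rough feel for how the AIC gap between the two models changes. A faithful replication would also need covariates, the negative binomial formulations, and the nonzero distributions used in the study.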
Keywords/Search Tags:Model, Zeros, Poisson, Data, Hurdle