Measuring the complexity of generalized linear hierarchical models and Bayesian areal wombling for boundary analysis

Posted on: 2006-12-23
Degree: Ph.D
Type: Dissertation
University: University of Minnesota
Candidate: Lu, Haolan
Full Text: PDF
GTID: 1458390008960684
Subject: Biology
Abstract/Summary:
Model comparison is critical in Bayesian data analysis for choosing a model that fits well and has an appropriate level of complexity. However, it is unclear how to measure the complexity of hierarchical models, because the appropriate contribution of the random effects is uncertain. The first part of this dissertation develops a measure of complexity for generalized linear hierarchical models based on linear model theory. We demonstrate the new measure for Poisson and binomial observables modeled through a variety of hierarchical structures, including a simple random effects model, a longitudinal model, and an areal data model having both spatial clustering and pure heterogeneity random effects. The new measure is compared to a Bayesian measure of model complexity, the effective number of parameters pD (Spiegelhalter et al., 2002), in the binomial and Poisson cases via simulation as well as three real data examples. The two measures are usually close, but differ markedly in some instances where pD is arguably inappropriate. Finally, we show how the new measure can be used to approach the difficult task of specifying prior distributions for variance components, in the process casting further doubt on the commonly used vague inverse gamma prior.

In the analysis of spatially referenced data, interest often focuses not on prediction of the spatially indexed variable itself, but on boundary analysis, i.e., the determination of boundaries or zones that reveal sharp changes in the values of spatially oriented variables. Existing boundary analysis methods are sometimes generically referred to as wombling, after a foundational paper by Womble (1951). When data are available at point level (e.g., exact latitude and longitude of disease cases), such boundaries are most naturally obtained by locating the points of steepest ascent or descent on the fitted spatial surface (Banerjee, Gelfand, and Sirmans, 2004).
In the second part of this dissertation we propose related methods for areal data (i.e., data which consist only of sums or averages over geopolitical regions). These methods use a Bayesian hierarchical model-based framework for areal wombling with a known geographical neighborhood structure, and we show the approach's superiority over existing non-stochastic alternatives (including that implemented in the commercial software BoundarySeer). Such methods are valuable in determining boundaries for data sets that, perhaps due to confidentiality concerns, are available only in ecological (aggregated) format, or are only collected this way. In the final part of this dissertation, we extend the Bayesian hierarchical modeling methods by constructing a neighborhood structure that is determined by the value of the process in each region and by variables determining how similar two regions are. We illustrate three different remedies for the computational difficulty of implementing this method. Comparisons among existing algorithmic techniques and all proposed methods are made using both simulated data and late-detection datasets for breast cancer and colorectal cancer collected at the county level in the state of Minnesota.
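For readers unfamiliar with the comparison measure pD mentioned above, the following is a minimal sketch of its standard definition (Spiegelhalter et al., 2002): the posterior mean deviance minus the deviance at the posterior mean. It is not the dissertation's new measure and not its models. The conjugate Poisson-gamma setup, the counts, the prior values `a` and `b`, and all function names are assumptions chosen only to keep the example self-contained.

```python
import math
import random


def poisson_deviance(y, theta):
    """D(theta) = -2 * log p(y | theta) for independent Poisson counts."""
    loglik = 0.0
    for yi, ti in zip(y, theta):
        # Skip the log term when yi == 0, since it contributes nothing.
        log_term = yi * math.log(ti) if yi > 0 else 0.0
        loglik += log_term - ti - math.lgamma(yi + 1)
    return -2.0 * loglik


def effective_parameters(y, draws):
    """pD = mean deviance over posterior draws - deviance at the posterior mean."""
    mean_dev = sum(poisson_deviance(y, d) for d in draws) / len(draws)
    post_mean = [sum(d[i] for d in draws) / len(draws) for i in range(len(y))]
    return mean_dev - poisson_deviance(y, post_mean)


def sample_posterior(y, a, b, n_draws, seed=0):
    """Assumed toy model: y_i ~ Poisson(theta_i), theta_i ~ Gamma(a, rate=b).

    The posterior is then Gamma(a + y_i, rate=b + 1), sampled directly.
    """
    rng = random.Random(seed)
    scale = 1.0 / (b + 1.0)  # gammavariate takes a scale, not a rate
    return [[rng.gammavariate(a + yi, scale) for yi in y] for _ in range(n_draws)]


y = [3, 7, 0, 12, 5]  # hypothetical counts, not data from the dissertation
draws = sample_posterior(y, a=0.5, b=0.1, n_draws=4000)
p_d = effective_parameters(y, draws)
print(f"pD = {p_d:.2f} for {len(y)} observations")
```

In this sketch pD is computed in the mean parameterization; the value of pD generally depends on the parameterization used for the plug-in estimate.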
Keywords/Search Tags:Model, Complexity, Bayesian, Data, Areal, Linear, Boundary, Wombling