
Model checking for incomplete high-dimensional categorical data (Incomplete data)

Posted on: 2000-07-04
Degree: Ph.D
Type: Dissertation
University: University of California, Los Angeles
Candidate: Hu, Ming-Yi
Full Text: PDF
GTID: 1468390014460658
Subject: Statistics
Abstract/Summary:
Categorical data are often arranged in a contingency table and summarized by a loglinear model. A standard approach for comparing two competing models is to calculate twice the difference between their maximized loglikelihoods (the likelihood-ratio statistic), which asymptotically follows a χ² distribution. But when data are sparse, the χ² approximation may be questionable.

As an alternative to a large-sample approximation to the reference distribution, we implement the framework introduced by Rubin (1984) for finding the posterior predictive check (PPC) distribution. The PPC distribution is the conditional distribution of a future value of a test statistic given the observed data and the model specification, and it can serve as the reference distribution for the relevant likelihood-ratio statistics.

However, it can be computationally demanding to construct a PPC distribution based on a large number of replicates. This is especially the case when the original data are incomplete, since generating each PPC replicate requires an involved statistical computing approach (we use a data-augmentation strategy). In practice, we propose to approximate the PPC distribution by a gamma distribution whose parameters are estimated from a combination of training data and a modest-sized sample of PPC replicates. Some simulated examples suggest that this procedure, which can reduce the computation needed to approximate the PPC distribution by a factor of 20, has satisfactory statistical properties.
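The idea of replacing a large set of PPC replicates with a fitted gamma distribution can be illustrated with a minimal Python sketch. It is not the dissertation's procedure: it assumes complete data (so no data augmentation is needed), a two-way table tested against the independence model, Dirichlet priors on the margins with a hypothetical prior weight of 0.5, and plain moment matching in place of the training-data-based parameter estimates described above. All function names and the example table are illustrative.

```python
import math
import random
import statistics


def lr_statistic(table):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * log(O / E)) for independence."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    g2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:  # empty cells contribute nothing to the sum
                expected = row_tot[i] * col_tot[j] / n
                g2 += 2.0 * obs * math.log(obs / expected)
    return g2


def draw_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalizing gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def ppc_replicate(table, rng, prior=0.5):
    """One replicate: draw margins from their posterior, simulate a table, return G^2."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # Assumed model: independence, with Dirichlet(prior) priors on each margin.
    p_row = draw_dirichlet([r + prior for r in row_tot], rng)
    p_col = draw_dirichlet([c + prior for c in col_tot], rng)
    cells = [(i, j) for i in range(len(p_row)) for j in range(len(p_col))]
    weights = [p_row[i] * p_col[j] for i, j in cells]
    rep = [[0] * len(p_col) for _ in p_row]
    for i, j in rng.choices(cells, weights=weights, k=n):
        rep[i][j] += 1
    return lr_statistic(rep)


def fit_gamma_by_moments(values):
    """Match mean and variance: mean = shape * scale, variance = shape * scale**2."""
    mean = statistics.mean(values)
    var = statistics.variance(values)
    return mean * mean / var, var / mean  # (shape, scale)


rng = random.Random(0)
observed = [[3, 1, 0], [1, 4, 2], [0, 2, 5]]  # hypothetical sparse table
reps = [ppc_replicate(observed, rng) for _ in range(200)]  # modest-sized sample
shape, scale = fit_gamma_by_moments(reps)
t_obs = lr_statistic(observed)
# Empirical tail proportion from the replicates, for comparison with the gamma fit.
p_empirical = sum(r >= t_obs for r in reps) / len(reps)
```

The fitted `shape` and `scale` define a smooth reference distribution whose upper tail at `t_obs` can be evaluated with any gamma survival function, instead of relying on the empirical proportion from a much larger set of replicates.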
Keywords/Search Tags:Data, PPC distribution, Model, Incomplete