Sampling contingency tables given sets of marginals and/or conditionals in the context of statistical disclosure limitation

Posted on: 2010-07-05
Degree: Ph.D
Type: Thesis
University: The Pennsylvania State University
Candidate: Lee, Juyoun
Full Text: PDF
GTID: 2448390002977765
Subject: Statistics
Abstract/Summary:
Federal agencies and other organizations often publish data summarized in arrays of non-negative integers, called contingency tables. When such data are released, it is necessary to prevent sensitive information pertaining to individuals from being disclosed. In statistical disclosure limitation, we must maintain a balance between disclosure risk and the data utility needed to make valid statistical inferences. One method for achieving this balance is to release partial information about the original data. In practice, many agencies release data summarized in the form of marginal sums or conditional probabilities. Sampling methods for multi-way contingency tables given a set of observed marginal sums have been studied in diverse ways, yet there is almost no literature on sampling tables given a set of observed conditional probabilities. In this thesis, we focus on a set of conditional probabilities instead of marginal sums. We propose MCMC simulation schemes coupled with tools from algebraic statistics to sample tables from the sets of possible tables given observed conditional values. We also propose a simple extension to the case where a combination of observed marginal totals and conditional values is given. These algorithms can be used to compute posterior distributions and to assess data utility and disclosure risk in the context of statistical disclosure limitation. We demonstrate the proposed algorithms with simple examples and discuss their advantages and disadvantages. In addition, the proposed sampling algorithms can be used for releasing synthetic contingency tables. We study both the disclosure risk and the data utility associated with the proposed synthetic tabular data releases.
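The abstract does not spell out the thesis's algorithms for conditional values, so the sketch below illustrates only the better-studied marginal case it mentions: a random walk over two-way tables that share the row and column sums of an observed table. It uses the standard basic moves (+1 and -1 on the corners of a 2x2 subtable), which are known to connect all two-way tables with fixed margins. The function name `sample_tables` and the example counts are assumptions made for illustration, not taken from the thesis.

```python
import random


def sample_tables(table, n_steps, seed=0):
    """Random walk over two-way tables with the same row/column sums as `table`.

    Illustrative sketch only (marginal case); requires at least 2 rows and 2 columns.
    """
    rng = random.Random(seed)
    t = [row[:] for row in table]          # work on a copy of the observed table
    n_rows, n_cols = len(t), len(t[0])
    samples = []
    for _ in range(n_steps):
        # Pick two distinct rows and two distinct columns (ordered pairs).
        i1, i2 = rng.sample(range(n_rows), 2)
        j1, j2 = rng.sample(range(n_cols), 2)
        # Basic move: +1 at (i1, j1), (i2, j2) and -1 at (i1, j2), (i2, j1).
        # Accept only if no cell would become negative; otherwise stay put.
        if t[i1][j2] > 0 and t[i2][j1] > 0:
            t[i1][j1] += 1
            t[i2][j2] += 1
            t[i1][j2] -= 1
            t[i2][j1] -= 1
        samples.append([row[:] for row in t])
    return samples


# Hypothetical 2x3 table of counts, used only to exercise the sketch.
observed = [[3, 1, 2],
            [0, 4, 1]]
samples = sample_tables(observed, n_steps=1000)
```

Because the proposal is symmetric and rejected moves leave the table unchanged, this walk targets the uniform distribution over tables with the observed margins. Targeting another distribution, or conditioning on observed conditional values as the thesis does, would need a different move set and an acceptance step.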
Keywords/Search Tags:Contingency tables, Disclosure, Data, Conditional, Marginal, Sampling