Dirichlet Process Mixture Models for Nested Categorical Data

Posted on:2016-06-02

Degree:Ph.D

Type:Thesis

University:Duke University

Candidate:Hu, Jingchen

GTID:2478390017470375

Subject:Statistics

Abstract/Summary:

This thesis develops Bayesian latent class models for nested categorical data, e.g., people nested in households. The applications focus on generating synthetic microdata for public release and imputing missing data for household surveys, such as the 2010 U.S. Decennial Census.

The first contribution is a set of methods for evaluating disclosure risks in fully synthetic categorical data. I quantify disclosure risks by computing Bayesian posterior probabilities that intruders can learn confidential values given the released data and assumptions about their prior knowledge. I demonstrate the methodology on a subset of data from the American Community Survey (ACS). The methods can be adapted to synthesizers for nested data, as demonstrated in later chapters of the thesis.

The second contribution is a novel two-level latent class model for nested categorical data. Here, I assume that all configurations of groups and units are theoretically possible. I use a nested Dirichlet process prior distribution for the class membership probabilities. The nested structure facilitates simultaneous modeling of variables at both the group and unit levels. I illustrate the modeling by generating synthetic data and imputing missing data for a subset of data from the 2012 ACS household data. I show that the model can capture within-group relationships more effectively than standard one-level latent class models.

The third contribution is a version of the nested latent class model adapted for theoretically impossible combinations, e.g., a household with two household heads or a child older than her biological father. This version assigns zero probability to those impossible groups and units. I present a proof that the Markov chain Monte Carlo (MCMC) sampling strategy estimates the desired target distribution. I illustrate this model by generating synthetic data and imputing missing data for a subset of data from the 2011 ACS household data. The results indicate that this version can estimate the joint distribution more effectively than the previous version.
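
The two modeling ideas in the abstract, a two-level latent class structure and zero probability for impossible households, can be illustrated with a small Python sketch. It is not the thesis's model or its MCMC sampler. It assumes a truncated stick-breaking approximation to the Dirichlet process weights, a single hypothetical unit-level variable (ROLES), randomly drawn class-specific probabilities, and one hypothetical validity rule (exactly one household head). Impossible households are excluded by plain rejection sampling, which yields draws from the model restricted to the possible configurations.

```python
import random

ROLES = ["head", "spouse", "child"]  # hypothetical unit-level categories


def stick_breaking(alpha, n_atoms, rng):
    """Truncated stick-breaking weights approximating a Dirichlet process."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last atom takes the leftover mass
    return weights


def random_probs(k, rng):
    """Flat Dirichlet draw obtained by normalizing gamma variates."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def build_model(n_group, n_unit, alpha, beta, rng):
    """Group-class weights, plus unit-class weights nested in each group class."""
    return {
        "group_w": stick_breaking(alpha, n_group, rng),
        "unit_w": [stick_breaking(beta, n_unit, rng) for _ in range(n_group)],
        "role_p": [[random_probs(len(ROLES), rng) for _ in range(n_unit)]
                   for _ in range(n_group)],
    }


def sample_household(model, size, rng):
    """Draw a group class, then a unit class and a role for each member."""
    g = rng.choices(range(len(model["group_w"])), weights=model["group_w"])[0]
    household = []
    for _ in range(size):
        m = rng.choices(range(len(model["unit_w"][g])),
                        weights=model["unit_w"][g])[0]
        household.append(rng.choices(ROLES, weights=model["role_p"][g][m])[0])
    return household


def is_possible(household):
    """Hypothetical structural-zero rule: exactly one household head."""
    return household.count("head") == 1


def sample_possible_household(model, size, rng, max_tries=100000):
    """Rejection step: discard impossible households and redraw."""
    for _ in range(max_tries):
        household = sample_household(model, size, rng)
        if is_possible(household):
            return household
    raise RuntimeError("no possible household found; check the model")


if __name__ == "__main__":
    rng = random.Random(2016)
    model = build_model(n_group=3, n_unit=4, alpha=1.0, beta=1.0, rng=rng)
    for _ in range(5):
        print(sample_possible_household(model, size=3, rng=rng))
```

Because all members of a household share one group-level class, their roles are dependent within the household, which is the kind of within-group relationship a one-level latent class model does not represent directly.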

Keywords/Search Tags:

Data, Nested, Model, Household, Latent class, ACS, Version

Related items

1	An Incremental Learning Method For Hierarchical Latent Class Models
2	Tourism Attraction Assessment Method Based On Latent Class Logit Model
3	Extensions of latent class trajectory models
4	A class of functional dependencies for the nested relation database model
5	Automatic Text Multi-Classification Model Based On Class Latent Semantic
6	A new form of nested association pattern for data mining and class discrimination
7	Latent Class Analysis and Random Forest Ensemble to Identify At-Risk Students in Higher Education
8	Latent class analysis of serial murderer
9	Multimodal Construction Of Mashup Applications Based On Spreadsheet And Data Flow
10	A Study On Wen Xuan Carved And Printed In Mingzhou