Using discrete multivariate MCMC Bayesian methods for change detection and disclosure control

Posted on: 2006-12-07
Degree: Ph.D
Type: Dissertation
University: Rutgers The State University of New Jersey - Newark
Candidate: Mendez Mediavilla, Francis A
Full Text: PDF
GTID: 1458390005492801
Subject: Business Administration
Abstract/Summary:
In this study we present a Bayesian statistical technique to detect change in the probability distributions underlying materialized views in statistical information systems. The main idea is to improve the performance of aggregate materialized views by reducing the length of the update window. By means of a Monte Carlo simulation we demonstrate: (1) that we can reproduce a detailed aggregate view from a higher-level aggregate, achieving data reduction; (2) that we can provide the end-user with posterior distributions for the distance measures; and (3) that we can detect structural changes in the base data, using distance measures.

The setup we present is meant to improve the performance of aggregate materialized views by providing approximate query answers. To support decision making and business intelligence, we demonstrate: (1) that we can provide the end-user with point estimates; and (2) that we can provide the end-user with posterior distributions and posterior confidence intervals for each elementary cell in the hypercube. We also demonstrate that we can improve the performance of dynamic view management systems such as DynaMat, proposed by Kotidis and Roussopoulos (2001).

We present a methodology to create discrete multivariate datasets that can be used for discrete multivariate research. This methodology enables researchers to study data mining algorithms by simulating datasets with known structures and varying degrees of sparseness. In particular, the data simulations can be used to test association rule mining algorithms and compare the findings with the known characteristics of the dataset. This is the methodology that we have used to generate discrete multivariate datasets throughout this study.

Finally, we present a method to profile microdata file records that are at high risk of re-identification. This method is an extension of the Bayesian analysis of contingency tables. Using this method we obtain the posterior distribution for cells with low counts. The measure of the risk of re-identification is based on the posterior distribution of the elementary cells in the hypercube that cross-classifies the attributes in the microdata file.
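
To make the idea of a posterior distribution for low-count cells concrete, here is a minimal sketch. It is not the dissertation's method: the abstract does not state the model, so the sketch assumes a conjugate Dirichlet prior over the cell probabilities of a flattened contingency table, draws from the posterior with Monte Carlo sampling, and reports a posterior mean and interval per cell. The function names, the symmetric prior weight of 1.0, and the example counts are all hypothetical.

```python
import random


def dirichlet_posterior_draws(counts, prior=1.0, draws=2000, seed=0):
    """Draw cell-probability vectors from a Dirichlet(counts + prior) posterior.

    Assumption: multinomial sampling with a symmetric Dirichlet prior.
    A Dirichlet draw is obtained by normalizing independent Gamma draws.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        gammas = [rng.gammavariate(c + prior, 1.0) for c in counts]
        total = sum(gammas)
        samples.append([g / total for g in gammas])
    return samples


def cell_summary(samples, cell, level=0.95):
    """Return (posterior mean, lower bound, upper bound) for one cell."""
    values = sorted(s[cell] for s in samples)
    n = len(values)
    lower = values[int((1 - level) / 2 * n)]
    upper = values[min(n - 1, int((1 + level) / 2 * n))]
    mean = sum(values) / n
    return mean, lower, upper


# Hypothetical flattened contingency table; cells 2 and 3 have low counts.
counts = [40, 25, 1, 0, 30, 4]
samples = dirichlet_posterior_draws(counts)

for i, c in enumerate(counts):
    mean, lower, upper = cell_summary(samples, i)
    print(f"cell {i}: count={c:3d}  mean={mean:.4f}  "
          f"95% interval=({lower:.4f}, {upper:.4f})")
```

Under these assumptions, cells with counts of 0 or 1 still receive a full posterior distribution rather than a bare zero or near-zero estimate, which is the kind of quantity a re-identification risk measure could be built on.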
Keywords/Search Tags:Discrete multivariate, Bayesian, Materialized views, Improve the performance, Provide the end-user, Posterior, Using, Method