SECURITY OF STATISTICAL DATABASES: INVASION OF PRIVACY THROUGH ATTRIBUTE CORRELATIONAL MODELING (COMPROMISE, DISCLOSURE)

Posted on: 1986-10-22
Degree: Ph.D
Type: Dissertation
University: New York University, Graduate School of Business Administration
Candidate: PALLEY, MICHAEL A
Full Text: PDF
GTID: 1478390017960260
Subject: Computer Science
Abstract/Summary:
This study develops, defines, and applies a statistical technique for the compromise of confidential information in a statistical database. Attribute Correlational Modeling (ACM) recognizes that the information contained in a statistical database represents real world statistical phenomena. As such, ACM assumes correlational behavior among the database attributes. ACM proceeds to compromise confidential information through creation of a regression model, where the confidential attribute is treated as the dependent variable.

The ACM compromise technique is applied to subsamples of the 1980 United States Census Database, C-Sample, for the State of New York. ACM was found to be an effective means of compromising confidential salary information, given the intruder's supplemental knowledge of five nonconfidential variable values. This held true, even when the statistical database precluded direct application of regression analysis, where synthetic database creation was employed.

Four classes of existing statistical database confidentiality protection methods (refusing queries with small counts, random data perturbation, random sample queries, and "data-swapping") are shown to have little, if any, effect on the use of ACM as a tool for compromise. The problem of incomplete intruder supplemental knowledge of nonconfidential variable values was found to be surmountable. Various complicating and facilitating factors affecting ACM compromise, as well as guidelines for the assessment of a database's relative degree of exposure to ACM compromise, are described.

The typical statistical database may preclude the direct application of regression. In this scenario, the research introduces the notion of a "synthetic database", created through legitimate queries of the actual database, and through proportional random variation of responses to these queries. The synthetic database is constructed to resemble the actual database as closely as possible in a statistical sense. ACM then applies regression analysis to the synthetic database, and utilizes the derived model to estimate confidential information in the actual database.
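
The synthetic-database route described in the abstract can be illustrated with a small sketch. The abstract does not specify the query types, the variation scheme, or the regression specification, so everything below is an assumption made for illustration: the toy data, the hypothetical `query_count_and_mean` interface (count and mean salary per cell), the 5% proportional spread, and the two nonconfidential predictors (the study itself used five). It is not the dissertation's procedure, only one plausible reading of it.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Toy "actual" database (invented data): (age, years_of_education, salary).
actual = [
    (age, edu, 5000 + 800 * age + 2000 * edu + rng.gauss(0, 1500))
    for age in (25, 35, 45, 55)
    for edu in (10, 12, 16, 18)
    for _ in range(5)
]

def query_count_and_mean(age, edu):
    """Hypothetical legitimate query: COUNT and MEAN(salary) for one cell."""
    cell = [s for a, e, s in actual if a == age and e == edu]
    return len(cell), sum(cell) / len(cell)

def build_synthetic(cells, spread=0.05):
    """Assumed scheme: one synthetic record per counted individual, with the
    cell mean varied proportionally by a factor in [1 - spread, 1 + spread]."""
    records = []
    for age, edu in cells:
        count, mean = query_count_and_mean(age, edu)
        for _ in range(count):
            records.append((age, edu, mean * (1 + rng.uniform(-spread, spread))))
    return records

def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coefs, row):
    """Apply the fitted model: intercept plus weighted predictor values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# The intruder only ever issues aggregate queries, never reads a salary.
cells = [(a, e) for a in (25, 35, 45, 55) for e in (10, 12, 16, 18)]
synthetic = build_synthetic(cells)

# Confidential attribute (salary) is the dependent variable.
coefs = fit_ols([(a, e) for a, e, _ in synthetic], [s for _, _, s in synthetic])

# Supplemental knowledge about a target: age 45, 16 years of education.
estimate = predict(coefs, (45, 16))
print("fitted coefficients:", [round(c, 1) for c in coefs])
print("estimated salary for target:", round(estimate))
```

In this toy setting the regression is fitted only to records the intruder generated from aggregate answers, yet the fitted model still tracks the relationship that produced the real salaries. That is the sense in which the abstract says a database that precludes direct regression remains exposed.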
Keywords/Search Tags:Database, Compromise, Confidential information, ACM, Attribute, Correlational, Regression