
Data privacy in knowledge discovery

Posted on: 2011-05-27
Degree: Ph.D
Type: Thesis
University: Rutgers The State University of New Jersey - New Brunswick
Candidate: Jagannathan, Geetha
Full Text: PDF
GTID: 2468390011472743
Subject: Computer Science
Abstract/Summary:
This thesis addresses data privacy in various stages of extracting knowledge embedded in databases. Advances in computer networking and database technologies have enabled the collection and storage of vast quantities of data. Legal and ethical considerations might require measures to protect an individual's privacy in any use or release of the data.

In this thesis, we address the problem of preserving privacy in the following two cases: (1) in distributed knowledge discovery; (2) in situations where the output of a data mining algorithm could itself breach privacy. We present results in two different models, namely secure multiparty computation (SMC) and differential privacy. The first part of the thesis presents privacy-preserving protocols in the SMC model. Secure multiparty computation involves the collaborative computation of functions based on inputs from multiple parties. The privacy goal is to ensure that all parties receive only the final output, without any party learning anything beyond what can be inferred from that output. Within this framework we address the problem of preserving privacy in the preprocessing and the data mining stages of knowledge discovery in databases. For the preprocessing stage, we present private protocols for the imputation of missing data in a dataset that is shared between two parties. For the data mining stage, we introduce the notion of arbitrarily partitioned data, which generalizes both horizontally and vertically partitioned data. We present a privacy-preserving protocol for k-means clustering of arbitrarily partitioned data. We also develop a new, simple k-clustering algorithm that was designed to be converted into a communication-efficient protocol for private clustering.

The second part of the thesis deals with privacy in situations where the output of a data mining algorithm could itself breach privacy. In this setting, we present private inference control protocols in the SMC model for On-line Analytical Processing (OLAP) systems. In the differential privacy model, the goal is to provide access to a statistical database while preserving the privacy of every individual in the database, irrespective of any auxiliary information that may be available to the database client. Under this privacy model, we present a practical privacy-preserving decision tree classifier using random decision trees.
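
As a rough illustration of the differential privacy model described above, the following sketch answers a counting query on a small database by adding Laplace noise to the true count. It shows the standard Laplace mechanism only and is not the decision tree construction developed in the thesis. The function names `laplace_noise` and `private_count`, the privacy parameter `epsilon`, and the sample records are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one individual changes a count by at most 1,
    so the query has sensitivity 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical toy database: (age, has_condition) pairs.
    records = [(34, True), (51, False), (29, True), (62, True), (45, False)]
    rng = random.Random()
    noisy = private_count(records, lambda r: r[1], epsilon=0.5, rng=rng)
    print(f"noisy count of records with the condition: {noisy:.2f}")
```

A smaller `epsilon` gives a larger noise scale, which means stronger privacy for each individual at the cost of a less accurate answer.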
Keywords/Search Tags: Privacy, Data, Preserving, Present, Model, Thesis