Privacy and utility analysis of the randomization approach in Privacy-Preserving Data Publishing

Posted on:2009-03-20

Degree:Ph.D

Type:Thesis

University:Syracuse University

Candidate:Huang, Zhengli

Full Text:PDF

GTID:2448390002999462

Subject:Computer Science

Abstract/Summary:

Randomization has emerged as an important approach to data disguising in Privacy-Preserving Data Publishing (PPDP). Depending on the type of data it is applied to, the randomization approach falls into two classes: Random Perturbation (RP) for continuous data and Randomized Response (RR) for categorical data. In PPDP, utility is an important metric and refers to the preservation of data mining information, while privacy, an even more important metric, refers to the protection of the original information. Privacy can be affected by several factors, such as attribute correlations and randomization parameters. However, regarding attribute correlations, no one has studied whether they affect privacy and, if so, how they affect the privacy-preserving property of randomization. Regarding randomization parameters, no one has investigated how to systematically compare different parameter choices, or which parameters are optimal in the sense that the disguised data are most privacy-preserving while still useful for data mining computations.

This thesis addresses these problems. First, we identify the correlations among attributes as a key factor affecting privacy. We propose two data reconstruction methods based on correlations among continuous attributes. We analyze the relationship between data correlations and the amount of private information that can be disclosed by the proposed reconstruction schemes. Our studies show that when the correlations are high, the original data can be reconstructed more accurately, i.e., more private information can be disclosed. To improve privacy, we propose a modified randomization scheme that takes the attribute correlations into account. Our experimental results show that, when the improved randomization method is used, the reconstruction accuracy of both reconstruction methods becomes worse, that is, less private information is disclosed.
Second, for RR, we formulate the quantification of privacy and utility as estimation problems. Using these quantifications to compare different RR schemes, we employ an evolutionary multi-objective optimization method to find optimal randomization parameters for RR. The experimental results show that our scheme performs much better than the existing RR schemes. Third, for RP, we first formulate an RP technique that is more general than the existing one. After generalizing the RP technique, we discretize the data range and use a matrix to hold the randomization parameters. We also formulate the quantification of privacy and utility for the generalized RP technique as estimation problems. Because measuring utility is expensive, we propose an efficient approach to approximate it. Based on the privacy and approximate utility metrics, we use an evolutionary multi-objective optimization method to find optimal randomization parameters for RP. We show that our scheme for choosing the parameters outperforms the existing scheme.
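To make the role of a randomization parameter concrete, the following is a minimal sketch of the classic binary (Warner-style) randomized response, not the scheme proposed in the thesis. It assumes a single binary attribute and one hypothetical parameter `p`, the probability that a value is reported truthfully; the function names are illustrative only. The disguised data hide individual values, yet the original proportion of 1s can still be estimated from them.

```python
import random


def randomize(values, p, rng):
    """Disguise binary values: keep each value with probability p, flip it otherwise."""
    return [v if rng.random() < p else 1 - v for v in values]


def estimate_proportion(disguised, p):
    """Estimate the original proportion of 1s from the disguised data.

    If pi is the true proportion, the expected disguised proportion is
    lam = p * pi + (1 - p) * (1 - pi), so pi = (lam - (1 - p)) / (2 * p - 1).
    """
    if p == 0.5:
        # With p = 0.5 the disguised data carry no information about pi.
        raise ValueError("p = 0.5 makes the original proportion unrecoverable")
    lam = sum(disguised) / len(disguised)
    return (lam - (1 - p)) / (2 * p - 1)


# Illustrative data (an assumption): roughly 30% of records hold the value 1.
rng = random.Random(0)
original = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]
disguised = randomize(original, p=0.8, rng=rng)

true_pi = sum(original) / len(original)
est_pi = estimate_proportion(disguised, p=0.8)
print(f"true proportion: {true_pi:.4f}, estimated from disguised data: {est_pi:.4f}")
```

Moving `p` toward 0.5 makes each disguised value less informative about the original one (more privacy) but increases the variance of the estimate (less utility). This is the kind of trade-off that a search over randomization parameters has to balance.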

Keywords/Search Tags:

Randomization, Data, Privacy, Approach, RP technique, Utility, Attribute correlations, Scheme

Related items

1	Research On Location Privacy Issues Using Information Theory
2	Research On Privacy Preserving Techniques Based On K-Anonymity For Data Publishing In The Social Network
3	Learning from perturbed data for privacy-preserving data mining
4	Balancing Behavioral Privacy and Information Utility in Sensory Data Flows
5	Research On Privacy Protection Of Information Sharing For Utility Minin
6	Enhancing Utility in Privacy Preserving Data Publishing
7	Privacy and spectral analysis of social network randomization
8	A Utility-Aware Privacy Preserving Framework For Distributed Data Mining With Worst Case Privacy Guarantee
9	Research On Attribute Base Encryption Scheme Based On Ciphertext Policy
10	Research On Differential Privacy Preservation For Data Analysis