Differential Privacy For Sparse Data Publishing

Posted on: 2015-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: T T Chen
Full Text: PDF
GTID: 2308330461974640
Subject: Computer system architecture
Abstract/Summary:
Information can be spread and used quickly over networks. This brings convenience, but it also makes personal information more likely to leak, so privacy protection has become an important research area. Many organizations collect and analyze data with potential value, but such data often contain sensitive information about users, and publishing it would threaten their privacy. The central difficulty of privacy protection research is how to publish data that does not leak users' sensitive information yet still provides enough information for mining.

Most existing privacy protection models are based on anonymization mechanisms that require specific assumptions about attacks and background knowledge, which limits them. To address this, Dwork proposed the differential privacy model, which can resist attacks that use arbitrary background knowledge and therefore provides stronger privacy protection.

The main goal of this thesis is to design effective algorithms for publishing sparse data under the differential privacy model, so as to improve the utility of the published sparse data. The contributions of the thesis are as follows:

(1) Statistical queries on sparse contingency table data published under differential privacy can return negative answers, which reduces data utility. To address this, we propose a publishing algorithm for sparse contingency table data under differential privacy based on a filter and add-back method. The algorithm filters out illogical values and adds the category totals of the filtered data back to the retained data. If the minimum value of the retained data does not reach the threshold, filtering and adding back are repeated. Statistical query results show that the algorithm effectively improves the utility of the published contingency table data.

(2) To address the low precision of region queries on two-dimensional sparse data published under differential privacy, we design a two-dimensional sparse data publishing algorithm under differential privacy based on consistency constraints. The algorithm combines sampling with a quadtree. First, a sample set is obtained with a filtering-sampling algorithm, which perturbs the real data. Then the entire data region is taken as the root node and recursively split into four subregions until a region contains no sample points. Finally, an unbiased estimation algorithm adjusts the inconsistencies between tree nodes. Simulation results show that the method effectively improves the precision of region queries on two-dimensional sparse data published under differential privacy.

(3) Also for the utility problem of two-dimensional sparse data, we propose a publishing algorithm under differential privacy based on kd-tree partitioning. In the first layer, the region is divided into a uniform grid and then repartitioned with a kd-tree. Each cell of the first layer is then divided again by an adaptive partition method, also using a kd-tree, to form the second layer. The privacy parameter is allocated across the two layers of kd-tree partitioned data, and a linear unbiased estimation algorithm adjusts the inconsistencies between the two layers. Finally, both layers are published to answer queries. Comparative experiments show that the proposed method effectively improves the utility of the published data.
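
The filter and add-back idea in contribution (1) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the function names, the per-cell sensitivity of 1, the default threshold of 0, and the choice to spread the filtered total evenly over the retained cells (rather than by category) are all assumptions made for illustration.

```python
import random


def laplace_noise(rng, scale):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, epsilon, rng, sensitivity=1.0):
    # Standard Laplace mechanism: noise scale = sensitivity / epsilon.
    # A sensitivity of 1 per cell is an assumption of this sketch.
    scale = sensitivity / epsilon
    return [c + laplace_noise(rng, scale) for c in counts]


def filter_and_add_back(noisy, threshold=0.0):
    # Hypothetical simplification: cells below `threshold` are filtered out
    # and their (usually negative) total is spread evenly over the cells
    # that remain. The step repeats until every retained cell reaches the
    # threshold or no cell is left.
    kept = dict(enumerate(noisy))
    while kept and min(kept.values()) < threshold:
        dropped_total = sum(v for v in kept.values() if v < threshold)
        kept = {i: v for i, v in kept.items() if v >= threshold}
        if kept:
            share = dropped_total / len(kept)
            kept = {i: v + share for i, v in kept.items()}
    # Filtered cells are published as zero.
    return [kept.get(i, 0.0) for i in range(len(noisy))]


rng = random.Random(0)
true_counts = [0, 0, 12, 0, 0, 7, 0, 1]  # a made-up sparse table row
released = filter_and_add_back(noisy_counts(true_counts, epsilon=1.0, rng=rng))
print(released)

print(filter_and_add_back([5.0, -1.0, 0.5, -2.0, 10.0]))
# [3.75, 0.0, 0.0, 0.0, 8.75]
```

Because the filtering step reads only the noisy values, it is post-processing and does not weaken the differential privacy guarantee, provided the threshold is chosen independently of the data. In this sketch the total of the released cells equals the total of the noisy cells whenever at least one cell is retained.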
Keywords/Search Tags:Differential Privacy, Data Publishing, Sparse Data, Statistical Query, Precision of Query