Font Size: a A A

Research On Algorithms Of Differential Privacy Statistics Data Publishing

Posted on:2014-05-07Degree:MasterType:Thesis
Country:ChinaCandidate:X L ZhangFull Text:PDF
GTID:2308330461472570Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the rapid development of science and technology, privacy preserving has been a more and more important topic. Numerous organizations maintain large collections of data with sensitive personal information. It contains very high research value by publishing these data, but it may threat the personal privacy at the same time. How to ensure the releasing data is available without leaking personal privacy information of the data has been a hot research topic in recent years.Many existing privacy preserving mechanisms ignore the adversary with background knowledge, or are limited to protect special type of attack. In order to strengthen the personal privacy preserving while releasing the data, Dwork proposes a robuster privacy preserving mechanism-differential privacy. Data publishing with differential privacy mechanism can protect individual’s privacy in the database, even with the presence of any adversary’s background knowledge.In this paper, we focus on researching some statistics data publishing problems based on differential privacy. We aim at designing an efficient data publishing algorithm with considering both data privacy security and data utility.The contributions of the paper are as follows:(1) For the sparse data, we design an efficient differential privacy data publishing algorithm based on bucket partition. The algorithm figures out a good bucket partition quickly with a greedy method. And we use red-black tree to optimize the search operation and union operation of closest buckets in the algorithm. Comparing to the existing algorithm based on bucket partition, the results of the experiments show the algorithm can improve the data publishing efficiency with reducing little data quality.(2) Handling big range counting query with existing differential privacy method, exists the problem of low accuracy response, so we suggest a model by adding noise on the "k-ary average tree". The model transforms the original data to a "k-ary average tree", and then adds noises in each node of this tree. Finally it exports the releasing data from the "k-ary average tree" with a formula. The paper analyses the best k to minimize the upper bound of the response noise variance. What’s more, we prove the noise variance with taking the best k is smaller than existing algorithms. We compare the results by two kinds of experiments with fixed range queries and random range queries. The experiments demonstrate the algorithm can improve the accuracy effectively.(3) The one-dimensional range query is solved by adding noise on the "k-ary average tree", and we expand the method to solve the problem of poor accuracy in multi-dimensional combination query. The main idea of the algorithm is as follows: First we scan every dimensional in order and divide the multi-dimensional data to several one-dimensional datas to build "k-ary average tree", respectively. Than we add noise to every element. Finally we scan every dimensional in reverse order and divide the multi-dimensional data to several one-dimensional datas to export the releasing data from "k-ary average tree" with a formula. Two experiments with synthesized data and real data demonstrate that the proposed algorithm can improve the accuracy of multi-dimensional combination query effectively.
Keywords/Search Tags:Differential Privacy, Statistical Data Publishing, Sparse Data, Range Counting Query, Multi-dimensional Combination Query
PDF Full Text Request
Related items