Font Size: a A A

Key Issues For Data Release In Differential Privacy

Posted on:2015-11-13Degree:MasterType:Thesis
Country:ChinaCandidate:H ChenFull Text:PDF
GTID:2308330461974672Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the development of technology and society, privacy protection has become a topic with increasing importance. Many organizations have a lot of data, people often can mine information from them, so it is significant to release these data. However, the data may contain users’privacy, personal privacy will be threatened. How to weigh the data utility and the level of protection of personal privacy has become a very popular field of database research.In recent years, people made a lot of privacy preserving data publishing model. However, the existing privacy protection model are mostly based on anonymous and needs special attack assumptions and some background knowledge, so they have certain limitations. Thus, Dwork et al proposed a more robust privacy protection model-differential privacy. Publish data using differential privacy mechanism can protect the privacy of individuals in the database even if the attacker has a database of background knowledge.(1) The existing solution to boost accuracy of differential privacy histograms through consistency by best linear unbiased estimator has strict constrain on the size of histogram and the structure of range-tree. This paper proposes one iterative algorithm when the range-tree is with arbitrary structure. The algorithm run bottom-up local optimal linear unbiased estimation in every iteration, which will guarantee consistency.we strictly prove the convergence and correctness of the algorithm, and estimate its bound of iterations. Experimental results demonstrate the effectiveness of the algorithm. In the real data sets, the performance of our algorithm is outstanding when compared with similar algorithms.(2) In the background of range count queries is uniformly distributed and the noises are with same parameter, we do some statistical analysis of the probability on the range-tree and obtain the expected variance of range queries. One greedy strategy was designed to smartly construct range-tree such that the expected variance will reduce. Theoretical analysis shows that the expected variance in our tree is less than the one in the default k-range tree. In the experiment on real data set, our algorithm have higher accuracy than other algorithms.(3) In the background of range count queries is uniformly distributed and the noises are with different parameter, we do some statistical analysis of the probability on the range-tree and obtain the expected variance of range queries. One differential parameter allocation solution was designed to get the global optimum of expected variance. In the experiment on real data set, our algorithm have higher accuracy than other algorithms. Moreover when we smartly construct the range-tree with the greedy strategy before running the differential privacy parameter allocation algorithm, the accuracy is extremely higher than other ones.
Keywords/Search Tags:Differential Privacy, Histogram Publishing, Range Counting Query, Probability
PDF Full Text Request
Related items