Font Size: a A A

Design And Implementation Of Accurate Histogram Publishing Algorithm Based On Differential Privacy Zhang Haoming

Posted on:2020-04-12Degree:MasterType:Thesis
Country:ChinaCandidate:H M ZhangFull Text:PDF
GTID:2428330596973185Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
A histogram,an accurate graphical representation of the distribution of numerical data,which is widely used for data publication,data mining and analysis.However,this seems to have a privacy vulnerability if we use the original data for histogram publication directly.Differential Privacy(DP),as a mathematical definition,is an ideal solution for releasing the summary data of statistic queries in the database and provides strong guarantees of privacy protection against adversaries with arbitrary background knowledge.This paper studies the exact histogram publishing algorithms that satisfy Differential Privacy.The main goal of this paper is to improve the accuracy of published histogram by ensuring that the releasing algorithm meets the differential private model.In this paper,we first analyze variants of V-optimal histogram algorithm,NoiseFirst and StructureFirst.Then we further study all these by recovering the algorithms and point out the existing problems and shortcomings of these.In allusion to these problems,we present the following optimization method and solution.Secondly,we first propse an optimizing structure algorithm.The algorithm first divides the privacy budget into two parts by the approximate optimal solution.Then it performs an iteractive dynamic programming to obtain the optimal grouping structure,so that the new histogram has been generated has the smallest grouping error.Next,it protects the structure privacy by using the optimal Exponential Mechanism and computes the average of each group.Finally,it adds Laplace noise to the average of every group and publishes the optimal histogram.This algorithm is compared with an existing method(i.e.,Boost)on the real dataset.The experimental results show that the optimal StructureFirst algorithm outperforms its competitors and achieves the more accurate results.Thirdly,we design a new novel mechanism which publishes more accurate histograms while satisfying differential privacy.To begin with,we take the Randomized quicksort algorithm to order the histogram injected Laplace noise so that lower errors.The randomized method can efficiently resist malicious attack from an adversary who knows the sorting algorithm and makes use of specific data.This is the reason why we use it.Moreover,on the grouping problem,we suggest a minimum error cost solution based on dynamic programming idea for further improve the accuracy of the histogram published.Finally,we provide an exhaustive experimental evaluation using two real datasets,ranging from thousands to millions of records.The datasets come from various domains(e.g.,the IPUMS census of Brazil,the Hillary Clinton and Donald Trump Tweets)on the Internet.Our outcomes give the data comparison and confirm the superiority of our mechanism over other similar methods.Meanwhile,it confirm that our technique produces a smaller error and considerably improve the accuracy of range queries over histograms.
Keywords/Search Tags:Histogram, Differential privacy, Dynamic programming
PDF Full Text Request
Related items