Font Size: a A A

Research On Differential Privacy Histogram Publishing Technology Based On Bucket Reconstruction

Posted on:2020-05-13Degree:MasterType:Thesis
Country:ChinaCandidate:W T XuFull Text:PDF
GTID:2428330623463756Subject:Electronics and Communications Engineering
Abstract/Summary:PDF Full Text Request
Under the background of the rapid development of the Internet and the popularization and application of big data,some statistical organizations began to collect and publish large amounts of data and information.How to provide effective protection for personal privacy in the process of publishing data and make the data more usable should be a key problem to be solved in the aspect of privacy protection.Differential privacy,as a relatively perfect privacy protection model,has nothing to do with the background knowledge of the attacker,and has been proved by rigorous mathematics and provides quantitatively controllable privacy protection capability.As a result,the emergence of differential privacy has been widely recognized by the industry and gradually become a research hotspot.However,the existing differential privacy histogram publishing technology has not been able to deal with data sets with outliers efficiently and the accuracy of published data is not high.The innovation of this paper is based on bucket reconstruction.An efficient R-G-I method(Gradient Regression-Greedy Algorithms-Isotonic Regression)for differential privacy histogram publishing with outlier data sets is proposed.The main body of this method is three successive algorithms,namely Gradient Regression and Greedy Algorithms merging adjacent buckets.Greedy Algorithms,Isotonic Regression.The main work accomplished in this paper is as follows:(1)The reasons for bucket reconstruction in histogram publishing technology and the influence of outliers on bucket reconstruction and data publishing accuracy are analyzed.To solve the problem of outliers,a gradient regression algorithm is proposed.The basic idea of the algorithm is to compare the frequencies between adjacent barrels of injected noise,and to pre-process the corresponding frequencies of the original histogram barrel,so as to effectively reduce the alternating distribution degree,so as to minimize the impact of outliers on histogram reconstruction.(2)A greedy algorithm based on adjacent barrel merging is proposed.Firstly,the algorithm uses greedy idea to reconstruct the barrels of histogram,and chooses two barrels with the smallest distance between adjacent barrels to merge each time with greedy strategy,and uses red-black tree to optimize the merging process.Finally,Laplacian noise is added to each bucket of the reconstructed histogram to satisfy the differential privacy requirement.(3)The principle of maximizing resource utilization by Isotonic Regression is analyzed.Finally,the order-preserving regression algorithm in histogram publishing is used to correct the sequence disturbed by noise according to the order constraints of the original sequence,so as to improve the accuracy of histogram sequence and the query accuracy of histogram.(4)Contrast experiments were conducted using different real data sets.First,the original histogram with outliers is pretreated by gradient regression algorithm,then the histogram without pretreatment and which has been pretreated is reconstructed by StructureFirst method,and then the experimental results are compared and analyzed,so that the validity of gradient regression algorithm can be verified.Finally,the R-G-I publishing method is compared with the existing differential privacy histogram publishing method to verify the efficiency and accuracy of the R-G-I method.
Keywords/Search Tags:Differential Privacy, Gradient Regression, Bucket Reconstruction, Red-Black Tree, Isotonic regression
PDF Full Text Request
Related items