Font Size: a A A

Research On Histogram Publishing Algorithm Under Differential Privacy

Posted on:2020-04-05Degree:MasterType:Thesis
Country:ChinaCandidate:H X TangFull Text:PDF
GTID:2428330590995392Subject:Information security
Abstract/Summary:PDF Full Text Request
With the deepening and popularization of information technology,data acquisition,storage,distribution and analysis become fast and convenient.Data mining technology can obtain valuable information from all kinds of published data,but at the same time it will also cause the leakage of personal information.So how to publish effective data while ensuring that personal information is not leaked is a major challenge in the field of privacy protection.Differential privacy is one of the current effective privacy protection mechanisms.By adding noise to sensitive data,the query output is randomized to achieve the purpose of privacy protection.Differential privacy ensures that no matter how much background knowledge an attacker has,it is still impossible to deduce the information of a particular data record,but at the same time the statistical analysis results of the data remain.At present,differential privacy has been applied to many fields,such as histogram publishing,data mining,machine learning and so on.Differential privacy protects privacy by adding noise.Privacy budget involves the intensity of noise addition,which directly affects the availability of data published by histogram.Therefore,how to allocate the privacy budget reasonably is a major challenge for Differential Privacy Algorithms.Based on this,this paper proposes two new histogram publishing algorithms from two different scenarios to realize differential privacy histogram publishing.Because interactive histogram publishing algorithm completes data publishing through the interaction between analysts and data owners,each interaction consumes a certain privacy budget,so the number of queries is limited.In this paper,some existing algorithms are deeply studied and analyzed.It is found that these algorithms improve the number of queries and the availability of published data by optimizing queries,while ignoring the allocation of privacy budget.Aiming at this defect,this paper proposes an interactive histogram publishing algorithm IPPB,which uses Poisson distribution to distribute privacy budget.The innovation of this algorithm mainly embodies two points.Firstly,it achieves infinite allocation of privacy budget,so that the number of queries is not limited.Secondly,it ensures the data availability of the previous k queries by optimizing the allocation weight of privacy budget.Existing non-interactive histogram publishing algorithms show that grouping first and then adding noise is an effective means to improve the availability of histogram data publishing.This process involves two noise-adding processes,one for protecting grouping structure and one for protecting histogram data.How to allocate privacy budget to balance the errors introduced in these two noise-adding processes is a challenge for non-interactive histogram publishing algorithms.Aiming at this problem,this paper proposes an adaptive histogram publishing algorithm APB for privacy budget allocation strategy.By analyzing the noise errors and reconstruction errors introduced before and after grouping,an optimization model of privacy budget allocation weight is established,and the relationship between the optimal allocation weight,grouping size and the number of groupings is obtained.Based on the optimization model and the idea of greedy grouping,an optimization model is proposed.An adaptive privacy budget allocation strategy can better balance noise errors and reconstruction errors and improve the availability of published data.Finally,through a number of experiments on three data sets,it is verified that the two algorithms proposed in this paper improve the availability of published data compared with the existing algorithms.
Keywords/Search Tags:Differential privacy, interactive, non-interactive, histogram publishing, privacy budget, Poisson distribution
PDF Full Text Request
Related items