Research And Implementation Of Data Release Technology To The Public Under Differential Privacy

Posted on: 2022-07-24   Degree: Master   Type: Thesis
Country: China   Candidate: F Y Ren
GTID: 2518306605466844   Subject: Cryptography
Abstract/Summary:
Since the arrival of the information age, many service providers have accumulated large amounts of user data: for example, patients' disease records held by medical institutions, users' transaction records held by financial institutions, and account information on social networks. Sharing and opening up these data can fully tap their value and promote economic development. However, such datasets usually contain individuals' private information, and releasing them directly, or without effective privacy protection, will inevitably leak personal privacy. How can a data holder publish its data without revealing the user privacy it contains? Existing techniques fall into three types: widely used data desensitization; cryptographic techniques such as secure multi-party computation and homomorphic encryption; and anonymization techniques. Among them, Differential Privacy (DP), proposed by Dwork in 2006, protects personal privacy by adding noise to the data to be released and has a series of advantages over the other techniques. This thesis therefore uses differential privacy to protect personal privacy during data release. However, differential privacy assumes an adversary who knows every record except the one being protected. Because this background-knowledge assumption is so strong, a large amount of random noise must be added to query results, which reduces the usability of the data. The research goal of this thesis is therefore to improve the usability of released data while still satisfying differential privacy.

For one-dimensional data release, this thesis builds on the existing Iterative Histogram Partition (IHP) algorithm. When IHP selects a partitioning scheme for its root node, schemes with larger error are more likely to be selected. To remedy this shortcoming, the thesis uses the distance between adjacent frequencies as the utility function and proposes the Discrepancy Private Hierarchical Partition (DPHP) algorithm. DPHP improves usability in two ways: (1) the larger the distance between two adjacent frequencies, the more likely the histogram is split at that position; (2) its utility function has sensitivity 1, lower than the sensitivity of 2 of the utility function used by IHP.

In addition, most one-dimensional data publishing algorithms merge only adjacent groups, which cannot minimize the reconstruction error. To address this, the thesis merges similar groups globally and proposes the Global Clustering Algorithm (GCA). GCA improves usability in two ways: (1) the similarity of two groups is measured by the distance between their means, and the negative of this distance serves as the utility function of the exponential mechanism, which keeps the sensitivity low; (2) the similarity table is filtered by a threshold and by whether a merge decreases the error, so the candidate space from which the exponential mechanism selects a merging scheme is greatly compressed, improving the algorithm's utility.

High-dimensional data contain a large number of attribute columns, so directly applying the Laplace mechanism to satisfy differential privacy injects too much noise and leaves the released data with poor usability. The thesis therefore uses a Bayesian Network (BN) to measure the dependencies between the attributes of high-dimensional data, and selects feature dimensions that represent the original dataset. This reduces the dimensionality and the amount of noise added, and thereby improves usability. To address shortcomings of the PrivBayes (Private Bayes) algorithm, such as its use of mutual information to measure the one-way dependence between two attribute nodes, the thesis proposes the Improved PrivBayes (IPrivBayes) algorithm. IPrivBayes improves usability in two ways: (1) it uses conditional entropy instead of mutual information to measure the one-way causal dependence between attribute nodes, which is more reasonable and more accurate; (2) the exponential mechanism is used to select k parents for an attribute node only when that node has more than k initial candidate parents, as required by the degree k of the Bayesian network. This greatly reduces the number of invocations of the exponential mechanism, increases the privacy budget allocated to each invocation, and improves usability.

Finally, the three proposed algorithms and other classic algorithms of the same kind are applied to the same set of standard datasets, and the usability of the noisy datasets produced by each algorithm is compared. The results show that the proposed algorithms achieve better usability.
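The baseline the abstract refers to is the Laplace mechanism, which adds noise to released counts. As a minimal sketch, assuming a one-dimensional histogram of counts and the usual setting in which adding or removing one record changes a single count by 1 (L1 sensitivity 1), each count receives independent Laplace noise of scale 1/epsilon. The helpers `laplace_noise` and `noisy_histogram` are illustrative names, not taken from the thesis.

```python
import random

def laplace_noise(scale, rng=random):
    """Draw from Laplace(0, scale): the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_histogram(counts, epsilon, rng=random):
    """Laplace mechanism on a histogram (illustrative helper, not from the thesis).

    Adding or removing one record changes exactly one count by 1, so the L1
    sensitivity is 1 and each count gets independent noise of scale 1 / epsilon.
    """
    scale = 1.0 / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]

# Usage: a smaller epsilon (stronger privacy) gives noisier counts.
rng = random.Random(42)
print(noisy_histogram([120, 45, 3, 0], epsilon=0.5, rng=rng))
```

For data with many attribute columns, the full contingency table has one cell per combination of attribute values, so the number of noisy cells grows multiplicatively with each column. This is the reason the abstract gives for not applying the mechanism directly to high-dimensional data.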
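The DPHP idea, making a split more likely where adjacent frequencies differ more, can be illustrated with the standard exponential mechanism, which selects a candidate with probability proportional to exp(epsilon * u / (2 * Δu)). The sketch below is hypothetical. It scores only a single split of a flat histogram, uses u(i) = |h[i+1] - h[i]| with Δu = 1 as the abstract states, and omits DPHP's hierarchical recursion and privacy-budget allocation, which the abstract does not describe.

```python
import math
import random

def exponential_mechanism(candidates, utilities, epsilon, sensitivity, rng=random):
    """Pick a candidate with probability proportional to exp(epsilon * u / (2 * sensitivity))."""
    top = max(utilities)  # shifting by the max avoids overflow; probabilities are unchanged
    weights = [math.exp(epsilon * (u - top) / (2.0 * sensitivity)) for u in utilities]
    return rng.choices(candidates, weights=weights, k=1)[0]

def choose_split(hist, epsilon, rng=random):
    """Hypothetical DPHP-style step: choose a boundary i, splitting between bins i and i + 1.

    Utility u(i) = |hist[i + 1] - hist[i]|. One record changes one count by 1,
    so each utility changes by at most 1 (sensitivity 1).
    """
    if len(hist) < 2:
        return None  # nothing to split
    candidates = list(range(len(hist) - 1))
    utilities = [abs(hist[i + 1] - hist[i]) for i in candidates]
    return exponential_mechanism(candidates, utilities, epsilon, sensitivity=1.0, rng=rng)

# Usage: the jump 49 -> 5 (boundary 5) is the largest, so it is chosen most often;
# the jump 12 -> 50 (boundary 2) is the runner-up.
hist = [10, 11, 12, 50, 52, 49, 5, 6]
rng = random.Random(0)
picks = [choose_split(hist, epsilon=1.0, rng=rng) for _ in range(1000)]
print({b: picks.count(b) for b in sorted(set(picks))})
```

Because the sensitivity enters the denominator of the exponent, halving it (from 2 to 1, as the abstract claims for DPHP versus IHP) has the same effect on selection sharpness as doubling the budget spent on this step.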
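The GCA step described in the abstract (global rather than adjacent merging, negative mean distance as utility, and a filtered similarity table) might look roughly like the sketch below. Several details are assumptions, not taken from the thesis: the names `choose_merge` and `merge_groups`, the `threshold` parameter, and a utility sensitivity of 1, which holds here because one record moves at most one group mean by at most 1. The threshold filter reads the true counts and is not privatized in this sketch. The abstract does not say how the thesis accounts for the privacy cost of that filtering.

```python
import itertools
import math
import random

def group_mean(hist, group):
    """Average count of the bins (indices into hist) that form one group."""
    return sum(hist[i] for i in group) / len(group)

def choose_merge(hist, groups, epsilon, threshold, rng=random):
    """Hypothetical GCA-style step: pick two groups, not necessarily adjacent, to merge.

    Utility is the negative distance between group means, so similar groups are
    favoured. Assumed sensitivity 1: one record moves one group mean by at most 1.
    The threshold filter reads the true counts and is NOT privatized here.
    """
    sensitivity = 1.0
    pairs, utilities = [], []
    for a, b in itertools.combinations(range(len(groups)), 2):
        d = abs(group_mean(hist, groups[a]) - group_mean(hist, groups[b]))
        if d <= threshold:  # hypothetical filter that shrinks the candidate space
            pairs.append((a, b))
            utilities.append(-d)
    if not pairs:
        return None
    top = max(utilities)
    weights = [math.exp(epsilon * (u - top) / (2.0 * sensitivity)) for u in utilities]
    return rng.choices(pairs, weights=weights, k=1)[0]

def merge_groups(groups, pair):
    """Replace the two chosen groups by their union (appended at the end)."""
    a, b = pair
    rest = [g for i, g in enumerate(groups) if i not in pair]
    return rest + [sorted(groups[a] + groups[b])]

# Usage: bins 0/2 and 1/3 are similar but not adjacent, so only a global
# merge can group them.
hist = [10, 30, 11, 29, 50]
groups = [[0], [1], [2], [3], [4]]
pair = choose_merge(hist, groups, epsilon=1.0, threshold=3.0, rng=random.Random(0))
print(pair, merge_groups(groups, pair))
```

With the threshold in place, only two of the ten possible pairs remain as candidates, which illustrates the abstract's point that filtering compresses the space the exponential mechanism must choose from.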
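The first IPrivBayes change replaces mutual information with conditional entropy when measuring one-way dependence between attribute nodes. A small illustration, assuming discrete attribute columns and empirical (non-private) estimates, shows the property behind this choice: mutual information is symmetric, whereas conditional entropy distinguishes direction. The abstract does not give the way IPrivBayes turns H(Y | X) into a score for the exponential mechanism, nor that score's sensitivity. The sketch therefore stops at the two measures, and the function names are illustrative.

```python
import math
from collections import Counter

def entropy(values):
    """Empirical Shannon entropy in bits of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y); symmetric, so it carries no direction."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def conditional_entropy(y, x):
    """H(Y | X) = H(X, Y) - H(X); generally differs from H(X | Y)."""
    return entropy(list(zip(x, y))) - entropy(x)

# Toy attribute columns: y is fully determined by x, but x is not determined by y.
x = [0, 1, 2, 3, 0, 1, 2, 3]
y = [0, 0, 1, 1, 0, 0, 1, 1]
print(mutual_information(x, y), mutual_information(y, x))  # equal in both directions
print(conditional_entropy(y, x), conditional_entropy(x, y))  # 0.0 vs 1.0
```

Here mutual information reports 1 bit in both directions, while H(Y | X) = 0 says that X fully explains Y and H(X | Y) = 1 says that the reverse does not hold.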
Keywords/Search Tags: Differential Privacy, histogram, high-dimensional data, data release, usability