Research On High-Dimensional Linked Data Publishing Based On Local Differential Privacy

Posted on:2021-02-25

Degree:Master

Type:Thesis

Country:China

Candidate:Y F Liu


GTID:2428330605456905

Subject:Computer technology

Abstract/Summary:


Research on high-dimensional data publishing under local differential privacy is an active topic in privacy protection. However, existing work focuses mainly on centralized data sets of low dimensionality. Distributed data sets containing high-dimensional data have received very little attention, and the correlation between data attributes is usually not considered. More importantly, correlations among the attributes of a data set can greatly reduce the effectiveness of privacy protection. This thesis therefore proposes a method for publishing high-dimensional linked data under local differential privacy.

First, because the randomized response technique from probability and statistics cannot perturb a high-dimensional data set sufficiently, this thesis proposes a local differential privacy method that combines the Bloom filter with randomized response. Specifically, a Bloom filter with multiple hash functions maps every value in the attribute domain into a predefined space, and randomized response rules are then applied to increase the randomness of the perturbation.

Second, because the EM algorithm is only suitable for low-dimensional distributions, this thesis proposes an improved method that combines the independence of Bloom filter encodings with Bayes' theorem. Using this independence, the marginal distributions of individual attributes are combined to compute the joint distribution, and the posterior probability of an attribute set is then computed with Bayes' theorem.

Third, because existing methods ignore the correlation between the attributes of high-dimensional data sets, which lowers the effectiveness of privacy protection, this thesis proposes a correlation measure based on mutual information. The method builds a dependency graph from the mutual information between every pair of attributes and then, using triangulation, transforms the dependency graph into a tree whose nodes are compact attribute clusters. In addition, to reduce the high cost of computing mutual information, this thesis proposes an entropy-based pruning scheme. It shrinks the attribute domain by removing low-entropy attribute pairs, which reduces the number of pairwise computations between attributes.

Combining these steps, the perturbed data set is reconstructed, achieving high-dimensional data publishing with local differential privacy protection. Finally, experiments on three open-source data sets (Retail, Adult, and TPC-E) evaluate the method with several metrics, including the average running time before and after perturbation, the average variation distance, and the average cosine similarity. Comparisons of correlation loss rate, complexity reduction rate, and SVM and random forest classification show that the proposed algorithm better preserves the correlation between attributes and the utility of the reconstructed data sets. Figure [9] Table [6] Reference [53]...
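The first step described above (Bloom filter encoding followed by randomized response) can be illustrated with a minimal Python sketch. The abstract does not give the exact response rules, so this assumes RAPPOR-style symmetric bit flipping: each attribute value is hashed by k salted SHA-256 functions into an m-bit array, and each bit is then kept with probability e^(ε/2k) / (1 + e^(ε/2k)). The function names `bloom_encode` and `randomized_response` and the parameters `m`, `k`, and `epsilon` are hypothetical, not taken from the thesis.

```python
import hashlib
import math
import random


def bloom_encode(value: str, m: int, k: int) -> list[int]:
    """Hash one attribute value into an m-bit Bloom filter with k hash functions."""
    bits = [0] * m
    for i in range(k):
        # Salting SHA-256 with the index i simulates k independent hash functions.
        digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
        bits[int.from_bytes(digest, "big") % m] = 1
    return bits


def randomized_response(bits: list[int], epsilon: float, k: int,
                        rng: random.Random) -> list[int]:
    """Keep each bit with probability p_keep, flip it otherwise (assumed rule).

    Encodings of two different values differ in at most 2k bits, so a per-bit
    budget of epsilon / (2k) bounds the likelihood ratio of any report by e^epsilon.
    """
    eps_bit = epsilon / (2 * k)
    p_keep = 1.0 / (1.0 + math.exp(-eps_bit))  # equals e^eps_bit / (1 + e^eps_bit)
    return [b if rng.random() < p_keep else 1 - b for b in bits]


if __name__ == "__main__":
    rng = random.Random(7)
    encoded = bloom_encode("age=37", m=32, k=3)
    report = randomized_response(encoded, epsilon=2.0, k=3, rng=rng)
    print(encoded)
    print(report)
```

Splitting the budget as ε/(2k) per bit is a conservative choice: it guarantees ε-LDP for the whole report, but a larger k means more noise on every bit.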
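The second step, combining per-attribute marginals with Bayes' theorem, can be sketched as follows. This is a minimal illustration under several assumptions not stated in the abstract: each attribute is encoded and perturbed separately with a per-bit keep probability `p_keep`, the candidate values of each attribute are known to the aggregator, and the marginals have already been estimated. The prior of a value combination is the product of the marginals (the independence assumption), the likelihood of the noisy reports is computed bit by bit, and normalizing gives the posterior. The names `joint_posterior`, `report_likelihood`, and the salted SHA-256 encoder are hypothetical.

```python
import hashlib
import itertools
import math


def bloom_encode(value: str, m: int, k: int) -> list[int]:
    """Hypothetical encoder: k salted SHA-256 hashes set bits in an m-bit array."""
    bits = [0] * m
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
        bits[int.from_bytes(digest, "big") % m] = 1
    return bits


def report_likelihood(report: list[int], value: str, m: int, k: int,
                      p_keep: float) -> float:
    """P(report | value) when each Bloom bit was kept with probability p_keep."""
    encoded = bloom_encode(value, m, k)
    matches = sum(r == e for r, e in zip(report, encoded))
    return p_keep ** matches * (1.0 - p_keep) ** (m - matches)


def joint_posterior(reports: list[list[int]], domains: list[list[str]],
                    marginals: list[dict[str, float]], m: int, k: int,
                    p_keep: float) -> dict[tuple[str, ...], float]:
    """Posterior over value combinations of several attributes via Bayes' theorem.

    Prior of a combination = product of per-attribute marginals (independence
    assumption); likelihood = product of per-attribute report likelihoods.
    """
    scores = {}
    for combo in itertools.product(*domains):
        prior = math.prod(marg[v] for marg, v in zip(marginals, combo))
        like = math.prod(report_likelihood(rep, v, m, k, p_keep)
                         for rep, v in zip(reports, combo))
        scores[combo] = prior * like
    evidence = sum(scores.values())  # normalizing constant P(reports)
    return {combo: s / evidence for combo, s in scores.items()}
```

For large m the products can underflow to zero; a practical version would work with log-probabilities and normalize with a log-sum-exp.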
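The mutual-information dependency graph of the third step can be sketched by computing the empirical mutual information I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))] for every attribute pair and linking the pairs that exceed a threshold. For illustration this works on a plain table of records; in the local differential privacy setting the probabilities would instead come from the privately estimated distributions. The threshold value and the names `mutual_information` and `dependency_graph` are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs: list, ys: list) -> float:
    """Empirical I(X;Y) in nats from two equally long columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def dependency_graph(columns: dict[str, list], threshold: float) -> dict[str, set[str]]:
    """Link two attributes when their mutual information exceeds the (hypothetical) threshold."""
    graph = {name: set() for name in columns}
    for a, b in combinations(columns, 2):
        if mutual_information(columns[a], columns[b]) > threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph
```

Computing every pair costs O(d^2) mutual-information evaluations for d attributes, which is the cost the thesis's pruning scheme targets.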
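The abstract does not spell out the entropy-based pruning rule. One plausible reading, sketched here as an assumption, uses the bound I(X;Y) ≤ min(H(X), H(Y)): if either attribute's entropy is at most the mutual-information threshold τ, the pair can never pass that threshold, so its mutual information need not be computed. The names `entropy`, `pairs_to_evaluate`, and `tau` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(column: list) -> float:
    """Empirical Shannon entropy H(X) in nats."""
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in Counter(column).values())


def pairs_to_evaluate(columns: dict[str, list], tau: float) -> list[tuple[str, str]]:
    """Return only the attribute pairs whose mutual information could exceed tau.

    Since I(X;Y) <= min(H(X), H(Y)), a pair containing an attribute with
    entropy <= tau is pruned without computing its mutual information.
    """
    h = {name: entropy(col) for name, col in columns.items()}
    return [(a, b) for a, b in combinations(columns, 2) if min(h[a], h[b]) > tau]
```

The pruning is lossless with respect to the threshold test: every pair it removes would have failed the mutual-information check anyway.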
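Turning the dependency graph into a tree of compact attribute clusters via triangulation can be sketched with the standard junction-tree construction. This assumes a min-degree elimination ordering (the abstract does not say which heuristic the thesis uses): eliminating vertices and adding fill-in edges yields a chordal graph whose maximal cliques are the clusters, and a maximum spanning tree over clique-intersection sizes links them. The names `triangulate_cliques` and `cluster_tree` are hypothetical.

```python
from itertools import combinations


def triangulate_cliques(graph: dict[str, set[str]]) -> list[frozenset[str]]:
    """Eliminate vertices in min-degree order, adding fill-in edges, and return
    the maximal cliques of the resulting triangulated (chordal) graph."""
    adj = {v: set(nbrs) for v, nbrs in graph.items()}
    cliques = []
    while adj:
        v = min(adj, key=lambda u: (len(adj[u]), u))  # fewest neighbours, ties by name
        nbrs = adj[v]
        cliques.append(frozenset(nbrs | {v}))
        for a, b in combinations(nbrs, 2):  # fill-in: make the neighbours a clique
            adj[a].add(b)
            adj[b].add(a)
        for u in nbrs:
            adj[u].discard(v)
        del adj[v]
    # Keep only cliques that are not proper subsets of another clique.
    return [c for c in cliques if not any(c < other for other in cliques)]


def cluster_tree(cliques: list[frozenset[str]]) -> list[tuple[frozenset[str], frozenset[str]]]:
    """Link cliques by a maximum spanning tree on separator size (Kruskal with
    union-find); cliques sharing no attribute stay unconnected (a forest)."""
    parent = list(range(len(cliques)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    candidates = sorted(
        ((len(cliques[i] & cliques[j]), i, j)
         for i, j in combinations(range(len(cliques)), 2)),
        reverse=True,
    )
    edges = []
    for weight, i, j in candidates:
        ri, rj = find(i), find(j)
        if weight > 0 and ri != rj:
            parent[ri] = rj
            edges.append((cliques[i], cliques[j]))
    return edges
```

For a four-attribute cycle a-b-c-d-a, one chord is added and the result is two three-attribute clusters joined by a single tree edge.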
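The two distribution-level metrics named in the abstract can be computed as below. The abstract does not define them, so this assumes the average variation distance is the total variation distance ½ Σ |P(x) - Q(x)| averaged over the evaluated attribute sets, and the average cosine similarity is the mean cosine between the true and estimated distribution vectors. The function names are hypothetical.

```python
import math


def variation_distance(p: dict, q: dict) -> float:
    """Total variation distance 0.5 * sum |p(x) - q(x)| between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


def cosine_similarity(p: dict, q: dict) -> float:
    """Cosine of the angle between two distributions viewed as vectors."""
    keys = set(p) | set(q)
    dot = sum(p.get(x, 0.0) * q.get(x, 0.0) for x in keys)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def average_metrics(true_dists: list[dict], est_dists: list[dict]) -> tuple[float, float]:
    """Average variation distance and average cosine similarity over paired distributions."""
    pairs = list(zip(true_dists, est_dists))
    avd = sum(variation_distance(p, q) for p, q in pairs) / len(pairs)
    acs = sum(cosine_similarity(p, q) for p, q in pairs) / len(pairs)
    return avd, acs
```

Lower average variation distance and higher average cosine similarity both indicate that the reconstructed distributions are closer to the original ones.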

Keywords/Search Tags:

local differential privacy, high-dimensional data publishing, distributed data set, linked data


Related items

1	High-dimensional Data Publishing Algorithms Based On Local Differential Privacy
2	Research On The Theory And Method Of Differential Privacy Synthetic Data Publication
3	Research On Locally Differentially Private Mechanisms For Data Publishing In Crowdsensing Systems
4	Local Differential Privacy Preserving Of High-Dimensional Data
5	Perturbed Data Publishing With Local Differential Privacy Constraints
6	Research On Local Differential Privacy Method For High-dimensional Data Based On Improved Bayesian Network
7	Research On Algorithms Of Differential Privacy Statistics Data Publishing
8	Differential Privacy Protection Data Release Via Bayesian Network
9	Association Data Release Based On Local Differential Privacy
10	Research On Multidimensional Correlation Hierarchical Differential Privacy Method For High-dimensional Data Publishing