Research On High-Dimensional Linked Data Publishing Based On Local Differential Privacy

Posted on: 2021-02-25  Degree: Master  Type: Thesis
Country: China  Candidate: Y F Liu  Full Text: PDF
GTID: 2428330605456905  Subject: Computer technology
Abstract/Summary:
High-dimensional data publishing based on local differential privacy is an active topic in privacy protection research. Existing work, however, focuses mainly on centralized data sets containing low-dimensional data. Very little work addresses distributed data sets containing high-dimensional data, and the correlation between data attributes is usually not considered. More importantly, correlation among the attributes of a data set can greatly reduce the effectiveness of privacy protection. This thesis therefore proposes a local differential privacy data publishing method for high-dimensional linked data.

First, to address the problem that the randomized response technique from probability statistics cannot perturb a high-dimensional data set sufficiently, the thesis proposes a local differential privacy method that combines Bloom filters with randomized response. Specifically, the method uses a Bloom filter with multiple hash functions to hash all attribute values in the attribute domain into a predefined bit space, and then applies randomized response rules to increase the randomness of the perturbation process.

Second, because the EM algorithm is only applicable to estimating low-dimensional data distributions, the thesis proposes an improved estimation method that combines the independence property of Bloom filters with Bayes' theorem. Using the independence of the Bloom filters, the marginal distributions of the individual attributes are combined to calculate the joint distribution, and the posterior probability of the attribute set is then calculated with Bayes' theorem.

Third, because ignoring the correlation between the attributes of a high-dimensional data set lowers the effectiveness of privacy protection, the thesis proposes a correlation measurement method based on mutual information. The method constructs a dependency graph by calculating the mutual information between all pairs of attributes. Then, using triangulation, the dependency graph is transformed into a tree whose nodes are compact attribute clusters. In addition, to reduce the high computational cost of mutual information, the thesis proposes an entropy-based pruning scheme. The scheme shrinks the attribute domain by removing attribute pairs with low entropy, which reduces the number of pairwise calculations between attributes. Combining these steps, a data set can be reconstructed from the perturbed data, achieving high-dimensional data publishing protected by local differential privacy.

Finally, the thesis runs experiments on three open data sets (Retail, Adult, and TPC-E) and evaluates the method with several metrics, including the average running time before and after perturbation, the average variation distance, and the average cosine similarity. Comparison and analysis of the correlation loss rate, the complexity reduction rate, and SVM and random forest classification results show that the proposed algorithm better preserves both the correlation between attributes and the utility of the reconstructed data sets.

Figures: 9; tables: 6; references: 53.
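The two building blocks named in the abstract, Bloom-filter encoding followed by randomized response and mutual information between attribute pairs, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the SHA-256 based hashing, the filter width `num_bits`, the number of hash functions `num_hashes`, and the per-bit flipping rule are all assumptions chosen for illustration. In particular, `epsilon` here is a per-bit parameter; the overall privacy guarantee depends on how many bits can differ between two encoded values.

```python
import hashlib
import math
import random
from collections import Counter


def bloom_encode(value, num_bits=32, num_hashes=2):
    """Hash one attribute value into a num_bits-wide Bloom filter (hypothetical sizes)."""
    bits = [0] * num_bits
    for seed in range(num_hashes):
        # Derive one bit position per seeded hash; SHA-256 is an illustrative choice.
        digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
        bits[int.from_bytes(digest[:8], "big") % num_bits] = 1
    return bits


def randomized_response(bits, epsilon, rng):
    """Keep each bit with probability e^eps / (1 + e^eps); otherwise flip it."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return [b if rng.random() < p_keep else 1 - b for b in bits]


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two equally long columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) / (p(x) p(y)) simplifies to count_xy * n / (count_x * count_y).
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


if __name__ == "__main__":
    rng = random.Random(0)
    encoded = bloom_encode("Bachelors")
    perturbed = randomized_response(encoded, epsilon=1.0, rng=rng)
    print(sum(encoded), sum(perturbed))
    print(mutual_information([0, 1, 0, 1], [0, 1, 0, 1]))
```

In a pipeline of the kind the abstract describes, each user would run the encoding and perturbation locally, and pairwise mutual information values would serve as edge weights when building the attribute dependency graph.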
Keywords/Search Tags: local differential privacy, high-dimensional data publishing, distributed data set, linked data