Analysis Of Graph Data Distributions Satisfying Node Differential Privacy

Posted on:2020-12-01

Degree:Master

Type:Thesis

Country:China

Candidate:X D Zhang

Full Text:PDF

GTID:2428330590458369

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

As an important storage mode of social network data,graph data plays an important role in network analysis and data mining.Triangle counting in graphs is an essential role of studying models for social networks.However,by studying the triangles,the analysts can easily identify the central roles of the network and discover the small group network structure,which can be used to distinguish the owner of the spam.Therefore,directly publishing information of triangles in the graph will bring the leakage of user privacy.Node differential privacy protection technology is based on its strict theory proof,and can effectively protect the privacy information of any node in the graph from being disclosed,which can be used to protect the information of triangular computation.What is more,it can not only protect the privacy of nodes,but retains the statistical characteristics of the graph.Satisfying the node differential privacy,directly publishing triangle counts will bring a huge amount of noise,which will make the data availability extremely poor.By removing some unnecessary edges and controlling the triangle of one node within the threshold can make the noise effectively controlled.And it can greatly reduce the amount of noise and achieve an optimal balance between the privacy and the usability of published data.Based on the edge-deletion process,the cumulative triangle count histogram is much better than the noncumulative histogram.And for the local clustering coefficients,the node differential privacy protection technology can also achieve good effect by grouping the coefficients before publishing them.Experiments show that,when publishing results of triangle counts,different thresholds will bring different publishing effects for the datasets.Generally,Small datasets corresponds to low threshold while large data sets corresponds to high threshold,so that it can achieve the best effect of retaining as much information of the original datasets as possible.When the non-cumulative histogram of node local clustering coefficients is released,experiments show that the threshold parameter affects the publishing effect while the grouping parameter affects the retention level of the original graph feature.Experiments also show that the more coefficients is grouped into,the more statistical information can be observed,while the larger the threshold is set,the privacy level will be higher.Therefore,different datasets correspond to different parameter selection strategies,and appropriate parameters can achieve great balance between the privacy and the histogram effects.

Keywords/Search Tags:

Graph Data, Differential Privacy, Triangle Counting, Local Clustering Coefficient

PDF Full Text Request

Related items

1	Research On The Technology Of Graph Data Publication With Differential Privacy
2	Research On The Technology Of Graph Clustering And Generation With Differential Privacy
3	Research On Differential Privacy Protection Based On Classified Data
4	A Study Of Graph Data Publication Techniques Based On Differential Privacy
5	Research On Key Technologies Of Privacy Preserving Data Mining Based On Local Differential Privacy
6	Research On K-means Clustering Algorithm Based On Differential Privacy Protection
7	Research On Personalized Local Differential Privacy Scheme For Multidimensional Data
8	Perturbed Data Publishing With Local Differential Privacy Constraints
9	Research On Clustering Algorithms In Differential Privacy
10	Research On Privacy Protection Method Of Sensor-cloud Based On Local Differential Privacy