Font Size: a A A

Analysis Of Graph Data Distributions Satisfying Node Differential Privacy

Posted on:2020-12-01Degree:MasterType:Thesis
Country:ChinaCandidate:X D ZhangFull Text:PDF
GTID:2428330590458369Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
As an important storage mode of social network data,graph data plays an important role in network analysis and data mining.Triangle counting in graphs is an essential role of studying models for social networks.However,by studying the triangles,the analysts can easily identify the central roles of the network and discover the small group network structure,which can be used to distinguish the owner of the spam.Therefore,directly publishing information of triangles in the graph will bring the leakage of user privacy.Node differential privacy protection technology is based on its strict theory proof,and can effectively protect the privacy information of any node in the graph from being disclosed,which can be used to protect the information of triangular computation.What is more,it can not only protect the privacy of nodes,but retains the statistical characteristics of the graph.Satisfying the node differential privacy,directly publishing triangle counts will bring a huge amount of noise,which will make the data availability extremely poor.By removing some unnecessary edges and controlling the triangle of one node within the threshold can make the noise effectively controlled.And it can greatly reduce the amount of noise and achieve an optimal balance between the privacy and the usability of published data.Based on the edge-deletion process,the cumulative triangle count histogram is much better than the noncumulative histogram.And for the local clustering coefficients,the node differential privacy protection technology can also achieve good effect by grouping the coefficients before publishing them.Experiments show that,when publishing results of triangle counts,different thresholds will bring different publishing effects for the datasets.Generally,Small datasets corresponds to low threshold while large data sets corresponds to high threshold,so that it can achieve the best effect of retaining as much information of the original datasets as possible.When the non-cumulative histogram of node local clustering coefficients is released,experiments show that the threshold parameter affects the publishing effect while the grouping parameter affects the retention level of the original graph feature.Experiments also show that the more coefficients is grouped into,the more statistical information can be observed,while the larger the threshold is set,the privacy level will be higher.Therefore,different datasets correspond to different parameter selection strategies,and appropriate parameters can achieve great balance between the privacy and the histogram effects.
Keywords/Search Tags:Graph Data, Differential Privacy, Triangle Counting, Local Clustering Coefficient
PDF Full Text Request
Related items