With the rapid development of information technology and the continuous progress of data collection technology,a large amount of graph structured data are produced on the Internet.These graph structured data contain rich knowledge and patterns such as community structure and frequent subgraphs.Extracting these patterns can provide support for group behavior analysis and communication interaction mechanism research.However,these data contain sensitive information such as user location and consumption records.Attackers can infer individual sensitive information from the personalized features presented by edges,nodes,and structures in the graph.Directly publishing graph structured data poses the risk of individual privacy leakage.In graph data publishing,how to effectively maintain the utility of graph structured data while preserving individual privacy has become an urgent problem to be solved.Differential privacy can quantitatively analyze the risk of privacy leakage and has strict theoretical proof,which has attracted increasing research attention in the field of data privacy protection in recent years.At present,the publishing of synthetic graphs under differential privacy has become a hot issue in the research of graph structured data privacy protection,which mainly faces three prominent problems:(1)it is difficult to extract local and global graph structure features simultaneously,with the result that the graph utility is severely distorted after the anonymization;(2)the ability to support utility preference is weak,and it is difficult to maintain the user-defined utility while protecting individual privacy;(3)most methods are limited to weak edge-difference privacy and have weak protection on the community structure which can map the common attributes between nodes.Focusing on the above problems,we introduce the generative adversarial network model,graph representation learning model,and Katz similarity model to study differentially private synthetic graph publishing methods that can achieve a trade-off between privacy and utility.The contributions and innovations lie in the following aspects:(1)Aiming at the problem that it is difficult for the existing research to preserve both local and global graph structures,generative adversarial network model is used as our algorithmic building block since it can model the local and global distribution of graphs.We propose PrivGAN,a differentially private graph publishing method based on generative adversarial network to preserve the local and global structural features without compromising individual privacy.For the problem that the global sensitivity of the learning model is difficult to calculate,the gradient clipping is used to limit the sensitivity,and the gradient estimation is used to obtain the upper bound of the weight gradient to achieve a reasonable setting of the clipping threshold.To address the over-splitting of the privacy budget caused by iterative optimization,the error of the objective function before and after adding noise is theoretically quantified,and a parameter control mechanism based on the quantization error is constructed to reduce the scale of noise added in the optimization process.Theoretical analysis and experimental results show that the synthetic graph generated by Priv GAN can effectively maintain degree distribution,Gini coefficient,and clustering coefficient while satisfying edge-differential privacy.(2)Aiming at the problem that it is difficult for the existing research to support utility preference,graph representation learning model is used as our algorithmic building block since it can convert the samples generated by user-defined sampling into low-dimensional vectors while preserving high graph utility.We propose Priv GRL,a differentially private graph publishing method based on graph representation learning,which supports user-defined utility preference settings.For different sampling strategies in graph representation learning,a provable unified optimal parameter setting method is designed to ensure the theoretical interpretability of the embedding matrix utility.Based on the optimal parameter setting,an objective perturbation method with the penalty term is designed to improve the utility of perturbed data while avoiding privacy budget splitting during iterative optimization.Theoretical analysis and experimental results show that the synthetic graph generated by Priv GRL can effectively maintain node common neighbors,degree distribution,and clustering coefficient while satisfying edge-differential privacy.(3)Aiming at the problem that the existing research has weak protection on the community structure which can map the common attributes between nodes,Katz similarity model is used as our algorithmic building block since it can maintain the global graph structure feature and this model’s internal parameters are easy to control.We propose No Priv Gra,a node differentially private graph publishing method to maximize the utility on the community structure while maintaining a higher level of privacy.Based on the decay factor of Katz,a sensitivity control strategy is designed to alleviate the problem of excessive sensitivity caused by node differential privacy.To address the over-splitting of the privacy budget caused by the Katz matrix,a matrix low-rank approximation method under differential privacy is designed to reduce the addition of noise scale.By analyzing the internal relationship between the eigenvalues and eigenvectors after adding noise,a dynamic privacy budget allocation mechanism is designed to improve the utility of the perturbed Katz matrix.Theoretical analysis and experimental results show that the synthetic graph generated by No Priv Gra can effectively maintain community structure and degree distribution while satisfying node-differential privacy. |