With the continuous upgrading of the entire Internet industry,new smart devices and software continue to develop,allowing enterprises to accumulate a large amount of application data.In data mining,many data can be abstracted into graph data,and graphs(nets)are used to express real relationships,which have the characteristics of directness and naturalness.Graph databases can efficiently insert and query related data,which is convenient for subsequent modeling work.Enterprises need to share these data or hand over the data to a third-party data analysis company to analyze the data and mine the potential information characteristics of the data,so as to facilitate the enterprise to develop products better.However,the direct use of these data will lead to the leakage of sensitive information that users are unwilling to disclose in the network graph,which will cause losses to both enterprises and users.Therefore,certain privacy protection needs to be carried out before publishing.Therefore,the protection of user sensitive information in data sharing and mining has gradually been paid more and more attention by people,and has become a hot issue of general concern from all walks of life.Differential privacy is a model that can resist the attack of background knowledge and has an ideal privacy protection effect.It is gradually used in the field of graph data publishing.This paper will focus on the security issues of the statistical characteristics of network graph data in the process of publishing,aiming at the problem of serious loss of project algorithm graph structure information and excessive publishing data errors in triangular counting publishing,as well as the need to measure the degree of each moment in dynamic graph publishing.The histogram allocates privacy budget and accumulates too much noise in the histogram release.The differential privacy model is used to protect the privacy of users in the network graph.The main research contents are as follows:1.Research the node triangular count publishing problem under the constraint of differential privacy of points in the network graph,aiming at the problem of high sensitivity of publishing algorithm and large changes in the global structure of the graph projection process.A graph projection algorithm based on the global structural features of graphs is proposed.This method refers to the close centrality and indirect centrality of nodes.In the process of graph projection,the structural information of the original graph is preserved,and the global sensitivity is reduced.More triangles;in the histogram release stage,a histogram release mechanism based on k-means clustering is further proposed,using the idea of histogram grouping in statistics to determine k in k-means clustering,and in In the study,the initial cluster center was selected to avoid the sensitivity of k-means to the initial cluster center as much as possible.According to theoretical analysis and simulation experiments,the current designed graph projection can retain more triangles.When the privacy protection intensity of histogram publishing algorithm is high,L1 error and KS error are small,which improves the data usability on the basis of meeting the definition of differential privacy.2.Study the continuous publishing of node degrees under the constraints of point differential privacy,and design a dynamic graph data publishing scheme based on point differential privacy.The algorithm first abstracts the dynamic graph data into incremental graph data.Aiming at the problem that the incremental graph data continues to increase with time,an incremental-projection algorithm is proposed to project the incremental graph sequence and reduce the sensitivity of histogram release.,and the compressed graph sequence obtained by the incremental-projection algorithm is still an incremental sequence,which avoids the allocation of privacy budgets in the dynamic publishing process;in the histogram publishing stage,the difference sequence of the histogram sequence is grouped by hierarchical clustering,effectively reducing the accumulated noise.Through theoretical analysis and simulation experiments,it is concluded that the algorithm studied in this paper can keep more edges in the process of graph projection,reduce the error between the published histogram and the original histogram under the same privacy protection intensity,and improve the data usability on the basis of meeting the definition of differential privacy. |