
Research On Differential Privacy Protection Methods For Social Network Data Publishing

Posted on: 2020-04-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Li
Full Text: PDF
GTID: 1368330605479513
Subject: Computer Science and Technology

Abstract/Summary:
Due to the rapid development of social networks, research on them is increasing day by day. Through social networks, people can understand many social phenomena, such as disease spread, emotional contagion, and occupational mobility. Large amounts of data are produced on social networks every day, and most of these data are closely related to personal life. A variety of social network analysis and data mining methods are used to reveal the knowledge and value behind them. If these data are maliciously used by intruders, users' privacy in social networks will be greatly threatened, and such privacy and security issues are now receiving more and more attention. Aiming at the privacy and security problems in social network data release, this dissertation conducts an intensive study and proposes specific differential privacy protection methods to provide a privacy guarantee for social network analysis and data mining.

ε-differential privacy (ε-DP for short) is a definition of privacy proposed for the privacy disclosure of statistical databases. Differential privacy defines an extremely strict attack model that makes no assumptions about the background knowledge an attacker may possess. We apply differential privacy to social network privacy protection, analyze problems from the perspective of offline data, and release result data in a non-interactive way. The research is carried out according to different data release scenarios, namely the actual release requirements and characteristics of the released data, as well as different concerns about privacy issues. Release scenarios one and two address privacy disclosure of edge weights; release scenarios three and four address privacy disclosure of edge information. The specific research work is as follows:

Firstly, focusing on privacy disclosure when calculating predictive ratings in recommender systems, we propose an approach combining two perturbation methods for releasing predictive rating results. In release scenario one, the user-item preference mapping is a bipartite graph in which each edge weight represents a user's personal rating data. The rating content is associated with sensitive information, which must be protected from disclosure when the results are released. Based on differential privacy, the method performs collaborative filtering to conceal ratings while providing valuable predictions. It employs existing algorithms to calculate the predicted ratings; similarity is computed as user-user similarity, that is, some kind of behavioral relationship between users. The DPI (DP Input) method perturbs the original ratings, after which any recommendation algorithm can make predictions directly. The DPM (DP Manner) method keeps the original ratings but perturbs various measurements during the execution of the algorithms to produce predictive ratings. Experimental results on real datasets show that both methods provide valuable prediction results while guaranteeing differential privacy, enabling predictive recommendation tasks under privacy protection.

Secondly, the perturbation strategy MB-CI (Merging Barrels and Consistency Inference) is proposed to protect edge weights in weighted social networks. In release scenario two, we assume the network topology is public information and only the edge weights are sensitive; they may represent communication frequency, commodity price, intimacy of a relationship, etc. By viewing the edge-weight sequence as an unattributed histogram, differential privacy for edge weights can be implemented on the histogram. Considering that some edges in social networks share the same weight, we merge barrels with the same count into one group to reduce the required noise. Moreover, the definition of K-indistinguishability between groups is proposed to keep differential privacy from being violated, because the merging operation alone may disclose information through the magnitude of the noise itself. To keep most shortest paths unchanged, consistency inference is performed according to the original order of the sequence as an important post-processing step. Experimental results on synthetic and real datasets show that the proposed approach effectively improves the accuracy and utility of the released data under the premise of differential privacy.

Then, aiming at privacy disclosure when publishing aggregate information in network statistics, we propose an approach based on edge-differential privacy to release the distribution of clustering coefficients across communities, which can provide more information about behaviors between groups, or patterns between clusters, in social networks. In release scenario three, an edge between two nodes may represent friendship, cooperation, a trading relationship, etc., and the existence of an edge is considered sensitive information. The approach includes two algorithms, DPLM (DP Louvain Method) and DPCC (DP Clustering Coefficient), which respectively partition communities and release a histogram under privacy protection. The DPLM algorithm improves LM (the Louvain method) using an exponential mechanism. Because of the introduced randomness, the concept of absolute gain is proposed to replace the relative gain in the original algorithm. In the modularity optimization phase, we sanitize the neighborhood communities of every node, retaining only the effective communities from which the node's destination community is randomly selected. The DPCC algorithm outputs the noisy distribution of clustering coefficients as a histogram, presenting the results in an intuitive manner. Experimental results on real datasets show that the proposed approach provides valuable distribution results while guaranteeing edge-differential privacy. As an improved community detection method, the DPLM algorithm also achieves better modularity for the networks.

Finally, to publish social graphs under privacy protection so that valuable scientific results can be reproduced, we propose rTbI, an improved approach based on wPINQ, to perform graph reconstruction. In release scenario four, the existence of an edge between any two nodes in a graph is considered sensitive information. Based on edge-differential privacy, the method perturbs the structure of the released graph and ensures that edge information is not leaked. The original workflow starts with a seed graph, which is essentially the 1K-graph fitting the noisy degree sequence. In view of the inaccurate assortativity coefficient, we truncate the workflow and replace the seed graph with an optimal one obtained by targeted 1K-rewiring, where the target is the assortativity coefficient and the 1K-distribution is preserved. Subsequently, the MCMC process takes the new seed graph as its initial state and proceeds step by step, guided by the TbI (Triangles by Intersect) query, to increase the number of triangles in the synthetic graphs. Experimental results on real datasets show that the proposed approach better maintains the data availability of published social graphs under the premise of edge-differential privacy.
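To illustrate the input-perturbation idea behind DPI, the following is a minimal sketch (not the dissertation's implementation): each rating is perturbed with Laplace noise calibrated to the rating range, after which any downstream recommender can run on the noisy data. The [1, 5] rating range and the epsilon value are assumptions for the example.

```python
import numpy as np

def laplace_perturb_ratings(ratings, epsilon, lo=1.0, hi=5.0):
    """DPI-style input perturbation: add Laplace noise scaled to the
    global sensitivity of a single rating (hi - lo), then clip the
    noisy values back into the valid rating range."""
    sensitivity = hi - lo
    noise = np.random.laplace(0.0, sensitivity / epsilon, size=len(ratings))
    return np.clip(np.asarray(ratings, dtype=float) + noise, lo, hi)

noisy = laplace_perturb_ratings([4.0, 2.5, 5.0], epsilon=1.0)
```

Because the noise is injected once at the input, the privacy guarantee holds regardless of which recommendation algorithm consumes the perturbed ratings afterwards.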
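The barrel-merging idea of MB-CI can be sketched as follows, under the simplifying assumption (not from the dissertation) that all bins sharing the same count form one group and receive a single shared Laplace noise draw, so the total noise injected shrinks with the number of groups. Consistency inference and K-indistinguishability are omitted here.

```python
from collections import Counter

import numpy as np

def noisy_weight_histogram(weights, epsilon):
    """Treat the edge-weight sequence as an unattributed histogram
    (weight value -> count), merge bins that share the same count into
    one group, and add one Laplace noise draw per group.  A counting
    histogram has sensitivity 1, so the noise scale is 1/epsilon."""
    hist = Counter(weights)                 # weight value -> count
    groups = {}                             # count -> [weight values]
    for value, count in hist.items():
        groups.setdefault(count, []).append(value)
    noisy = {}
    for count, values in groups.items():
        noisy_count = count + np.random.laplace(0.0, 1.0 / epsilon)
        for v in values:                    # shared noise within a group
            noisy[v] = noisy_count
    return noisy

noisy = noisy_weight_histogram([1, 1, 2, 2, 3], epsilon=1.0)
```

In this toy run the weights 1 and 2 both occur twice, so they fall into the same group and receive an identical noisy count, which is exactly how merging reduces the noise each individual bin must absorb.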
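The exponential mechanism used by DPLM to randomize community moves follows the standard form; below is a generic sketch of it (the candidate set, scores, and sensitivity are illustrative, not the dissertation's actual modularity-gain computation). A candidate is selected with probability proportional to exp(ε·score / 2Δ).

```python
import numpy as np

def exponential_mechanism(candidates, scores, epsilon, sensitivity=1.0):
    """Standard exponential mechanism: sample one candidate with
    probability proportional to exp(epsilon * score / (2 * sensitivity)).
    In a DPLM-style setting, candidates would be a node's neighboring
    communities and scores their (absolute) modularity gains."""
    scores = np.asarray(scores, dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()                  # for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return candidates[np.random.choice(len(candidates), p=probs)]

choice = exponential_mechanism(["A", "B", "C"], [0.1, 0.5, 0.2], epsilon=1.0)
```

Higher-scoring communities are chosen more often, but every retained candidate keeps a nonzero probability, which is the source of the randomness that motivates replacing relative gain with absolute gain.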
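The MCMC rewiring step in the rTbI workflow can be loosely illustrated with a degree-preserving double-edge swap: replacing edges (a, b) and (c, d) with (a, d) and (c, b) leaves every node's degree, and hence the 1K-distribution, unchanged. The acceptance rule below (move only toward a target triangle count) is a simplification standing in for the TbI-guided walk, not the dissertation's actual procedure.

```python
import random

def triangles(adj):
    """Count triangles in an undirected graph given as {node: set(neighbors)}:
    each triangle is seen once per edge, hence the division by 3."""
    return sum(len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v) // 3

def double_edge_swap(adj, target_triangles, steps=1000, seed=0):
    """Degree-preserving rewiring sketch: repeatedly pick two edges
    (a, b), (c, d) and try swapping them to (a, d), (c, b); keep the
    swap only if the triangle count moves toward the target."""
    rng = random.Random(seed)
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    best = triangles(adj)
    for _ in range(steps):
        (a, b), (c, d) = rng.sample(edges, 2)
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue                        # would create a loop or multi-edge
        adj[a].remove(b); adj[b].remove(a)  # tentatively apply the swap
        adj[c].remove(d); adj[d].remove(c)
        adj[a].add(d); adj[d].add(a)
        adj[c].add(b); adj[b].add(c)
        t = triangles(adj)
        if abs(t - target_triangles) <= abs(best - target_triangles):
            best = t                        # accept: update the edge list
            edges.remove((a, b)); edges.remove((c, d))
            edges.append((min(a, d), max(a, d)))
            edges.append((min(c, b), max(c, b)))
        else:                               # reject: revert the swap
            adj[a].remove(d); adj[d].remove(a)
            adj[c].remove(b); adj[b].remove(c)
            adj[a].add(b); adj[b].add(a)
            adj[c].add(d); adj[d].add(c)
    return adj
```

Because accepted and rejected swaps alike preserve every degree, the walk explores graphs with the same 1K-distribution as the seed while steering the triangle count, which is the role the TbI query plays in the full workflow.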
Keywords/Search Tags:Social Networks, Differential Privacy, Collaborative Filtering, Edge Weights, Community Detection, Graph Model