Font Size: a A A

Research On Frequent Subgraph Mining With Differentially Privacy In Multigraph

Posted on:2021-05-05Degree:MasterType:Thesis
Country:ChinaCandidate:Y Y ZhengFull Text:PDF
GTID:2428330629453126Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Nowadays,mining frequent patterns from a single multigraph has become a research hotspot,such as social networks.Two people may have multiple relationships such as Facebook,Twitter,LinkedIn,etc.Frequent subgraphs in the graph are crucial to discovering the mechanisms of social interaction.However,if the data contain sensitive information,directly publishing or sharing the mined results will make threat to the privacy of the users in the data.Therefore,proper privacy protection is required before publishing this information.In the existing differential privacy protection of frequent subgraph mining,all operations are performed on single edge graphs,and privacy protection issues involving frequent subgraph mining on multigraphs have not been studied.Therefore,this dissertation propose a differential privacy protection algorithm DPFS-LM(Differentially Private Frequent Subgraph Mining in a Single Large Multigraph)for frequent subgraph mining of a single multigraph,the algorithm designs a two-stage mechanism according to the characteristics of multigraph: the phase of noise-mining frequent seeds and mine frequent subgraphs of subsequent sizes with differential privacy.In order to improve utility,the algorithm applies the idea of smart truncation in the frequent seed stage of noise mining to truncate the multiedge that exceeds the length limit,thus reducing the amount of noise.In the phase of mine frequent subgraphs of subsequent sizes with differential privacy,we combine exponential and Laplace mechanisms to privately discover the subsequent size of frequent subgraphs.Finally,this dissertation provide strict theoretical analysis and conduct experiments on single edge graph and multigraph datasets to show the effectiveness and efficiency of DPFS-LM algorithm.This dissertation focuses on the privacy issues of frequent subgraph mining methods on a single multigraph data.Through a detailed analysis of the complex structure of multigraph data and the privacy issues generated by frequent subgraph mining,a novel method of the differential privacy protection algorithm for miningfrequent subgraphs in the multigraph is used to solve the privacy leakage problem caused by mining and publishing frequent subgraph patterns in multigraph data with richer information.The main research work of this dissertation is as follows:(1)The current researches on frequent subgraph mining and privacy protection methods are summarized and analyzed,respectively.The limitations of the existing differential privacy protection methods in frequent subgraph mining are pointed out,indicating that the existing work cannot be directly applied to the scenarios in multigraph data privacy protection.(2)The privacy problem of mining frequent subgraphs on multigraphs is discussed for the first time.The problem of privacy leakage caused by inference attacks on multigraph data is analyzed in detail,which illustrates that mining frequent patterns in multigraph data will reveal more privacy than single edge graph data scenarios.(3)A differential privacy protection algorithm DPFS-LM(Differentially Private Frequent Subgraph Mining in a Single Large Multigraph)is proposed for frequent subgraph mining of a single multigraph.The algorithm mainly includes two phases.The first phase is noise-mining frequent seeds,which mainly processes the multiedges according to the data characteristics of the multigraph,i.e.,the frequent subgraphs of size 1 are privacy-mined.The second phase is to mine frequent subgraphs of subsequent sizes with differential privacy.First,the frequent subgraph patterns are selected by using the exponential mechanism.Then we use the Laplace mechanism to perturb the count value of the selected frequent subgraphs.Also,we carry out a detailed privacy analysis of the proposed method at each stage,and prove with strict mathematical formulas that the DPFS-LM algorithm satisfies ?-differential privacy.(4)The pseudo-code description of corresponding algorithm is given,and the experiments are performed on two real data sets Citeseer and DBLP-Multigraph.We choose three utility evaluation criterion including the relative error(RE),F-score and running time to evaluate the performance of the algorithm.The experimental results demonstrate that the frequent subgraph mining with differential privacy in the multigraph can protect the privacy while ensuring data utility.
Keywords/Search Tags:differential privacy, frequent subgraphs, privacy protection, multigraph, graph mining
PDF Full Text Request
Related items