| Frequent subgraph mining is a specific form of frequent pattern mining,which is widely used in social network analysis,biotechnology,recommender systems and other fields.However,graph datasets may contain some sensitive information,which may cause privacy leakage during the mining process or when publishing frequent subgraph information.Differential privacy does not depend on the background knowledge held by a third party,has a strict theoretical definition and quantifiable privacy protection means,and controls the level of privacy protection by adjusting the size of the privacy budget,which can be applied in the field of data mining.Differential privacy technology can ensure that the query results will not be significantly affected by the change of any record in the data set.At present,there are not many frequent subgraph mining algorithms that can realize differential privacy protection,and it is difficult for these algorithms to take into account high availability and security at the same time.The availability is greatly reduced;or,the privacy protection is not strong enough to achieve differential privacy.Based on the above problems,this thesis proposes two new top-k frequent subgraph mining algorithms that satisfy differential privacy.Aiming at the problem of low availability of mining results of existing top-k frequent subgraph mining algorithms that satisfy differential privacy,this paper proposes a DP-TGM(Differential Private Top-k subGraph Mining)algorithm.The algorithm firstly prunes the dataset according to the frequent points and edges mined,and then expands and mines the frequent edges in turn to obtain the final top-k frequent subgraph.The algorithm uses a priority queue to store the top k most frequent subgraphs temporarily mined,continuously updates the elements in the queue in the process of extended mining,and always updates the threshold to the minimum noise support in the queue,reducing the number of graphs.expansion times.The algorithm uses the Laplacian mechanism to add noise to the true support of subgraphs in three different stages,and uses the equal division method and the special series method to reasonably allocate the privacy budget to improve data availability.Aiming at the problem that the existing differential privacy-based frequent subgraph mining algorithms do not pay attention to the protection of weights when mining weighted graphs,this paper proposes the DP-FWSM(Differential Private Top-k subGraph Mining)algorithm.Before sub-graph mining,the frequent edges are filtered out,and the weights are noised to ensure the safety of the weights.The original data set is modified according to the frequent edges.In the process of sub-graph mining,the index mechanism is used to select sub-graphs,and at the same time,the Laplacian mechanism is used to disturb the support count when publishing data,which realizes the double protection of weight and support.In this thesis,it is proved theoretically that both algorithms satisfy differential privacy protection.At the same time,comparative experiments are used to verify the availability of the algorithms on data sets of different scales. |