Research On Co-Occurrence Patterns Mining With Differential Privacy Across Multiple Streams

Posted on: 2021-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: S J Fang
GTID: 2428330629453139
Subject: Software engineering
Abstract/Summary:
Frequent pattern mining on data streams is an active research topic in data mining. However, most methods are designed for a single data stream, in which each transaction is treated as independent and no account is taken of the fact that some transactions are generated by the same individual. Many everyday applications involve multiple streams, with each user corresponding to one stream. The interesting observations are often the objects that have recently appeared in many streams, as in emerging topic discovery, online shopping analysis, web usage pattern mining, and location-based services. Studying and sharing multiple-stream data can benefit society, because these data usually reflect current trends. Governments can use the resulting statistics for livelihood regulation, economic planning, and scientific research, and companies can use them for promotion and business development. However, these data often contain personal information, and personal privacy will be revealed if the statistics are released directly without any processing.

Differential privacy is an effective technique for protecting privacy. It has a strict mathematical definition and makes no assumption about the attacker's background knowledge, and it has been widely used in data publication scenarios. Frequent pattern mining with differential privacy is an active topic in both data mining and data security. However, existing differentially private methods for frequent patterns mainly target static data, and differentially private methods for data streams mainly handle numerical or categorical data. No prior research has considered the privacy leakage caused by mining co-occurrence patterns on multiple streams. This thesis analyzes the privacy leakage caused by co-occurrence patterns in a single window and in continuous windows over multiple streams. To address these problems, co-occurrence pattern mining with differential privacy across multiple streams is proposed. The main research work of this thesis is as follows:

1) We summarize and analyze frequent pattern mining on data streams, frequent pattern mining with differential privacy on static data, and data publishing with differential privacy on data streams. We also explain why the existing work cannot be directly applied to co-occurrence pattern mining across multiple streams and point out the differences between our methods and existing methods.

2) We discuss the privacy leakage of publishing top-k closed co-occurrence patterns in a single window and in continuous windows on multiple streams, and point out that publishing closed co-occurrence patterns at continuous timestamps strengthens the attacker's reasoning ability. It is therefore possible to infer users' private information with only a little background knowledge, and privacy is leaked more easily in continuous-window publishing.

3) A differentially private top-k closed co-occurrence pattern mining algorithm (DP-TCPM) is proposed. The algorithm consists of a difference calculation phase and a differential privacy mining phase. The difference calculation phase compares the most recently released noise-added closed co-occurrence patterns with the current actual statistics to be released, and decides from the comparison result whether to enter the differential privacy mining phase. The differential privacy mining phase has four parts: adjusting the co-occurrence pattern graph through transaction truncation, perturbing the co-occurrence pattern graph with the exponential mechanism, mining the top-k closed co-occurrence patterns, and adding noise to the support of the patterns. We also analyze the time complexity of the algorithm and prove that it satisfies differential privacy.

4) Extensive experiments are performed on three real data sets (OnlineRetail, BMSWebView2 and Foodmart). Because no directly comparable method exists, a differential privacy method based on full perturbation of the edges in the CP-Graph (FPCG), which perturbs the entire co-occurrence pattern graph, is proposed as a baseline. F-score, average relative error, and running time are used to evaluate our algorithm. The experimental results show that our method has good utility and effectiveness.
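The abstract names two standard differential privacy building blocks, the exponential mechanism and noise added to pattern supports. The following is a minimal sketch of those two primitives applied to a generic top-k selection; it is not the thesis's DP-TCPM algorithm. The function names, the even split of the privacy budget, the use of Laplace noise for the supports, and the sensitivity of 1 (each user stream changes any support count by at most 1) are all assumptions made for illustration.

```python
import math
import random


def exponential_mechanism(candidates, scores, epsilon, rng, sensitivity=1.0):
    """Pick one candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    # Subtracting the maximum score avoids overflow and leaves the
    # selection probabilities unchanged.
    top = max(scores)
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_top_k(supports, k, epsilon, rng):
    """Release up to k patterns with noisy supports.

    supports: dict mapping a pattern to its true support count.
    Assumed budget split (hypothetical): half of epsilon for selection,
    half for noise, each divided evenly over the k rounds.
    """
    remaining = dict(supports)
    eps_select = epsilon / 2.0 / k
    eps_noise = epsilon / 2.0 / k
    released = {}
    for _ in range(min(k, len(remaining))):
        patterns = list(remaining)
        scores = [remaining[p] for p in patterns]
        pick = exponential_mechanism(patterns, scores, eps_select, rng)
        # Remove the pick so it cannot be selected twice, then add noise.
        released[pick] = remaining.pop(pick) + laplace_noise(1.0 / eps_noise, rng)
    return released


if __name__ == "__main__":
    # Hypothetical supports: number of streams in which each pattern occurs.
    demo = {("a", "b"): 40, ("a", "c"): 35, ("b", "c"): 12, ("c", "d"): 3}
    print(dp_top_k(demo, k=2, epsilon=1.0, rng=random.Random(7)))
```

A smaller epsilon makes both the selection and the released supports noisier; how the budget should really be divided between phases and windows is a design decision the abstract does not specify.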
Keywords/Search Tags:Differential privacy, Multiple streams, Co-Occurrence patterns, Data mining, Sliding window