Font Size: a A A

Crucial Patterns Mining With Differential Privacy Over Transactional Data Streams

Posted on:2020-01-01Degree:MasterType:Thesis
Country:ChinaCandidate:C LiuFull Text:PDF
GTID:2428330596973759Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Frequent patterns mining over transactional data streams is a based task for a wide range of online data mining applications.Nevertheless,mining crucial patterns is even more appropriate than frequent patterns over transactional data streams,because crucial patterns are the subset of frequent patterns with the minimum storage cost and information lossless extraction.In this paper,we argue that the privacy of mining crucial patterns from data streams which aggregates information from individuals is more likely to be leaked than static scenarios,because the successive releases can be used as auxiliary knowledge to increase the inferential capability of attackers.However,to the best of our knowledge,there is little work on differential privacy in continuously publishing crucial patterns from data streams.To this end,this paper proposes a Real-time Differentially Private Crucial Cattern Computation Algorithm(RDP-CPC)which designs a three-phase mechanism(i.e.,the preprocessing phase,the deep-going calculation phase,and the noise-mining phase)at every timestamp.The algorithm is able to not only improve the utility of the crucial pattern statistics as much as possible which satisfy differential privacy,but also reduce the average mining time without incurring high maintenance cost according to the feature of crucial patterns.To reduce the number of calls to Crucial Pattern Computation Algorithm(CPC),we design two-dissimilarity formulas according to the relationship between frequent patterns and crucial patterns to decide to return either low noisy statistic or accurately approximated statistic in the first two phases.When the low noisy statistic needs to be turned,the algorithm goes into the noise-mining phase.To obtain private crucial patterns,we first filter crucial pattern candidate set by perturbing the scoring functions,and then add independent Laplace noise to their supports.Finally,we provide strict theoretical analysis and conduct extensive experiments on dense datasets and sparse datasets to show the effectiveness and efficiency of our algorithm.In this paper,the privacy problem of the crucial pattern mining over transaction data streams is studied.Through the detailed analysis of the relationship between crucial patterns and frequent patterns,a Real-time Differentially Private Crucial Pattern Computation Algorithm(RDP-CPC)based on sliding window is proposed,which used to address the more serious privacy leak in periodically publishing the crucial patterns of successional windows from data streams.The main research work of this paper is as follows:(1)We identify the privacy problem of mining crucial patterns over transactional data streams for the first time,and according to the scenario,it is explained that the privacy disclosure over the data streams is more serious than static scenarios,since the supplementary information of release at contiguous time instances.(2)Facing the challenge of existing methods,three key sub-algorithms are proposed to realize the the privacy protection of crucial patterns of transactional data streams.Firstly,a Build Prefix Tree Algorithm is proposed,in which the first dissimilarity calculation is added,which can effectively reduce the call of mining algorithm and improve the average mining efficiency.Next,considering the data utility and mining efficiency depending on the compactness of the prefix tree,a Prefix Tree Adjustment Algorithm is proposed.Furthermore,in the noise mining phase,a two-step sequential noise-adding method was designed to release the private crucial patterns with their noise support according to the release requirements of the crucial patterns.And in the first step to perturb the patterns,a Filter Crucial Pattern Candidate Set Algorithm is proposed based on the correlation between frequent pattern and crucial pattern,which calculates an integrated sensitivity to improve the utility.Finally,the privacy analysis and spatiotemporal complexity of each sub-algorithm are analyzed.(3)In order to allocate privacy budget to w successive timestamps in a sliding window reasonably,so that the total privacy budget of each sliding window does not exceed the total?,a tailored three-phase mechanism(i.e.,RDP-CPC Algorithm)to achieve private release at every timestamp is designed.Specifically,we design two-dissimilarity formulas by considering the relationship between frequent patterns and crucial patterns,which can reduce the redundant calls for CPC algorithm.Finally,the Real-time Differentially Private Crucial Patterns Computation Algorithm(RDP-CPC)is proved to satisfy?_i-differential privacy with rigorous mathematical formula and proves that the budget absorption strategy satisfies w-transaction differential privacy.(4)We conduct extensive experiments on three real sparse and dense datasets(Among the datasets,Chess and Accidents are dense datasets and Retail is a sparse dataset),as the absence of direct competitors in the literature,we compare our RDP-CPC algorithm with a Straightforward Approach(SA)by extending the existing state-of-art techniques in different scenarios,and the three evaluation metrics of F-score,relative error(RE)and runtime are selected to evaluate the utility of the pattern,the utility of the support,and the average running time,respectively,which are used to evaluate the performance of RDP-CPC algorithm and SA.The experimental results show that with the increase of privacy budget,the utility is higher.As the W gradually decreases,the utility is better in sparse dataset;as the number of timestamps contained in a window r increases,the average runtime is smaller.
Keywords/Search Tags:differential privacy, crucial patterns, transactional data streams, privacy leakage, data mining
PDF Full Text Request
Related items