Font Size: a A A

Differentially Private Frequent Episodes Mining Over Event Streams

Posted on:2021-01-25Degree:MasterType:Thesis
Country:ChinaCandidate:J W QinFull Text:PDF
GTID:2428330629453115Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the development of big data technology,real-time data stream monitoring has attracted a lot of attention in many fields,such as infrastructure network monitoring,disease monitoring,etc.Cluster,classification,pattern mining and other issues over stream have arisen more and more interest of the researchers.However,collecting and sharing data in these applications often lead to the disclosure of personal sensitive information.How to effectively release data while protecting personal sensitive information from being leaked over stream is a problem that needs to be solved.Differential privacy is a rigorous and provable privacy model.Earlier research on differential privacy has paid attention to “one-shot” release on the static database.Since the stream has the characteristic s of celerity,change,infinity and continuity,it brings some challenges of the real-time analysis of the data.Frequent episode mining is a framework to mine useful information from event stream.The aim of frequent episode mining is to find all frequent episodes that supports are not less than the user-specified threshold.Based on differential privacy,most existing research under continuous monitoring has processed simple counting and analysis tasks.Researches for complex tasks not only have few solutions but also need to further improve the utility of the release.In this paper,a real-time differentially private frequent episode mining approach Re-DPFE is proposed based on sliding window with w-event guarantee to resolve the above problem.The main work of this paper is as follows:(1)This paper summarizes the methods of frequent episode mining,privately frequent sequence mining in static datasets and privately publishing over data streams.It points out that the existing techniques would not be applied to privately mining frequent episodes in sliding window,which have low utilization of privacy budget and efficiency.It also discusses the new privacy leakage problem of mining frequent episodes over event streams and provides a new privacy protection scheme.(2)For the sake of allocating the privacy budget reasonably in each sliding window,we propose an adaptive w-event mechanism composed of newly adaptive sampling method and privacy budget allocation method at each timestamp.It can adjust sampling intervals on the basis of history data and adaptively allocate privacy budget to each sampling points in the sliding window,so that the total allocated privacy budget is not exceed ?.(3)In order to realize privately frequent episode mining in each sliding window over event streams,a sample-based perturbation mechanism is proposed,which utilizes the idea of sampling and selects more accurately from the candidate episode sets based on the sample datasets.And then laplace noise is added to the support of each frequent episode.To solve the problem of low efficiency,the non-sampled timestamps are released by the last release.Moreover,we present an incremental mechanism approach based on dissimilarity calculation formula.It allocates the entire privacy budget to the new discovered minimal episode occurrence database instead of the sliding window,while the coincident window is approximately released according to the last release.Finally,we analyze the privacy of sub-algorithms and the framework Re-DPFE.Meanwhile,the time complexity is calculated.(4)Three real-world datasets(BMS,Retail and Kosarak)are used in our experiments.Since the absence of direct competitors in the studies,we design a straightforward method,denoted as BS,to let frequent episodes mining over event streams satisfy ?-differential privacy by means of extending state-of-art techniques in different scenarios.The F-score,Relative Error and runtime are chosen to evaluate the utility and efficiency.Re-DPFE and BS are compared by changing the sliding window size w,the privacy budget parameter ? and the maximum episode occurrence window size ?,respectively.Extensive experiments demonstrate that our proposed algorithm is effective and efficient.
Keywords/Search Tags:Differential privacy, frequent episode, event streams, privacy preservation, data mining
PDF Full Text Request
Related items