Font Size: a A A

Research On Differentially Private Frequent Pattern Mining Techniques

Posted on:2017-04-27Degree:DoctorType:Dissertation
Country:ChinaCandidate:S Z XuFull Text:PDF
GTID:1318330518496004Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
As the development of computer and internet technologies, our capabilities for both generating and collecting data are substantially enhanced. Data mining can intelligently help us transform the vast amounts of data into useful knowledge and information. It can automatically and conveniently extract patterns which are implicitly stored or captured in large databases and represent knowledge. Now, data mining has practical importance in a number of areas, ranging from financial securities to product manufacturing. In data mining research, frequent pattern mining is a focused theme. Frequent pattern mining plays an essential role in mining associations, correlations, and many other interesting relationships among data. Moreover, it can also help in data classification, clustering, and other data mining tasks.However, lots of data mining tasks involve sensitive personal information (e.g., transcation records, behavior records and mobility traces). If we directly publish or share discovered frequent patterns, it might lead to serious privacy leakage. Differential privacy proposed in recent years provides a feasible way to address such problem. Unlike the anonymization-based privacy models, differential privacy offers a more robust privacy guarantee without making any assumptions about the adversary's prior knowledge.Recently, designing differentially private frequent pattern mining algorithms has become a hot research topic and draws more and more attention from both academic and industrial communities. How to improve the utility of the mining results while satisfying differential privacy is the major challenge for differentially private frequent pattern mining. Now, the research on differentially private frequent pattern mining is still at the preliminary stage. There is room for improvement in many aspects, such as the utility of the results, the efficiency of the algorithms, and so on.Therefore, in this paper, we study the differentially private frequent pattern mining problem for three main types of patterns, i.e., itemset, sequence and subgraph. Through in-depth study, we made the following major contributions:(1) For the problem of differentially private frequent itemset mining,we propose a novel differentially private frequent itemset mining algorithm PFP-growth. The PFP-growth algorithm consists of a preprocessing phase and a mining phase. In the preprocessing phase, to limit the length of transactions while preserving more information, we propose a smart splitting method. In the mining phase, to offset the information loss caused by transaction splitting, we devise a run-time estimation method. In addition, to make depth-first search algorithm satisfy differential privacy,we put forward a dynamic reduction method. Through formal privacy analysis, we show that our PFP-growth algorithm is ?-differentially private.Extensive experiments on real datasets illustrate that our PFP-growth algorithm is time-efficient and can achieve both good utility and good privacy.(2) For the problem of differentially private frequent sequence mining,we propose a novel differentially private frequent sequence mining algorithm PFS2. It can support the general gap-constrained frequent sequence mining in the context of differential privacy. We found, in differentially private frequent sequence mining, the amount of added noise is proportionate to the number of candidate sequences. In our PFS2 algorithm, we utilize sample databases to prune candidate sequences, such that the amount of added noise can be significantly reduced. In the sample databases, we use the noisy local support of candidate sequences to estimate which candidate sequences are potentially frequent. To improve the accuracy of such private estimations, a gap-aware sequence shrinking method is proposed to enforce the length constraint on the sample databases. Moreover, to calibrate the amount of noise required by differential privacy, a gap-aware sensitivity computation method is proposed to obtain the sensitivity of the local support computations with different gap constraints. Furthermore, to decrease the probability of misestimating frequent sequences as infrequent, a threshold relaxation method is proposed to relax the user-specified threshold for the sample databases. Through formal privacy analysis, we show that our PFS2 algorithm is ?-differentially private. Extensive experiments on real datasets illustrate that our PFS2 algorithm can privately find frequent sequences under different gap constraints with high accuracy.(3) For the problem of differentially private frequent subgraph mining,we propose a novel differentially private frequent subgraph mining algorithm DFG. In this algorithm, we first privately identify frequent subgraphs from input graphs in order of increasing graph size, and then compute the noisy support of each identified frequent subgraph. In particular, to privately identify frequent subgraphs, we present a frequent subgraph identification approach which can improve the utility of frequent subgraph identifications through candidates pruning. Moreover, to compute the noisy support of each identified frequent subgraph, we devise a lattice-based noisy support derivation approach, where a series of methods has been proposed to improve the accuracy of the noisy supports.Through formal privacy analysis, we show that our DFG algorithm is ?-differentially private. Extensive experimental results on real datasets show that the DFG algorithm can find frequent subgraphs with high data utility while satisfying ?-differential privacy.
Keywords/Search Tags:frequent pattern mining, differential privacy, data mining, data privacy
PDF Full Text Request
Related items