Font Size: a A A

Research Of Frequent Itemsets Mining Algorithm With Differential Privacy For Large-scale Data

Posted on:2019-10-20Degree:MasterType:Thesis
Country:ChinaCandidate:X Y XiongFull Text:PDF
GTID:2428330566961590Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
In recent years,with the explosive growth of data and the rapid development of information technology,various industries have accumulated large amounts of data through various channels.To discover useful knowledge from large amounts of data for upper-layer applications(e.g.business decisions,potential customer analysis,etc.),data mining has been developed rapidly.However,the large amount of data also contains privacy sensitive information,which may be leaked if not well managed.People and organizations are increasingly wary of data protection.In the long run,without the support of data privacy,the mining tasks cannot proceed smoothly.Thus mining knowledge under confident privacy guarantees is the trend.In this thesis,we focus on the privacy-preserving frequent itemsets mining as a case study.We first carefully analyze the current status of privacy protection research for frequent itemsets mining.Then,we review some representative privacy preserving frequent itemsets mining algorithms in the traditional anonymous model and the newer,modern differential privacy model.We also discuss their advantages and disadvantages as well as their performances.It turn out that for large-scale data current approaches either cost large computation or cannot provide a mathematically strict privacy guarantee.To address this,we propose an improved privacy-preserving frequent itemsets mining algorithm for large-scale dataset in the modern differential privacy model.Our algorithm is based on the following ideas.First,we build our algorithm based on FP-Tree for frequent itemsets mining.In order to solve the problem of building FP-Tree with large-scale data,we use the central limit theorem to make reasonable sampling;later,we use the sampled data to mine closed frequent itemsets,which are used to find the potential items in the large-scale dataset.Second,we employ a length constraint strategy to embed randomness and privacy in the potential frequent itemsets,which also solve the problem of high global sensitivity for differential privacy guarantee.Specifically,it adopts the string matching idea to discover the most similar string in the source dataset and implements transaction truncation for achieving the lowest information loss.Thus,it improves data utility.At last,we use the Laplace mechanism to add noise for the final minded frequent itemsets to ensure more privacy guarantee.We implemented our algorithm using C++ and conduced performance evaluation.In order to verify the effectiveness of the algorithm,we employed multiple public datasets and compared our algorithm with the state-of-the-art algorithms.Experimental results show that our algorithm obtains relatively better performance in terms of F-score and relative error.
Keywords/Search Tags:Frequent Itemsets Mining, Differential Privacy, Privacy preserving, Sampling, Transaction Truncation
PDF Full Text Request
Related items