Research Of Frequent Itemsets Mining Algorithm With Differential Privacy For Large-scale Data

Posted on:2019-10-20

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Xiong

Full Text:PDF

GTID:2428330566961590

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

In recent years,with the explosive growth of data and the rapid development of information technology,various industries have accumulated large amounts of data through various channels.To discover useful knowledge from large amounts of data for upper-layer applications(e.g.business decisions,potential customer analysis,etc.),data mining has been developed rapidly.However,the large amount of data also contains privacy sensitive information,which may be leaked if not well managed.People and organizations are increasingly wary of data protection.In the long run,without the support of data privacy,the mining tasks cannot proceed smoothly.Thus mining knowledge under confident privacy guarantees is the trend.In this thesis,we focus on the privacy-preserving frequent itemsets mining as a case study.We first carefully analyze the current status of privacy protection research for frequent itemsets mining.Then,we review some representative privacy preserving frequent itemsets mining algorithms in the traditional anonymous model and the newer,modern differential privacy model.We also discuss their advantages and disadvantages as well as their performances.It turn out that for large-scale data current approaches either cost large computation or cannot provide a mathematically strict privacy guarantee.To address this,we propose an improved privacy-preserving frequent itemsets mining algorithm for large-scale dataset in the modern differential privacy model.Our algorithm is based on the following ideas.First,we build our algorithm based on FP-Tree for frequent itemsets mining.In order to solve the problem of building FP-Tree with large-scale data,we use the central limit theorem to make reasonable sampling;later,we use the sampled data to mine closed frequent itemsets,which are used to find the potential items in the large-scale dataset.Second,we employ a length constraint strategy to embed randomness and privacy in the potential frequent itemsets,which also solve the problem of high global sensitivity for differential privacy guarantee.Specifically,it adopts the string matching idea to discover the most similar string in the source dataset and implements transaction truncation for achieving the lowest information loss.Thus,it improves data utility.At last,we use the Laplace mechanism to add noise for the final minded frequent itemsets to ensure more privacy guarantee.We implemented our algorithm using C++ and conduced performance evaluation.In order to verify the effectiveness of the algorithm,we employed multiple public datasets and compared our algorithm with the state-of-the-art algorithms.Experimental results show that our algorithm obtains relatively better performance in terms of F-score and relative error.

Keywords/Search Tags:

Frequent Itemsets Mining, Differential Privacy, Privacy preserving, Sampling, Transaction Truncation

PDF Full Text Request

Related items

1	Research On Frequent Itemsets Mining Algorithm Under Differential Privacy
2	Research On Key Technologies Of Privacy Preserving Data Mining Based On Local Differential Privacy
3	Research On Frequent Itemset Mining Method With Differential Privacy Based On Transaction Truncation
4	Research Of Frequent Itemsets Mining That Supports Differential Privacy
5	Research On Algorithm For Privacy Preserving Based On Association Rule Mining
6	Frequent Itemsets Mining For Uncertain Data Based On Differential Privacy
7	Research On Frequent Itemsets Mining Algorithms Based On Differential Privacy Protection
8	Research On Sharing And Publishing Technology Of Frequent Itemsets Based On Differential Privacy
9	Study On The Frequent Itemset Mining Based On Differential Privacy
10	Privacy Protection Of Data Mining