Font Size: a A A

Research On Frequent Itemset Mining Method With Differential Privacy Based On Transaction Truncation

Posted on:2019-12-31Degree:MasterType:Thesis
Country:ChinaCandidate:Y HuangFull Text:PDF
GTID:2428330590965732Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of the Internet,cloud computing and big data,it is a research hotspot how to obtain potentially valuable knowledge by data mining currently.Among which,frequent itemset mining is the foundation of information services,such as discovering the correlation among transactions,providing recommendations and forecasting services,there are wide application requirements in e-commerce,medical health and location services and other fields.However,when the mining objects contain personal sensitive information,there are risks which leak personal sensitive information by mining and publishing frequent itemsets and their supports directly,it will restrict the application of frequent itemset mining.Therefore,it is necessary to protect the process of mining and publishing frequent itemsets.Differential privacy adopts the data distortion technology,which enables privacy protection by adding noise to the query request or release result.Although the model improves the level of privacy protection,it has a certain effect on the availability and accuracy of the output.Frequent itemset mining based on differential privacy has achieved a series of results,but there are still two major problems,how to ensure that the method satisfies ?-differential privacy,and how to improve the availability and accuracy of frequent itemsets and their noisy supports.Aiming at these problems,a frequent itemset mining algorithm of differential privacy based on transaction truncation and an adjustment algorithm of noisy supports based on linear regression are proposed in this paper.Since the transaction length affects the size of noise added in the transaction dataset,the first algorithm makes use of exponential mechanism's privacy and search capability.In order to minimize the sum of noisy errors and truncation errors,the selection problem of the optimal transaction length is transformed into the problem of designing quality function to get the optimal solution,then the original transaction dataset is truncated by the optimal transaction length.Frequent itemsets are mined from truncated transaction datasets by the algorithm FP-Growth,then the noise generated by Laplace mechanism is added to frequent itemsets and their supports,the algorithm improves privacy protection of mining results.The second algorithm adjusts the noisy supports sequence of top-k frequent itemsets,so that they satisfy the descending and integer properties like the real supports sequence,it further improves the accuracy of the frequent itemsets.Experimental results show that these algorithms improve the availability and accuracy of top-k frequent itemsets and their noisy supports.Meanwhile,the theory analyzes that the method satisfies ?-differential privacy and requirements of (?,?)-useful.
Keywords/Search Tags:frequent itemset mining, differential privacy, exponential mechanism, Laplace mechanism, linear regression
PDF Full Text Request
Related items