Research On Frequent Itemset Mining Based On Differentially Private Model

Posted on: 2020-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Wang
GTID: 2428330602957969
Subject: Engineering
Abstract/Summary:
With the rapid development of Internet technology, data mining is widely used in social networks, medical institutions, education systems, and other domains. Frequent pattern mining is an important research direction in data mining and lays a foundation for association rule analysis, classification, and clustering. Frequent itemset mining, one of its specific forms, is useful for recommendation systems, personalized websites, and the analysis of customer shopping habits. However, frequent itemsets may contain sensitive data, so publishing them directly puts personal privacy at risk. Differential privacy is a robust privacy protection model that is used in many areas because its guarantees can be proven mathematically.

This thesis studies frequent itemset mining under differential privacy, weighing the privacy of the output against its utility. It proposes a differentially private frequent itemset mining algorithm called HPU (High Privacy-Utility algorithm), which aims for both strong privacy and high utility of the result. HPU proceeds in three phases: preprocessing of the original dataset, mining of maximal frequent itemsets, and perturbation and consistency processing of the result. In the preprocessing phase, the dataset is losslessly compressed and each data record is then truncated to a bounded length, which reduces the sensitivity and therefore the amount of noise required. In the mining phase, a tree structure is used to find the maximal frequent itemsets, which avoids spending privacy budget unnecessarily. In the perturbation and consistency phase, Laplace noise is added to the result, and a consistency algorithm is proposed to keep the noisy result as close as possible to the true result, which improves accuracy.

A theoretical analysis proves that HPU satisfies ε-differential privacy, and experiments show that HPU performs better than the TF algorithm and the PrivBasis algorithm.
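
The link between record truncation, sensitivity, and Laplace noise described in the abstract can be illustrated with a small Python sketch. This is not the HPU algorithm itself, whose details the abstract does not give. It is a simplified illustration that releases noisy support counts for single items only. The function names, the truncation rule (keep the first `max_len` items in sorted order), and the assumption that the item universe `items` is public are all hypothetical choices made for this example.

```python
import random
from collections import Counter


def truncate(transactions, max_len):
    """Keep at most max_len distinct items per transaction.

    Assumed rule for this sketch: keep the first max_len items in sorted order.
    """
    return [sorted(set(t))[:max_len] for t in transactions]


def item_counts(transactions):
    """Count how many transactions contain each item."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))
    return counts


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_item_counts(transactions, items, max_len, epsilon, rng=None):
    """Release noisy single-item support counts over a public item universe.

    After truncation, adding or removing one transaction changes at most
    max_len counts by 1 each, so the L1 sensitivity is max_len and the
    Laplace scale is max_len / epsilon.
    """
    if epsilon <= 0 or max_len < 1:
        raise ValueError("epsilon must be positive and max_len at least 1")
    rng = rng or random.Random()
    counts = item_counts(truncate(transactions, max_len))
    scale = max_len / epsilon
    # Noise is added for every item in the public universe, including items with count 0.
    return {item: counts[item] + laplace_noise(scale, rng) for item in items}


if __name__ == "__main__":
    data = [["bread", "milk"], ["bread", "eggs", "milk", "tea"], ["milk"]]
    universe = ["bread", "eggs", "milk", "tea"]
    print(noisy_item_counts(data, universe, max_len=2, epsilon=1.0))
```

A smaller `max_len` lowers the noise scale but discards more items from long records, which is the privacy-utility trade-off the truncation step has to balance.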
Keywords/Search Tags: Data Mining, Differential Privacy, Frequent Itemset Mining, Privacy Protection