Frequent itemset mining is a basic data mining task,and it also plays a key role in association rule algorithms.However,it is very likely that the user's personal information will be leaked during the mining process.In recent years,the application of differential privacy protection model to mining frequent itemsets is a relatively common and reliable protection method.Most of the papers are proposed for central differential privacy and few papers apply local differential privacy to mining frequent Itemsets.The advantage of local differential privacy is that the user first disturbs the original data on the client,and then sends the disturbed data to the aggregator,which can prevent the leakage of user data by aggregator,thereby improving the user data protection.At present,there is no complete framework to apply local differential privacy to frequent itemset mining tasks,and there are problems of high communication cost and low mining accuracy.In order to solve the above problems,this paper proposes corresponding solutions:(1)We propose a complete framework for applying local differential privacy to frequent itemset mining,and it is applicable to the case where user data types are more complex.The framework first maps the user's original data to a binary string using bitmap coding.Aiming at the multi-attribute of users,the Threshold Random Response(TRR)algorithm is proposed to achieve the best perturbation method.The user sends the disturbed data to a third-party aggregator.The aggregator performs unbiased estimation on the collected data,and uses FP-Tree for mining frequent itemsets.Finally,we verified the feasibility of the framework through experiments and the TRR algorithm is superior to the existing perturbation algorithms.The TRR algorithm results are more accurate under the same degree of disturbance.(2)We propose a Hadamard Response(HR)algorithm that satisfies local differential privacy and applies to mining frequent itemsets.HR algorithm uses the Hadamard matrix to map and store user data,appropriately increasing the range of possible values output by the user and sampling the mapped strings uniformly.If the user's threshold is 6),the communication cost is reduced from (6))to (7)2)6))and the accuracy of the result is also improved.Finally,we compare the HR algorithm with the previous frequent itemset mining perturbation algorithms through experiments.It was found that the HR algorithm has the lowest communication cost and high availability of results. |