Research On Frequency Estimation And Frequent Itemset Mining For Local Differential Privacy Protection

Posted on:2022-04-07

Degree:Master

Type:Thesis

Country:China

Candidate:G Y Wang

Full Text:PDF

GTID:2518306557968379

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

With the rapid development of the Internet,data mining,machine learning,federated learning and other technologies have been widely used.However,in the process of collecting,analyzing and releasing massive data,the sensitive information in the data may be accidentally leaked,and data security and data privacy are facing severe challenges.Differential privacy is a kind of privacy protection technology based on data distortion,and it has strict mathematical theory proof.Not needing to consider the background knowledge of attackers,differential privacy can quantify the level of privacy protection and provide users with strong privacy guarantees.Local differential privacy is a new differential privacy protection model.It transfers the process of data privacy to the user side,which not only provides privacy protection for users but also ensures the information security of the server.Frequent itemset mining is a key technology in data mining,aiming at finding frequent itemsets in transaction data sets.Data mining algorithms with local differential privacy need to take account of security,result utility,communication cost and computing efficiency.This thesis focuses on the frequent itemset mining algorithms with local differential privacy protection,and the research content mainly includes the following three aspects:This thesis analyzes the existing frequency estimation algorithms with local differential privacy and finds that the accuracy of the frequency estimation algorithms with local differential privacy is closely related to data dimension,and the algorithms have the disadvantage of high communication cost.However,due to the limited network bandwidth and user device storage space,frequency estimation algorithms with local differential privacy should maintain a reasonable communication cost.This thesis designs two adaptive frequency estimation algorithms based on the balance of calculation accuracy and communication cost.The adaptive frequency estimation algorithms can reduce communication cost by applying random projection matrix and hadamard matrix,perform random response on low-dimensional data,ensure result utility and be applied to a wider range of scenarios.In order to solve the problem of the low efficiency of frequency estimation algorithms with local differential privacy,this thesis designs two algorithms.Random response is the mainstream perturbation mechanism for local differential privacy.The traditional random response method needs to traverse each coordinate of the vector and generate probability parameters according to the current coordinate value to perform probability flipping.This method has low efficiency.The first algorithm generates multiple random numbers at one time by setting tag variables and performs overall operations on user code vectors,which significantly accelerates the execution speed of the program.The data set involved in frequency estimation with local differential privacy is usually very large,and it is very time-consuming to traverse the entire data set.The second algorithm uses the central limit theorem and standard normal distribution to calculate a reasonable sampling number and samples the original data set.While improving the efficiency of program execution,this algorithm can still ensure a reasonable accuracy of frequency estimation.Frequent itemset mining with local differential privacy needs to estimate the frequency of each itemset in the candidate set,and the itemset whose estimated frequency is higher than the specified threshold is the frequent itemset.The frequent itemset mining process needs to involve multiple sub-query tasks.Therefore,users and the server need to communicate multiple times.In order to make the entire algorithm meet differential privacy,user grouping or privacy budget allocation can be considered.This thesis theoretically proves that when the probability distribution of the user group is consistent with the overall probability distribution,the mean squared error of the frequency estimation algorithm GRR and OUE when users are grouped evenly is smaller than the mean squared error when the privacy budget is allocated evenly.Experimental results show that when the number of users is insufficient or the probability distribution of user groups is not consistent with the overall probability distribution,the privacy budget even allocation is also an effective frequent itemset mining algorithm.This thesis conducts experiments on the synthetic data sets and the real world data sets,analyzes the result utility and calculation efficiency of the algorithms by using such indicators as the mean squared error,normalized cumulative rank and operation time,and verifies the superiority of the designed algorithms.

Keywords/Search Tags:

differential privacy, privacy protection, random response, data mining, frequent itemset mining

PDF Full Text Request

Related items

1	Research On Frequent Itemset Mining Based On Differentially Private Model
2	Research On Frequent Itemset Mining Of Complex Data Based On Local Differential Privacy
3	Study On The Frequent Itemset Mining Based On Differential Privacy
4	Research On Frequent Pattern Mining Algorithm Under Local Differential Privacy
5	Research On Frequent Itemset Mining Method With Differential Privacy Based On Transaction Truncation
6	Research On Privacy Protection Algorithm For Association Rules Mining
7	Research On Privacy Preserving Association Rules Mining Algorithm
8	Frequent Sequence Mining For Local Difference Privacy
9	Research On Frequent Itemsets Mining Algorithms Based On Differential Privacy Protection
10	Statistical Data Analyses Based On Local Differential Privacy