Font Size: a A A

Research On Frequent Itemset Mining Based On Local Differential Privacy

Posted on:2024-01-11Degree:MasterType:Thesis
Country:ChinaCandidate:S HuangFull Text:PDF
GTID:2568307133950549Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
In recent years,with the rapid development of new-generation information technologies such as the Internet of Things and sensor networks,the scale of data has grown explosively,and mankind has entered the era of big data.In order to analyze and mine the information contained in data,more and more researchers and developers have focused their attention on data analysis and mining.As a fundamental issue in the field of data analysis and mining,frequent itemset mining,has become a research hotspot in many fields.However,data often contains sensitive private information of individuals,so it is necessary to protect the private information during frequent itemset mining.In addition,many countries around the world have successively promulgated a number of laws and regulations that explicitly regulate the use of data information,making it urgent to privately mine frequent itemsets.Local differential privacy recently became the de facto standard to preserve the privacy in analyzing sensitive data.It can guarantee strong privacy and provide quantifiable privacy protection level as traditional differential privacy,but does not rely on the assumption of a trusted third party,which has received widespread attention.The existing work on locally private frequent itemset mining has two main problems.Firstly,most of these work focuses on the mining of frequent single items,which cannot be directly applied to frequent itemset mining;Secondly,the several solutions for frequent itemsets use a strategy that combines sampling with existing unary frequency estimation protocols.It ignores the association between user multidimensional data and only retains a small amount of original information,making the mining results less available.Therefore,how to design the fundmental mechanism for multi-dimensional classification data and further improve the availability of mining results is an important challenge for privately frequent itemsets mining.This thesis studies frequent itemset mining based on the local differential privacy model,proposes a joint distribution estimation mechanism suitable for multidimensional classification data,realizes locally private frequent itemset mining,and further extends it to propose a frequent itemset mining algorithm with order constraints.The main contributions of this thesis are summarized as follows.(1)We propose a joint distribution estimation mechanism applicable to multidimensional classification data.The existing multidimensional data joint distribution estimation algorithms suffer from low data utilization and large estimation errors.To solve this problem,we propose a novel joint distribution estimation mechanism called Group-based Randomized Response Mechanism.This mechanism perturbs the data based on the data similarity,making noise data containing more real data related information output with a higher probability.In general,the user device is resource-constrained,and can only afford lightweight operations,and thus it optimizes the perturbation method on the user side to improve its practicality.Finally,through formal privacy analysis,we show that our proposed mechanism guarantees differential privacy,and extensive experiments demonstrate that the mechanism has high utility and efficiency.(2)We propose a privacy-preserving frequent itemset mining algorithm based on grouping mechanism,which consists of incremental candidate filtering and itemset estimation modules.Since the accuracy of the sampling based frequency estimation protocol increases with the increase of the number of users and the decrease of the candidate domain,the former utilizes this fact to solve the large candidate universe problem.In the second phase,we design a frequency estimation protocol based on two-round interaction to improve the accuracy of mining results.Through formal privacy analysis,we show that our proposed approach guarantees ε-local differential privacy.Finally,extensive experiments demonstrate that the mining results of the proposed algorithm could achieve high accuracy.(3)We propose a local differential privacy algorithm for mining frequent itemsets with order constraints.In particular,The algorithm integrated filling and sampling technology,adaptive frequency estimation algorithm and frequent itemset prediction technology to construct new candidate item domain.Utilizing this information,it employed an exponential mechanism-based strategy to perturb the user data,and discovered the final frequent sequential patterns.To further improve the accuracy of mining results,a post-processing strategy is designed.Extensive experiments demonstrate that the algorithm maintains high utility while providing privacy guarantees.
Keywords/Search Tags:local differential privacy, frequent itemset mining, privacy protection, data mining
PDF Full Text Request
Related items