
Frequent Itemsets Mining For Uncertain Data Based On Differential Privacy

Posted on: 2021-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yu
Full Text: PDF
GTID: 2518306047981669
Subject: Master of Engineering
Abstract/Summary:
In the era of big data, as data collection and storage capabilities improve, mining useful knowledge from data becomes increasingly important. Uncertain data is a data type that can represent content such as "the probability that someone has a certain disease" or "the probability that someone is in a certain location", and it describes such information more reasonably than certain (deterministic) data. Large amounts of data in finance, social networks, medicine, and other fields are now suited to storage and mining as uncertain data. However, directly publishing the results of data mining risks leaking private information, so it is important to study mining algorithms that satisfy privacy protection. This thesis focuses on frequent itemset mining algorithms for uncertain data based on differential privacy. The main research contents are as follows:

(1) The research problem of frequent itemset mining on uncertain data under differential privacy is described and defined. The UFDP algorithm, which is based on differential privacy, is proposed to mine the top-k frequent itemsets in uncertain data.

(2) For an uncertain data set, the tree structure grows too large because items with the same name can carry different existential probabilities. The UFDP-tree structure is proposed, which reduces both the size of the tree and the number of recursions during itemset mining, thereby improving the efficiency of the algorithm.

(3) To address the insufficient utility of uncertain data mining results under privacy protection, a privacy budget allocation and recovery strategy is proposed.

(4) To make the mining results satisfy privacy protection, the UFDP algorithm uses the exponential mechanism and the Laplace mechanism so that the mined top-k frequent itemsets satisfy differential privacy.

Finally, performance comparison experiments for the UFDP algorithm are designed and completed. The utility of the algorithm is evaluated by FNR (false negative rate) and RE (relative error), and its efficiency is evaluated by the number of tree nodes and the running time. The experimental results show that the UFDP algorithm achieves good utility and running efficiency.
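The combination of the exponential mechanism and the Laplace mechanism mentioned in item (4) can be illustrated with a minimal Python sketch. It is not the UFDP algorithm itself: it uses no UFDP-tree, it enumerates a given candidate list directly, and it splits the privacy budget evenly instead of applying the thesis's allocation and recovery strategy. The names `expected_support`, `private_top_k`, and `false_negative_rate` are hypothetical, as is the choice of expected support (the sum, over transactions, of the product of item probabilities) as the frequency measure. The FNR shown is one common definition and may differ from the one used in the thesis.

```python
import math
import random


def expected_support(transactions, itemset):
    """Sum over transactions of the product of the itemset's item probabilities."""
    total = 0.0
    for t in transactions:
        p = 1.0
        for item in itemset:
            p *= t.get(item, 0.0)  # an absent item has existence probability 0
        total += p
    return total


def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Sample a key with probability proportional to exp(eps * score / (2 * sensitivity))."""
    keys = list(scores)
    top = max(scores.values())  # shift by the max score for numerical stability
    weights = [math.exp(epsilon * (scores[key] - top) / (2.0 * sensitivity))
               for key in keys]
    return rng.choices(keys, weights=weights, k=1)[0]


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two Exp(1) samples."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_top_k(transactions, candidates, k, epsilon, rng=None):
    """Select k itemsets privately, then release a noisy expected support for each."""
    rng = rng or random.Random()
    # Adding or removing one transaction changes any expected support by at most 1.
    sensitivity = 1.0
    # Assumption: half the budget for selection, half for noise, split evenly over k.
    eps_select = epsilon / 2.0 / k
    eps_noise = epsilon / 2.0 / k
    scores = {c: expected_support(transactions, c) for c in candidates}
    result = {}
    for _ in range(min(k, len(scores))):
        chosen = exponential_mechanism(scores, eps_select, sensitivity, rng)
        true_support = scores.pop(chosen)  # sample without replacement
        result[chosen] = true_support + laplace_noise(sensitivity / eps_noise, rng)
    return result


def false_negative_rate(true_top_k, private_top_k_result):
    """Fraction of the true top-k itemsets missing from the private result."""
    missed = set(true_top_k) - set(private_top_k_result)
    return len(missed) / len(true_top_k)


if __name__ == "__main__":
    # Each transaction maps an item to its existence probability.
    data = [{"a": 0.9, "b": 0.5}, {"a": 0.8}, {"b": 0.4, "c": 1.0}]
    cands = [("a",), ("b",), ("c",), ("a", "b")]
    print(private_top_k(data, cands, k=2, epsilon=1.0, rng=random.Random(7)))
```

With a small `epsilon` the selection is close to uniform and the released supports are very noisy; with a large `epsilon` the output approaches the true top-k. Managing that trade-off is what a budget allocation strategy is for.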
Keywords/Search Tags:uncertain data, frequent itemsets mining, differential privacy, privacy budget allocation and recovery