Research On Frequent Itemsets Mining Algorithm Under Differential Privacy

Posted on:2021-04-10

Degree:Master

Type:Thesis

Country:China

Candidate:C Jiang

Full Text:PDF

GTID:2428330614965946

Subject:Information security

Abstract/Summary:

In recent years, with the rapid development of data mining, machine learning, and deep learning, various industries have accumulated large amounts of user data. To provide more personalized services, companies analyze and process this data to understand users' habits and preferences. However, the data that users generate in daily life contains a large amount of sensitive personal information. Releasing or analyzing it directly allows criminals to collect users' private information and use it for cyber fraud, telephone fraud, Trojan horse attacks, and similar offenses.

Differential privacy is a privacy protection mechanism with a rigorous mathematical foundation. It assumes that the attacker has maximal background knowledge and adds carefully designed noise at each step of an algorithm so that the final output protects users' privacy; the level of protection is tuned through the size of the privacy budget. Differential privacy has already been applied to many areas of data mining, such as principal component analysis, clustering, and frequent pattern mining. Because the magnitude of the added noise is closely tied to the dimension of the dataset, the noise directly affects the utility of frequent itemset mining results. How to balance privacy and utility is therefore a central challenge for frequent itemset mining (FIM) algorithms. This thesis proposes two differentially private FIM algorithms, one aimed at improving the utility of the mining results and the other at improving the efficiency of the algorithm.

The thesis first surveys and analyzes existing frequent itemset mining algorithms and finds that reducing the longest transaction length in the dataset is the key to improving the utility of the mining results. The challenge is to process the dataset in a way that balances the error introduced by noise against the information lost by shrinking the dataset. To address it, the thesis proposes a new differentially private FIM algorithm, Trun Super. The algorithm truncates the transactions to reduce the dimension, sorts the items in decreasing order, and then eliminates the items with lower support. In this way it reduces the information loss of the frequent itemsets.

Because differentially private frequent itemset mining algorithms need to traverse the dataset multiple times, reducing the time required for each traversal is the key to improving their efficiency. Mining a sample of the dataset inevitably affects the accuracy of the results, so the challenge is to use sampling to improve efficiency while preserving the utility of the mining results as far as possible. To this end, the thesis proposes a frequent itemset mining algorithm, Sample Trun, that uses the central limit theorem to calculate the sample size. Its contribution has two parts: it uses the central limit theorem to calculate a reasonable sample size, and it analyzes each step of the algorithm to identify the steps that are best suited to running on the sampled dataset.

Finally, experiments on several real datasets verify the superiority of the two algorithms proposed in the thesis.
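The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's Trun Super or Sample Trun algorithm, whose details are not given here. All function names and parameters (`truncate_transactions`, `noisy_item_supports`, `clt_sample_size`, `item_order`, `max_len`, `max_error`) are hypothetical. The sketch assumes that the item ordering is fixed in advance (for example, from a separate private estimate), that neighboring datasets differ by adding or removing one transaction, and that the sample size comes from the textbook normal-approximation bound for estimating a proportion, which may differ from the formula used in the thesis.

```python
import math
import random
from collections import Counter


def truncate_transactions(transactions, item_order, max_len):
    """Keep at most max_len items per transaction, preferring items that
    appear earlier in item_order (assumed fixed before looking at the data).
    Items missing from item_order are treated as eliminated."""
    rank = {item: i for i, item in enumerate(item_order)}
    truncated = []
    for t in transactions:
        kept = sorted((item for item in set(t) if item in rank),
                      key=rank.__getitem__)
        truncated.append(kept[:max_len])
    return truncated


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two independent
    exponential variables that each have mean `scale`."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_item_supports(transactions, item_order, max_len, epsilon, rng=None):
    """Release item support counts with the Laplace mechanism.

    After truncation, one transaction changes at most max_len counts by 1,
    so the L1 sensitivity of the count vector is max_len and the noise
    scale is max_len / epsilon. A shorter max_len means less noise but
    more items discarded, which is the trade-off the abstract describes.
    """
    rng = rng or random.Random()
    truncated = truncate_transactions(transactions, item_order, max_len)
    counts = Counter(item for t in truncated for item in t)
    scale = max_len / epsilon
    return {item: counts[item] + laplace_noise(scale, rng)
            for item in item_order}


def clt_sample_size(max_error, z=1.96):
    """Normal-approximation sample size for estimating a relative support
    (a proportion) to within max_error. Uses the worst-case variance 1/4;
    z = 1.96 corresponds to roughly 95% confidence."""
    return math.ceil(z * z / (4.0 * max_error * max_error))
```

For example, `noisy_item_supports(transactions, item_order, max_len=2, epsilon=1.0)` returns one noisy count per item in `item_order`, and `clt_sample_size(0.01)` gives a sample size on the order of ten thousand transactions regardless of the dataset size.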

Keywords/Search Tags:

Differential privacy, frequent itemsets mining, Laplace mechanism, Exponential mechanism, transaction truncating, sampling

Related items

1	Research On Frequent Itemset Mining Method With Differential Privacy Based On Transaction Truncation
2	Research On Algorithms For Mining Association Rules Satisfied With Differential Privacy
3	Research Of Frequent Itemsets Mining Algorithm With Differential Privacy For Large-scale Data
4	Research On Frequent Pattern Mining Algorithm Under Local Differential Privacy
5	A New Method Of Linear Query For Differential Privacy Protection
6	Research On Correlation Optimization Of Differential Privacy Regression Analysis Based On Laplace Mechanism
7	Research On Incentive Mechanism Of Crowd Sensing For Localized Privacy Preservation
8	Study On The Frequent Itemset Mining Based On Differential Privacy
9	Research On Key Technology And Its Applications Of Boundary Limited Differential Privacy
10	Research On K-means++ Clustering Algorithm Based On Laplace Mechanism For Differential Privacy Protection