Research On Distributed Data Mining Methods Based On Differential Privacy

Posted on:2023-10-11

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhong

Full Text:PDF

GTID:2568307061953959

Subject:Computer technology

Abstract/Summary:

With the development of data collection and sharing technology, collecting business data distributed across different terminals for analysis and modeling has become an important form of big data mining. However, these terminals may belong to different institutions that do not trust each other. With growing attention to data privacy, how to mine the knowledge contained in the global data set while protecting the data privacy of each terminal has become an urgent problem. To address it, this thesis focuses on frequent itemset mining and decision tree classification in distributed scenarios, studies distributed privacy-preserving schemes based on differential privacy, and achieves effective extraction of global association patterns and classification patterns while protecting the data privacy of all parties. The main work of the thesis is as follows:

(1) For the privacy-preserving top-k frequent itemset mining problem in the distributed scenario, a differentially private mining method, DP-DFIM, is designed. It mines frequent itemsets by having a central node aggregate the noisy support counts of the itemsets from all parties. To maintain the utility of the support counts, a post-processing scheme based on the order constraint of the support counts is designed, which improves their accuracy. To further reduce the influence of noise, the noisy support counts are corrected based on the similarity between the global support distribution and the central node's support distribution, which improves mining quality.

(2) For the privacy-preserving decision tree classification problem in the distributed scenario, a decision tree construction method, DP-DDTC, that satisfies differential privacy is proposed. All parties send the noisy results of their count queries to the server, which aggregates them to determine the optimal splitting attribute. To ensure the utility of the count query results, an optimization scheme based on the constraints satisfied by the query values is designed, which improves their accuracy. To address excessive noise drowning out the true values, a targeted privacy budget allocation scheme is designed to control the signal-to-noise ratio. To further reduce the influence of noise, a metric is designed to measure attribute importance so that useless attributes can be filtered out, which reduces the amount of injected noise and improves mining accuracy.

Experimental results on real data sets show that the methods proposed in this thesis preserve the utility of the mining results while satisfying differential privacy.
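The core step shared by both methods, where each party perturbs its local counts and a central node sums them, can be illustrated with a minimal sketch. This is not the DP-DFIM or DP-DDTC algorithm itself: it assumes the standard Laplace mechanism, a fixed candidate list known to all parties, and neighbouring data sets that differ in one transaction. The function names (laplace_noise, local_noisy_counts, aggregate_top_k) are hypothetical, and the thesis's post-processing, correction, and budget allocation steps are omitted.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def local_noisy_counts(transactions, candidates, epsilon, rng):
    """Run by each party: support counts perturbed before they are shared."""
    # Assumption: adding or removing one transaction changes each candidate's
    # count by at most 1, so the L1 sensitivity is at most len(candidates).
    scale = len(candidates) / epsilon
    noisy = {}
    for itemset in candidates:
        support = sum(1 for t in transactions if itemset <= t)
        noisy[itemset] = support + laplace_noise(scale, rng)
    return noisy


def aggregate_top_k(noisy_per_party, k):
    """Run by the central node: sum the noisy counts and keep the k largest."""
    totals = {}
    for counts in noisy_per_party:
        for itemset, value in counts.items():
            totals[itemset] = totals.get(itemset, 0.0) + value
    return sorted(totals, key=totals.get, reverse=True)[:k]


if __name__ == "__main__":
    rng = random.Random()
    candidates = [frozenset("a"), frozenset("b"), frozenset("c")]
    party_a = [{"a", "b"}, {"a"}, {"a", "b", "c"}]
    party_b = [{"a"}, {"b"}, {"a", "c"}]
    shared = [local_noisy_counts(p, candidates, 1.0, rng)
              for p in (party_a, party_b)]
    print(aggregate_top_k(shared, k=2))
```

Because the noise is added locally, the central node never sees exact counts. The noise scale grows with the number of candidates, which is one reason that filtering candidates or attributes, as the abstract describes, can reduce the total injected noise.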

Keywords/Search Tags:

Differential Privacy, Distribution, Frequent Itemset Mining, Decision Tree

Related items

1	Study On The Frequent Itemset Mining Based On Differential Privacy
2	Research On Frequency Estimation And Frequent Itemset Mining For Local Differential Privacy Protection
3	Research On Frequent Itemset Mining Method With Differential Privacy Based On Transaction Truncation
4	Research On Frequent Itemset Mining Based On Differentially Private Model
5	Research On Frequent Itemset Mining Based On Local Differential Privacy
6	Frequent Itemset Mining Under Local Differential Privacy Based On Matrix Decomposition
7	Research On Frequent Itemset Mining Of Complex Data Based On Local Differential Privacy
8	Research On Privacy Preserving Approaches For Frequent Itemset Mining And High-Utility Itemset Mining
9	Research On The Key Technology Of Frequent Pattern Mining Based On Local Differential Privacy
10	The Research And Application Of Association Rules Mining Algorithms Based On Directed Itemset Graph