The Analysis Of K-Means Clustering With Differential Privacy

Posted on:2017-02-07

Degree:Master

Type:Thesis

Country:China

Candidate:L F Li

GTID:2308330485988720

Subject:Computer Science and Technology

Abstract/Summary:

Data mining uncovers latent patterns and rules hidden in large data sets, supports better decision-making, and is widely used in business, scientific research, and medical research. Misuse of data mining, however, can threaten privacy and information security. How to apply privacy protection techniques within the data mining process so that private information stays secure has therefore become a research hotspot in the field.

Early privacy protection models, such as k-anonymity and its extensions, usually have to assume a particular attack model; whenever a new attack appears, the model must be revised again. They also cannot quantify the level of privacy protection they provide. Dwork therefore proposed the differential privacy model, which assumes an attacker with the maximum possible background knowledge, rests on a solid mathematical foundation, and uses the parameter ε to quantify the level of privacy protection. Differential privacy thus makes up for the shortcomings of traditional privacy protection models; in addition, the amount of noise it adds does not depend on the data set, which makes it well suited to privacy protection in data mining.

The traditional differentially private K-means algorithm is sensitive to the choice of initial centers and chooses the number of clusters k blindly, which reduces the utility of the clustering results. This thesis therefore introduces the Canopy algorithm into differentially private K-means and presents a fusion of the two, the DP Canopy K-means algorithm. DP Canopy K-means effectively avoids the blind choice of k and the sensitivity to initial points, reduces the number of iterations, and improves the utility of the clustering results. It can be applied to data mining scenarios that require both the security of private data and usable clustering results.

PINQ (Privacy Integrated Queries) was the first prototype system to provide differential privacy protection for queries over sensitive data. This thesis designs and implements the DP Canopy K-means and IDP K-means algorithms on the PINQ platform, runs both on the Magic and Blood data sets, and compares the utility of their clustering results. The experimental results show that, at the same level of privacy protection, DP Canopy K-means produces more accurate clustering results than IDP K-means and converges faster than the traditional DP K-means algorithm.

To validate the algorithm in a practical application, a group recommendation system was chosen as the application scenario. The privacy leakage problem in group recommendation is analyzed, and DP Canopy K-means is introduced to protect users' privacy. The experimental results show that DP Canopy K-means has no significant impact on recommendation accuracy (under a certain privacy budget, the error is no more than 3%), so it balances privacy protection against recommendation accuracy in a group recommendation system.
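
The abstract does not give the algorithm's details, so the following is only a minimal Python sketch of the general idea, not the thesis's implementation (which is built on PINQ). It assumes data scaled to [0, 1] in every dimension, a single Canopy distance threshold (here called t2), an even split of the privacy budget across iterations, and Laplace noise on the per-cluster counts and coordinate sums. All function names and parameter values are hypothetical. The Canopy pass in this sketch is not noised, so the code as a whole should not be read as satisfying ε-differential privacy; only the update step illustrates the Laplace mechanism.

```python
import math
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t2, rng):
    # Simplified Canopy pass: take a random remaining point as a center,
    # then discard every remaining point within distance t2 of it.
    # The number of centers found serves as k. (Not noised; see lead-in.)
    pool = list(points)
    rng.shuffle(pool)
    centers = []
    while pool:
        c = pool.pop()
        centers.append(list(c))
        pool = [p for p in pool if dist(p, c) > t2]
    return centers


def dp_kmeans(points, centers, epsilon, iterations, rng):
    d = len(points[0])
    # With data in [0, 1]^d, adding or removing one point changes one count
    # by 1 and each coordinate sum by at most 1, so the L1 sensitivity of
    # the (counts, sums) query is d + 1. Each iteration gets epsilon / iterations.
    scale = (d + 1) / (epsilon / iterations)
    for _ in range(iterations):
        sums = [[0.0] * d for _ in centers]
        counts = [0.0] * len(centers)
        for p in points:
            j = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            counts[j] += 1
            for k in range(d):
                sums[j][k] += p[k]
        new_centers = []
        for j, old in enumerate(centers):
            noisy_count = counts[j] + laplace(scale, rng)
            if noisy_count < 1.0:
                # Noisy count is unusable; keep the previous center.
                new_centers.append(old)
                continue
            # Noisy mean, clamped back into the data domain.
            new_centers.append([
                min(1.0, max(0.0, (sums[j][k] + laplace(scale, rng)) / noisy_count))
                for k in range(d)
            ])
        centers = new_centers
    return centers


# Toy data: two well-separated blobs in the unit square.
rng = random.Random(0)


def blob(cx, cy, n=500):
    return [[min(1.0, max(0.0, rng.gauss(cx, 0.03))),
             min(1.0, max(0.0, rng.gauss(cy, 0.03)))] for _ in range(n)]


points = blob(0.2, 0.2) + blob(0.8, 0.8)
init = canopy_centers(points, t2=0.4, rng=rng)
final = dp_kmeans(points, init, epsilon=5.0, iterations=3, rng=rng)
print(len(init), sorted(final))
```

In this sketch the Canopy pass fixes both k and the starting centers before any noisy iteration runs, which is the role the abstract attributes to Canopy. Starting near the true centers also means fewer iterations are needed, so each iteration can receive a larger share of the privacy budget.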

Keywords/Search Tags:

data mining, privacy protection, differential privacy, clustering, group recommendation, group detection

Related items

1	Research On Clustering Algorithms In Differential Privacy
2	Research On Privacy Preserving Data Mining Of Mobile Internet User Behavior
3	Research And Application On Privacy Protection For Data Mining
4	Research On Key Technologies Of Privacy Preserving Data Mining Based On Local Differential Privacy
5	Research On Data Publishing And Mining Method Based On Differential Privacy
6	Research On Differentially Private Classification And Recommendation Algorithms
7	Research On K-means++ Clustering Algorithm Based On Laplace Mechanism For Differential Privacy Protection
8	Research On K-means Clustering Algorithm Based On Differential Privacy
9	Distributed Clustering Algorithm Based On Privacy Protection
10	Research On Improvement Of K-means Clustering Algorithm Based On Differential Privacy