
The Analysis Of K-Means Clustering With Differential Privacy

Posted on: 2017-02-07  Degree: Master  Type: Thesis
Country: China  Candidate: L F Li  Full Text: PDF
GTID: 2308330485988720  Subject: Computer Science and Technology
Abstract/Summary:
Data mining uncovers latent patterns and rules hidden in large data sets, supports better decision-making, and is widely used in business, scientific research, and medical research. Misuse of data mining, however, can threaten privacy and information security. How to apply privacy protection techniques to the data mining process so that private information stays secure has therefore become a research hotspot in the field.

Early privacy protection models, such as k-anonymity and its extensions, usually have to assume a particular attack model; whenever a new attack appears, the models must be revised again. Moreover, they cannot quantify the level of privacy protection they provide. Dwork therefore proposed the differential privacy model, which assumes an attacker with maximal background knowledge, rests on a rigorous mathematical foundation, and uses the parameter ε to quantify the privacy protection level. Differential privacy thus makes up for the shortcomings of traditional privacy protection. The amount of noise it requires does not depend on the data set, which makes it well suited to privacy protection in data mining.

The traditional differentially private K-means algorithm is sensitive to the choice of initial centers and offers no guidance on choosing the number of clusters k, which reduces the utility (availability) of the clustering results. This thesis therefore introduces the Canopy algorithm into differentially private K-means and presents a fusion of the two, the DP Canopy K-means algorithm. DP Canopy K-means avoids the blind choice of k and the sensitivity to initial points, reduces the number of iterations, and improves the utility of the clustering results.
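The general idea of combining a Canopy pass with noisy K-means updates can be illustrated with a minimal Python sketch. It is not the thesis's implementation, which is not given in the abstract. The function names, the single tight threshold `t2`, the even split of ε across iterations, and the assumption that every coordinate lies in [0, 1] are all choices made here for illustration. The canopy pass in this sketch reads the raw data and spends no privacy budget, so only the update step shows how Laplace noise scaled to sensitivity/ε would enter; the sketch as a whole should not be read as a differential privacy guarantee.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def canopy_centers(points, t2, rng):
    """Pick one center per canopy; the number of canopies serves as k.

    Only the tight threshold t2 is used, because this sketch needs the
    centers and not the canopy memberships (hypothetical simplification).
    """
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining.pop(rng.randrange(len(remaining)))
        centers.append(center)
        # Points closer than t2 to the new center cannot start another canopy.
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return centers


def dp_kmeans(points, centers, epsilon, iterations, rng):
    """Lloyd iterations with Laplace noise on per-cluster counts and sums.

    Assumes coordinates in [0, 1]: one record then changes one count by at
    most 1 and one sum vector by at most dim in L1 norm, so the noise scale
    per iteration is (dim + 1) / (epsilon / iterations).
    """
    dim = len(points[0])
    scale = (dim + 1) / (epsilon / iterations)
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in centers]
        counts = [0] * len(centers)
        for p in points:
            j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            counts[j] += 1
            for a in range(dim):
                sums[j][a] += p[a]
        updated = []
        for j, old in enumerate(centers):
            noisy_count = counts[j] + laplace_noise(scale, rng)
            if noisy_count < 1.0:
                # Noisy count is unusable; keep the previous center.
                updated.append(old)
                continue
            # Noisy mean, clamped back into the assumed [0, 1] domain.
            updated.append(tuple(
                min(1.0, max(0.0, (sums[j][a] + laplace_noise(scale, rng)) / noisy_count))
                for a in range(dim)
            ))
        centers = updated
    return centers
```

A call such as `dp_kmeans(data, canopy_centers(data, 0.4, rng), epsilon=1.0, iterations=5, rng=rng)` would let the canopy pass fix both k and the starting centers before the noisy iterations begin. A smaller ε means larger noise and less accurate centers, which is the privacy/utility trade-off the abstract refers to.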
DP Canopy K-means can be applied to data mining scenarios that must both keep private data secure and keep the clustering results usable.

PINQ (Privacy Integrated Queries) was the first prototype system to provide differential privacy protection for queries over sensitive data. This thesis designs and implements the DP Canopy K-means and IDP K-means algorithms on the PINQ platform, runs both on the Magic and Blood data sets, and compares the utility of their clustering results. The experiments show that, at the same privacy protection level, DP Canopy K-means achieves more accurate clustering results than IDP K-means and converges faster than the traditional DP K-means algorithm.

To validate the algorithm in a practical application, a group recommendation system is chosen as the application scenario. The privacy leakage problem in group recommendation is analyzed, and DP Canopy K-means is introduced to protect users' privacy. The experimental results show that DP Canopy K-means has no significant impact on recommendation accuracy in group recommendation (under a certain privacy budget, the error is no more than 3%), so it balances privacy protection against recommendation accuracy in a group recommendation system.
Keywords/Search Tags: data mining, privacy protection, differential privacy, clustering, group recommendation, group detection