
The Research On Privacy Preserving Data Mining

Posted on: 2010-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: D Yu
Full Text: PDF
GTID: 2178360275482229
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, data sharing and exchange over networks have become increasingly common. Data mining, which extracts interesting knowledge from raw data, has become a widely applied analysis method. At the same time, concerns about information privacy have emerged worldwide, and privacy has become a focus of data mining research. This paper studies privacy-preserving data mining over distributed data.

First, the paper gives an overview of data mining, defines the notion of privacy in data mining algorithms, and states the objectives of a privacy-preserving scheme.

Second, it introduces and analyzes typical privacy-preserving algorithms along three dimensions: data distribution, data modification, and privacy-preserving technique.

Third, building on recent research, it proposes a new classification algorithm that combines homomorphic encryption with order-preserving encryption. While still producing valid results, the algorithm exploits two properties of these schemes, addition on ciphertext and comparison on ciphertext, to reduce communication and computation complexity. Experimental results show that the algorithm achieves linear communication complexity and performs well in terms of privacy, accuracy and computation cost.

Fourth, it develops a cryptography-based method for privacy-preserving clustering by changing the steps of clustering on vertically partitioned data, and proposes a new k-medoids clustering algorithm based on this method. The algorithm is analyzed in terms of correctness, privacy, computation complexity and communication complexity. Experimental results show that it achieves a better tradeoff between computation cost and communication cost, and that it has a substantial advantage over other representative algorithms in computation and communication overhead, accuracy and privacy. The results also show that the algorithm is secure for all sites because it hides the plaintext distribution.
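The two ciphertext properties named in the abstract, addition and comparison, can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it assumes a textbook Paillier scheme as the additively homomorphic component and a toy monotone lookup table as a stand-in for order-preserving encryption. The primes, the function names and the table construction are all hypothetical choices made for illustration, and neither part is secure at these sizes.

```python
import random
from math import gcd


def paillier_keygen(p=61, q=53):
    """Toy Paillier key generation (assumed tiny primes, NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def paillier_encrypt(n, m, rng=random):
    """Encrypt 0 <= m < n as (n+1)^m * r^n mod n^2 with random r."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def paillier_decrypt(n, priv, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(x) = (x-1)/n."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def paillier_add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts mod n."""
    return (c1 * c2) % (n * n)


def toy_ope_table(key, domain_size):
    """Toy order-preserving map: a keyed, strictly increasing table.

    Illustrative stand-in only; real order-preserving encryption
    schemes are constructed differently.
    """
    rng = random.Random(key)
    table, value = [], 0
    for _ in range(domain_size):
        value += rng.randint(1, 1000)  # positive gap keeps the order
        table.append(value)
    return table


if __name__ == "__main__":
    n, priv = paillier_keygen()
    c_sum = paillier_add(n, paillier_encrypt(n, 17), paillier_encrypt(n, 25))
    print(paillier_decrypt(n, priv, c_sum))  # sum computed on ciphertexts

    table = toy_ope_table(key=7, domain_size=100)
    print(table[30] < table[70])  # comparison computed on ciphertexts
```

In a distributed setting of the kind the abstract describes, a site could in principle sum encrypted counts or compare encrypted values without seeing the other sites' plaintexts; how the thesis actually combines the two schemes is not specified in the abstract.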
Keywords/Search Tags: Privacy Preserving, Classification, Clustering, Homomorphic Encryption, Order Preserving Encryption Scheme