
Research On The Technologies Of Privacy Preserving Data Mining In Distributed Environment

Posted on: 2016-12-05  Degree: Master  Type: Thesis
Country: China  Candidate: W Yuan
GTID: 2308330473465517  Subject: Information security
Abstract/Summary:
With the rapid development of information technology, massive and distributed data pose a new challenge to traditional data mining methods. How to obtain accurate mining results in a distributed environment while guaranteeing data privacy has become a hot topic in the field of data mining. Among data mining methods, clustering and classification are among the more widely used. This thesis improves the commonly used K-means clustering algorithm and the ID3 classification algorithm and proposes two privacy-preserving data mining algorithms.

The thesis first introduces the basic concepts of data mining and its implementation steps. It then focuses on common clustering and classification algorithms, describes their implementation in detail, and analyzes their advantages and disadvantages. Next, it introduces the concept of privacy and of privacy protection in data mining, summarizes the common methods with a focus on restricted data release, encryption, and secure multi-party computation, and reviews research progress in privacy-preserving data mining. On this basis, the thesis makes the following contributions to privacy protection in clustering and classification mining:

(1) For horizontally partitioned data, the K-means clustering algorithm is improved using homomorphic encryption. Each site holding a horizontal partition can take part in clustering without revealing its data, and the security of intermediate results during communication is addressed. The clustering process operates on ciphertext, and public-key encryption protects the computation of intermediate results, so the algorithm obtains accurate clustering results while preserving privacy. Theoretical analysis and experimental verification support this claim.

(2) For vertically partitioned data, a new method is designed on the basis of the ID3 decision tree classification algorithm. The algorithm uses the homomorphic encryption scheme proposed by Paillier, a difference table, and a digital envelope to carry out decision tree classification. The decision tree is generated over ciphertext, so private data is well protected. Analysis shows that the algorithm obtains correct mining results while protecting data privacy.
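The abstract does not spell out the protocol, so the following is only a minimal sketch of the general idea behind contribution (1): additively homomorphic encryption lets an aggregator combine per-site cluster sums and counts without seeing them. It uses a toy Paillier implementation with tiny hard-coded primes, one-dimensional integer data, and a single key holder who decrypts only the totals. All of these choices, and the names `keygen`, `add_encrypted`, and `secure_centroid`, are illustrative assumptions and not the thesis's actual algorithm.

```python
import math
import random

def keygen(p=1789, q=1931):
    # Toy primes for illustration only (assumption); real use needs large random primes.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                                          # standard simple choice of g
    mu = pow(lam, -1, n)                               # valid because g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts (mod n).
    n, _ = pub
    return (c1 * c2) % (n * n)

def secure_centroid(site_sums, site_counts):
    """Hypothetical one-cluster, one-dimensional K-means centroid update."""
    pub, priv = keygen()
    # Each site encrypts its local sum and count for the cluster.
    enc_sums = [encrypt(pub, s) for s in site_sums]
    enc_counts = [encrypt(pub, c) for c in site_counts]
    # The aggregator combines ciphertexts without learning any site's values.
    agg_sum, agg_count = enc_sums[0], enc_counts[0]
    for cs, cc in zip(enc_sums[1:], enc_counts[1:]):
        agg_sum = add_encrypted(pub, agg_sum, cs)
        agg_count = add_encrypted(pub, agg_count, cc)
    # Only the aggregated totals are decrypted by the key holder.
    total_sum = decrypt(pub, priv, agg_sum)
    total_count = decrypt(pub, priv, agg_count)
    return total_sum, total_count, total_sum / total_count
```

For example, `secure_centroid([120, 75, 205], [4, 3, 8])` recovers the global sum and count from three sites and derives the new centroid from them. Real-valued coordinates would need fixed-point scaling before encryption, and totals must stay below the modulus `n`.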
Keywords/Search Tags: Distributed environment, Homomorphic encryption, Clustering mining, Classification mining