
Study On Privacy Preserving Classification Data Mining

Posted on: 2011-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Shao
GTID: 2178330332970069
Subject: Computer application technology
Abstract/Summary:
Data mining is a powerful tool for extracting useful knowledge from massive data sets, and it is widely used in fields such as banking, telecommunications, insurance, and biological data analysis. As networked applications spread, mining tasks increasingly need data held at different sites. The organizations involved want the results of the joint mining but do not want to disclose their private data to one another while computing them. Distributed privacy-preserving data mining has therefore emerged as an urgent and challenging research area. This thesis studies privacy-preserving distributed classification, focusing on decision tree classification with C4.5 as the tree-building algorithm.

(1) We modify the centralized C4.5 decision tree algorithm and, based on secure multi-party computation (SMC), design a privacy-preserving distributed C4.5 algorithm that applies to both horizontally and vertically partitioned datasets. When the dataset is horizontally partitioned, the tree is built with the secure sum protocol and the secure x ln(x) protocol. When it is vertically partitioned, the tree is built with the secure scalar product protocol and the secure x ln(x) protocol. We also give a detailed method for computing the information gain ratio without revealing private data.

(2) We propose a new method for measuring the degree of privacy protection that is applicable to decision tree classification algorithms. Using open data, we compare the privacy-preserving C4.5 algorithm with the original C4.5 algorithm in WEKA. The experimental results show that the privacy-preserving algorithm protects the original data from disclosure while keeping classification accuracy high.

(3) We combine SMC with k-anonymity to achieve personalized privacy-preserving distributed data mining. Because different customers have different privacy requirements, we group attributes by the level of protection they need. Attributes with high privacy requirements are processed with SMC, which reveals no private data. For attributes with medium or low requirements, we first anonymize the datasets with a distributed k-anonymity algorithm, which itself leaks no private data, and then mine the anonymized datasets. On this basis we propose a new personalized privacy-preserving distributed data mining framework.
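
To make the horizontally partitioned case more concrete, the following Python sketch simulates a ring-based secure sum in a single process and uses the resulting global counts to compute a C4.5-style information gain ratio. It is an illustration under stated assumptions, not the thesis's algorithm: the names `secure_sum`, `entropy`, `gain_ratio`, and `MODULUS` are hypothetical, the attribute values and class labels are assumed to be public, and the secure x ln(x) protocol is not reproduced, so this simplified version reveals the aggregated counts to the site that completes the sum.

```python
import math
import random

# Assumed public modulus; it only needs to exceed any possible global count.
MODULUS = 2**61 - 1


def secure_sum(local_values, modulus=MODULUS, rng=random):
    """Simulate a ring-based secure sum over one value per site.

    The initiating site adds a random mask to its own value, every other
    site adds its value to the running total, and the initiator removes
    the mask at the end. Each site only ever sees a masked running total.
    """
    mask = rng.randrange(modulus)
    running = (mask + local_values[0]) % modulus
    for value in local_values[1:]:
        running = (running + value) % modulus
    return (running - mask) % modulus


def entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def gain_ratio(site_tables, values, labels):
    """Gain ratio of one attribute over horizontally partitioned data.

    site_tables: one dict per site, mapping (attribute_value, class_label)
                 to that site's local record count.
    values, labels: the public attribute values and class labels (assumed).
    """
    # One secure sum per (value, label) cell yields the global contingency table.
    table = {
        (v, c): secure_sum([t.get((v, c), 0) for t in site_tables])
        for v in values
        for c in labels
    }
    total = sum(table.values())
    class_totals = [sum(table[(v, c)] for v in values) for c in labels]
    value_totals = [sum(table[(v, c)] for c in labels) for v in values]

    info = entropy(class_totals)
    conditional = sum(
        vt / total * entropy([table[(v, c)] for c in labels])
        for v, vt in zip(values, value_totals)
        if vt > 0
    )
    split_info = entropy(value_totals)
    gain = info - conditional
    return gain / split_info if split_info > 0 else 0.0
```

Called with one count table per site, `gain_ratio` returns the same value as computing the ratio on the pooled table, because the secure sum reconstructs exactly the global counts. A protocol in the spirit of the abstract would go further and keep those counts hidden as well.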
Keywords/Search Tags: Distributed Data Mining, Privacy Preserving, C4.5 Decision Tree Classification Algorithm, Secure Multi-party Computation, K-anonymity, Personalized