Privacy Protection Of Data Mining

Posted on: 2006-05-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W P Ge
Full Text: PDF
GTID: 1118360155960709
Subject: Computer software and theory
Abstract/Summary:
We are in an age of information explosion. Advances in computer processing power, storage technology, and the Internet have greatly increased the volume of digital information. This creates demand for finding useful information in massive data sets, which in turn drives the development of data mining. At the same time, concerns about informational privacy have emerged globally. Privacy and security have therefore become a focus of much data mining research.

First, the dissertation introduces and analyzes ten typical privacy-preserving algorithms along five dimensions: data distribution, data modification, data mining algorithm, hiding objects, and privacy-preserving technology.

Then a novel privacy-preserving classification mining algorithm is proposed. The algorithm consists of two parts. The first part addresses how to perturb the original data to preserve information privacy. First, a "single attribute transition probability matrix" is proposed. Second, a "multiple split attributes joint transition probability matrix" is proposed to express the joint perturbation probability of multiple attributes; a method for calculating its value is described, along with a simple and efficient method for calculating its inverse matrix. Third, a method is described for perturbing the original data by applying the "single attribute transition probability matrix". The second part addresses how to recover the original support counts of attribute values from the perturbed data in order to build a decision tree. First, a formula is derived to recover the original support counts of attribute values from the perturbed data. Second, another formula is derived to calculate Gain from the recovered support counts, so as to choose the best split attribute and split point. Third, a privacy-preserving decision tree classification algorithm, PPCART, is described.
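
As a rough illustration of the perturb-then-recover idea for a single categorical attribute, the following Python sketch may help. It is not the dissertation's actual formulation: the function names, the uniform off-diagonal matrix, and the parameter p_keep are all assumptions made for this example. It assumes that entry P[i, j] is the probability of reporting value j when the true value is i, so the expected perturbed counts equal P^T times the original counts, and the original counts can be estimated by solving that linear system.

```python
import numpy as np

def make_transition_matrix(k, p_keep):
    """Hypothetical k x k matrix: P[i, j] = Pr(reported j | true i).

    Assumption for this sketch: a value is kept with probability p_keep
    and otherwise replaced uniformly by one of the other k - 1 values.
    """
    off = (1.0 - p_keep) / (k - 1)
    P = np.full((k, k), off)
    np.fill_diagonal(P, p_keep)
    return P

def perturb(values, P, rng):
    """Replace each true category index by a random draw from its row of P."""
    k = P.shape[0]
    return np.array([rng.choice(k, p=P[v]) for v in values])

def recover_counts(perturbed, P):
    """Estimate original support counts from perturbed data.

    Since E[observed] = P^T @ original, solve P^T x = observed for x.
    """
    k = P.shape[0]
    observed = np.bincount(perturbed, minlength=k).astype(float)
    return np.linalg.solve(P.T, observed)

rng = np.random.default_rng(0)
P = make_transition_matrix(3, 0.7)
original = rng.choice(3, size=20000, p=[0.6, 0.3, 0.1])
true_counts = np.bincount(original, minlength=3)
estimated = recover_counts(perturb(original, P, rng), P)
print(true_counts, np.round(estimated))
```

The estimated counts are statistical estimates, not exact values; they approach the true counts as the number of records grows. A decision tree learner could then compute a split criterion such as Gain from these estimated counts instead of from the raw data.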
In addition, the privacy-preserving level is quantified so that it can be measured. An online survey example is given to illustrate an application of the algorithm. A series of experiments shows that the algorithm is suitable for all data types (Boolean, categorical, and numeric), for arbitrary probability distributions of the original data, and for perturbing all attributes (including the label attribute). The experiments also show that a decision tree built with this algorithm on perturbed data has classification accuracy comparable to a decision tree built using un-privacy-preserving...
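
The abstract mentions a joint transition probability matrix for multiple attributes and an efficient way to obtain its inverse, but does not give the construction. One plausible reading, offered here only as an assumption, is that the attributes are perturbed independently, in which case the joint matrix is the Kronecker product of the single-attribute matrices and its inverse is the Kronecker product of their inverses. The matrices P_a and P_b and the function names below are hypothetical.

```python
import numpy as np

def joint_matrix(matrices):
    """Kronecker product of single-attribute transition matrices.

    Assumption: attributes are perturbed independently, so the joint
    transition probability factorizes across attributes.
    """
    result = np.array([[1.0]])
    for P in matrices:
        result = np.kron(result, P)
    return result

def joint_inverse(matrices):
    """Invert each small matrix and combine: (A kron B)^-1 = A^-1 kron B^-1."""
    result = np.array([[1.0]])
    for P in matrices:
        result = np.kron(result, np.linalg.inv(P))
    return result

# Hypothetical matrices for a 2-valued and a 3-valued attribute.
P_a = np.array([[0.8, 0.2],
                [0.3, 0.7]])
P_b = np.array([[0.60, 0.20, 0.20],
                [0.10, 0.80, 0.10],
                [0.25, 0.25, 0.50]])

P_joint = joint_matrix([P_a, P_b])
P_joint_inv = joint_inverse([P_a, P_b])
print(np.allclose(P_joint @ P_joint_inv, np.eye(6)))
```

Under this assumption, only the small per-attribute matrices are ever inverted, which is much cheaper than inverting the full joint matrix directly when many attributes are combined.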
Keywords/Search Tags:data mining, privacy preserving, transition probability matrix, classification, decision tree, distributed data, globally frequent itemsets, association rules