Privacy Protection Of Data Mining

Posted on:2006-05-14

Degree:Doctor

Type:Dissertation

Country:China

Candidate:W P Ge

Full Text:PDF

GTID:1118360155960709

Subject:Computer software and theory

Abstract/Summary:

We are in an age of information explosion. Advances in computer processing power, storage technology, and the Internet have greatly increased the volume of digital information. This growth creates demand for extracting useful information from massive data sets, which in turn drives the development of data mining. At the same time, concerns about information privacy have emerged worldwide, so privacy and security have become a focus of much data mining research.

First, the dissertation introduces and analyzes ten typical privacy-preserving algorithms along five dimensions: data distribution, data modification, data mining algorithm, hiding objects, and privacy-preserving technique.

Then, a novel privacy-preserving classification mining algorithm is proposed. It consists of two parts. The first part addresses how to perturb the original data so as to preserve information privacy. First, a "single attribute transition probability matrix" is proposed. Second, a "multiple split attributes joint transition probability matrix" is proposed to express the joint perturbation probability of multiple attributes; the dissertation describes how to compute its entries and gives a simple and efficient method for computing its inverse. Third, a data perturbation method is described that perturbs the original data by applying the single attribute transition probability matrix.

The second part addresses how to recover the original support counts of attribute values from the perturbed data in order to build a decision tree. First, a formula is derived to recover the original support counts of attribute values from the perturbed data. Second, another formula is derived that computes Gain from these recovered support counts, which is used to choose the best split attribute and split point. Third, a novel privacy-preserving decision tree classification algorithm, PPCART, is given.
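
The perturb-then-recover pipeline described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the dissertation's PPCART code. The helper names (`transition_matrix`, `kron`, `invert`, `perturb`, `recover_counts`, `info_gain`) are hypothetical. So is the uniform-replacement form of the matrix (keep a value with probability p, otherwise draw uniformly from its domain), and so is the choice of entropy-based information gain for Gain. The sketch also assumes that attributes are perturbed independently. Under that assumption the joint matrix for several attributes is the Kronecker product of the single-attribute matrices, and its inverse follows cheaply from the identity (A ⊗ B)^-1 = A^-1 ⊗ B^-1. Original support counts x are then estimated from perturbed counts y as x ≈ M^-1 y.

```python
"""Sketch of transition-matrix perturbation and support-count recovery.

Assumptions, not taken from the dissertation: each value is kept with
probability p (p > 0) and otherwise replaced by a uniformly random value of
its domain; attributes are perturbed independently, so the joint matrix is a
Kronecker product; Gain is entropy-based information gain.
"""
import math
import random


def transition_matrix(k, p):
    """Hypothetical single-attribute matrix: m[i][j] = P(perturbed = i | original = j)."""
    q = (1 - p) / k
    return [[p + q if i == j else q for j in range(k)] for i in range(k)]


def kron(a, b):
    """Kronecker product: the joint matrix when attributes are perturbed independently."""
    return [[a[i][j] * b[r][c] for j in range(len(a[0])) for c in range(len(b[0]))]
            for i in range(len(a)) for r in range(len(b))]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (pure Python)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def perturb(value, k, p, rng):
    """Keep value with probability p, else draw uniformly from range(k)."""
    return value if rng.random() < p else rng.randrange(k)


def recover_counts(perturbed_counts, inv):
    """Estimate original support counts as x = M^-1 y, clipping negatives to zero."""
    n = len(inv)
    return [max(0.0, sum(inv[i][j] * perturbed_counts[j] for j in range(n))) for i in range(n)]


def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def info_gain(joint):
    """joint[a][c] = (recovered) count of split value a with class label c."""
    parent = [sum(col) for col in zip(*joint)]
    n = sum(parent)
    children = sum(sum(row) / n * entropy(row) for row in joint if sum(row) > 0)
    return entropy(parent) - children


if __name__ == "__main__":
    rng = random.Random(0)
    ka, kc, p = 3, 2, 0.7  # hypothetical: 3-valued attribute, binary label, keep prob 0.7
    records = []
    for _ in range(20000):
        a = rng.randrange(ka)
        c = 1 if a == 2 else int(rng.random() < 0.2)
        records.append((a, c))
    # Perturb both the split attribute and the class label.
    noisy = [(perturb(a, ka, p, rng), perturb(c, kc, p, rng)) for a, c in records]
    ma, mc = transition_matrix(ka, p), transition_matrix(kc, p)
    # (A kron B)^-1 = A^-1 kron B^-1: only the small per-attribute matrices are inverted.
    inv = kron(invert(ma), invert(mc))
    y = [0] * (ka * kc)
    for a, c in noisy:
        y[a * kc + c] += 1  # joint index: attribute outer, label inner
    x_hat = recover_counts(y, inv)
    recovered = [x_hat[a * kc:(a + 1) * kc] for a in range(ka)]
    true = [[0] * kc for _ in range(ka)]
    for a, c in records:
        true[a][c] += 1
    print(f"true gain {info_gain(true):.4f}, recovered gain {info_gain(recovered):.4f}")
```

Running the script perturbs both the attribute and the class label of 20,000 synthetic records. It then prints the gain computed from the true counts next to the gain computed from the recovered counts. Because the expected perturbed counts are M x, the estimate M^-1 y is unbiased before clipping, and it should tighten as the number of records grows. Inverting only the small per-attribute matrices keeps the cost low when several attributes are combined.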

In addition, the privacy-preserving level is quantified, and an online survey example is given to illustrate how the algorithm is applied. A series of experiments shows that the algorithm handles all data types (Boolean, categorical, and numeric), arbitrary probability distributions of the original data, and perturbation of all attributes (including the class label attribute). The experiments also show that a decision tree built with this algorithm on perturbed data achieves classification accuracy comparable to that of a decision tree built using non-privacy-preserving...
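
The abstract does not state how the privacy-preserving level is defined. One common choice is 1 minus the probability that an adversary recovers the original value with a maximum a posteriori guess, where the adversary knows both the prior distribution and the transition matrix. That measure is assumed here and is not necessarily the dissertation's own. The sketch applies it to a hypothetical yes/no online-survey question in which each respondent answers truthfully with probability p and otherwise answers uniformly at random. `boolean_matrix`, `privacy_level`, and the 30% prior are illustrative assumptions.

```python
"""Sketch: a hypothetical privacy-level measure for a transition matrix.

Assumption: privacy = 1 - P(adversary's MAP guess equals the original value),
where the adversary knows the prior and m[i][j] = P(reported = i | true = j).
"""


def boolean_matrix(p):
    """Yes/no question: keep the true answer with probability p, else answer at random."""
    q = (1 - p) / 2
    return [[p + q, q], [q, p + q]]


def map_success_probability(prior, m):
    """P(correct MAP guess) = sum over reported values i of max_j prior[j] * m[i][j]."""
    return sum(max(prior[j] * m[i][j] for j in range(len(prior))) for i in range(len(m)))


def privacy_level(prior, m):
    """Hypothetical privacy level: probability that the MAP guess is wrong."""
    return 1.0 - map_success_probability(prior, m)


if __name__ == "__main__":
    prior = [0.3, 0.7]  # hypothetical: 30% of respondents truly answer "yes"
    for p in (0.5, 0.7, 0.9):
        print(f"p = {p}: privacy level {privacy_level(prior, boolean_matrix(p)):.2f}")
```

In this example a lower p (more randomization) yields a higher privacy level. With p = 0 the adversary can do no better than guess the majority answer, which leaves a privacy level of 1 minus the largest prior probability.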

Keywords/Search Tags:

data mining, privacy preserving, transition probability matrix, classification, decision tree, distributed data, globally frequent itemsets, association rules

Related items

1	Research On Algorithm For Distributed Mining Of Association Rules
2	Research On Decision Tree Algorithm For Privacy-Preserving
3	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System
4	Research On Algorithms For Distributed Mining Of Association Rules
5	Research And Design On Privacy-Preserving Data Mining Approaches And Algorithms
6	Efficient Mining Of Association Rules In Distributed Database System
7	Research On Distributed Association Rules Mining Algorithm And Its Applications
8	Research On Fast Algorithms For Frequent Itemsets Mining Based On Compressed FP-tree
9	Research On Hiding Association Rules Based On Relative-non-Sensitive Frequent Itemsets
10	Studies On Algorithms Of Association Rule Mining In Data Mining