Research And Design On Privacy-Preserving Data Mining Approaches And Algorithms

Posted on: 2007-07-25  Degree: Master  Type: Thesis
Country: China  Candidate: H Wen  Full Text: PDF
GTID: 2178360182966622  Subject: Computer application technology
Abstract/Summary:
With the advance of the information age, data collection and analysis have grown rapidly in both size and complexity. The attempt to extract important patterns and trends from vast data sets has given rise to the challenging field of data mining. In many contexts, data are distributed across different sites and held by different organizations that are reluctant to share their data with others because of privacy and confidentiality concerns. Privacy-Preserving Data Mining (PPDM) has emerged to address this issue. Research on PPDM aims to bridge the gap between collaborative data mining and data confidentiality.

Many approaches have been adopted for PPDM. They can be classified into two major categories: randomization-based approaches and cryptography-based techniques. We focused on the former, which comprises two schemes: Random Perturbation (RP) and Randomized Response (RR).

RP with data reconstruction, although an important technique in the PPDM field, cannot effectively limit privacy breaches in datasets with highly correlated attributes. We proposed an improvement that uses Principal Component Analysis (PCA) to reduce the number of attributes involved in data mining and thereby preserve more of the privacy of the original data. We also attempted to quantify how the performance of the algorithm relates to the degree of compression of the original attributes and to the level of random noise.

RR techniques were developed in the statistics community to protect the privacy of survey respondents. We described how to use RR techniques to build a decision tree classifier from the disguised data with the ID3 algorithm. We also conducted privacy-preserving association rule mining using RR techniques and discussed the process of the algorithm in detail.
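To make the Randomized Response idea concrete, the following is a minimal sketch of one classical RR scheme, Warner's model, for a single yes/no attribute. The abstract does not say which RR variant the thesis uses, so the choice of Warner's model, the parameter `theta` (the probability of reporting the true value), and the function names `randomize` and `estimate_true_proportion` are all assumptions made for illustration. Each respondent reports the true value with probability `theta` and the opposite value otherwise; the miner sees only the disguised answers and recovers the aggregate proportion from them.

```python
import random


def randomize(answers, theta, rng):
    """Disguise boolean answers (Warner-style sketch, hypothetical helper).

    Each answer is kept with probability theta and flipped otherwise,
    so no individual reported value can be trusted on its own.
    """
    return [a if rng.random() < theta else (not a) for a in answers]


def estimate_true_proportion(reported, theta):
    """Estimate the fraction of true 'yes' answers from disguised data.

    If pi is the true proportion, the expected reported proportion is
        lam = theta * pi + (1 - theta) * (1 - pi),
    which gives pi = (lam + theta - 1) / (2 * theta - 1).
    """
    if theta == 0.5:
        raise ValueError("theta = 0.5 makes the data uninformative")
    lam = sum(reported) / len(reported)
    estimate = (lam + theta - 1) / (2 * theta - 1)
    # Sampling noise can push the estimate slightly outside [0, 1].
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data for illustration only: about 30% true 'yes' answers.
    truth = [rng.random() < 0.3 for _ in range(100_000)]
    disguised = randomize(truth, theta=0.8, rng=rng)
    print(estimate_true_proportion(disguised, theta=0.8))
```

Proportions estimated this way are the kind of aggregate statistic that a decision tree learner such as ID3 or an association rule miner needs (class frequencies, itemset supports), which is why such learners can in principle be trained on disguised data. Values of `theta` closer to 0.5 give stronger privacy but noisier estimates.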
Keywords/Search Tags: Privacy-Preserving Data Mining, Random Perturbation, Principal Component Analysis, Randomized Response, Decision Tree Classification, Association Rules Mining