
Perturbation based privacy preserving data mining techniques for real-world data

Posted on: 2009-02-25
Degree: Ph.D.
Type: Dissertation
University: The University of Texas at Dallas
Candidate: Liu, Li
GTID: 1448390002999444
Subject: Computer Science
Abstract/Summary:
The perturbation method has been studied extensively for privacy-preserving data mining. In this method, random noise drawn from a known distribution is added to privacy-sensitive data before the data is sent to the data miner. The data miner then reconstructs an approximation of the original data distribution from the perturbed data and uses the reconstructed distribution for data mining. Because noise is added, perturbation-based approaches always trade loss of information against preservation of privacy. The question is: to what extent are users willing to compromise their privacy? The answer varies from individual to individual, since attitudes toward privacy differ with customs and cultures. Unfortunately, current perturbation-based privacy-preserving data mining techniques do not allow individuals to choose their desired privacy levels. This is a drawback, because privacy is a personal choice. In this dissertation, we propose an individually adaptable perturbation model that enables individuals to choose their own privacy levels. We demonstrate the effectiveness of this approach through experiments on both synthetic and real-world data sets.

The reconstruction of the original distribution has been questioned for potential privacy breaches. After investigating the reconstruction step in detail, we also question whether this approach can handle real-world data. In this dissertation, we therefore propose a new perturbation-based technique. Instead of rebuilding the original data distribution, we modify the data mining algorithms so that they can be applied directly to the perturbed data. In other words, we build a classifier for the original data set directly from the perturbed training data set.
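To make the direct approach concrete, the following is a minimal sketch, not the dissertation's algorithm: a Gaussian naive Bayes classifier trained on perturbed records without reconstructing the distribution. It assumes zero-mean Gaussian noise with a known standard deviation `sigma`, shared by all records and independent of the data. Under that assumption the class means of the perturbed data estimate the original means, and subtracting `sigma**2` from each perturbed variance estimates the original variance. The names `perturb`, `PerturbedGaussianNB`, and `make_data`, and the synthetic data, are hypothetical.

```python
import numpy as np


def perturb(X, sigma, rng):
    """Add zero-mean Gaussian noise with known std dev sigma (assumed noise model)."""
    return X + rng.normal(0.0, sigma, size=X.shape)


class PerturbedGaussianNB:
    """Hypothetical Gaussian naive Bayes fitted directly on perturbed training data.

    Assumes noise N(0, sigma^2), independent of the data and the class, so for
    each class and feature: E[X'] = E[X] and Var[X'] = Var[X] + sigma^2.
    """

    def __init__(self, sigma, min_var=1e-6):
        self.sigma = sigma
        self.min_var = min_var  # floor keeps variances positive after subtraction

    def fit(self, X_pert, y):
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        # Class means stay unbiased because the noise has zero mean.
        self.means_ = np.array([X_pert[y == c].mean(axis=0) for c in self.classes_])
        # Remove the known noise variance to estimate the original variance.
        pert_var = np.array([X_pert[y == c].var(axis=0) for c in self.classes_])
        self.vars_ = np.maximum(pert_var - self.sigma ** 2, self.min_var)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)[:, None, :]  # (n, 1, d) broadcast against (k, d)
        log_lik = -0.5 * (np.log(2.0 * np.pi * self.vars_)
                          + (X - self.means_) ** 2 / self.vars_).sum(axis=2)  # (n, k)
        return self.classes_[np.argmax(log_lik + np.log(self.priors_), axis=1)]


# Usage on synthetic data (all values are illustrative).
rng = np.random.default_rng(0)


def make_data(n):
    y = rng.integers(0, 2, size=n)
    centers = np.where(y[:, None] == 1, 2.0, -2.0)  # class 1 near +2, class 0 near -2
    X = rng.normal(loc=centers, scale=1.0, size=(n, 2))
    return X, y


X_train, y_train = make_data(2000)
X_test, y_test = make_data(1000)

sigma = 2.0
clf = PerturbedGaussianNB(sigma).fit(perturb(X_train, sigma, rng), y_train)
accuracy = np.mean(clf.predict(X_test) == y_test)
print(f"accuracy on unperturbed test data: {accuracy:.3f}")
```

The only change from ordinary Gaussian naive Bayes is the variance correction in `fit`; the `min_var` floor guards against negative estimates when a class has few records.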
This direct approach is especially suitable for scenarios where reconstructing the original data distribution may fail because too little training data is available.
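For contrast, here is a minimal sketch of the reconstruction step that the direct approach avoids, written in the style of the iterative Bayesian procedure of Agrawal and Srikant (2000). Assumptions: Gaussian noise, a fixed grid of candidate original values, and a simple L1 stopping rule. The sketch also lets each record carry its own noise scale, which is one plausible way (not necessarily the dissertation's model) to accommodate individually chosen privacy levels; the three levels used below are hypothetical. Because the estimate is an average of per-record posteriors, its quality depends on having enough records, which gives one intuition for why limited training data can undermine it.

```python
import numpy as np


def gaussian_pdf(z, sigma):
    return np.exp(-0.5 * (z / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def reconstruct_distribution(w, sigmas, grid, n_iter=500, tol=1e-8):
    """Iterative Bayesian estimate of the original distribution on a fixed grid.

    w      : perturbed values w_i = x_i + y_i, shape (n,)
    sigmas : per-record noise std devs, shape (n,) (hypothetical individually
             chosen privacy levels; noise assumed Gaussian)
    grid   : candidate original values a_j, shape (m,)
    Returns a probability mass p_j for each grid point.
    """
    w = np.asarray(w, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # Noise likelihood f_{Y_i}(w_i - a_j) for every record/grid pair, shape (n, m).
    lik = gaussian_pdf(w[:, None] - grid[None, :], sigmas[:, None])
    p = np.full(grid.size, 1.0 / grid.size)  # start from a uniform guess
    for _ in range(n_iter):
        joint = lik * p
        # Posterior over grid points for each record (guard against underflow).
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-300)
        p_new = post.mean(axis=0)
        converged = np.abs(p_new - p).sum() < tol
        p = p_new
        if converged:
            break
    return p


# Usage on synthetic data (all values are illustrative).
rng = np.random.default_rng(1)
n = 5000
x = rng.choice([-3.0, 3.0], size=n) + rng.normal(0.0, 0.5, size=n)  # bimodal original
sigmas = rng.choice([0.5, 1.0, 2.0], size=n)  # each record's hypothetical privacy level
w = x + rng.normal(0.0, sigmas)               # what the data miner receives
grid = np.linspace(-8.0, 8.0, 81)
p = reconstruct_distribution(w, sigmas, grid)
print("mass at |a| >= 1.5:", p[np.abs(grid) >= 1.5].sum())
```

Records with larger `sigmas` contribute flatter likelihood rows, so they inform the estimate less sharply than records whose owners chose weaker privacy.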
Keywords/Search Tags: Privacy preserving data mining, Original data distribution, Real-world data, Training data