
A Utility-Aware Privacy Preserving Framework For Distributed Data Mining With Worst Case Privacy Guarantee

Posted on: 2012-05-08
Degree: Ph.D.
Type: Dissertation
University: University of Maryland, Baltimore County
Candidate: Banerjee, Madhushri
Full Text: PDF
GTID: 1458390008998527
Subject: Information Technology
Abstract/Summary:
Data mining is the task of finding meaningful patterns in large amounts of data. With the enormous growth of data and its distributed nature, the storage and the analysis of data are often separated, which creates the need for the research area of privacy preserving data mining. In other words, privacy preserving data mining is needed when the data is private in nature and the disclosure of sensitive information needs to be prevented, while still allowing the data to be mined with reasonable accuracy. Various data perturbation techniques exist in the literature for this purpose. One significant drawback of the existing methods is that they handle only the average case privacy scenario. When dealing with private and business data, however, it would be beneficial to have a privacy framework that provides a guarantee that the data would not be divulged in the worst case.

In a distributed data setting, it is often beneficial for organizations to perform data mining tasks collaboratively without giving up their own data. This necessity has given rise to the research areas of secure multiparty computation and privacy preserving distributed data mining. Several protocols exist for data mining tasks in a distributed scenario, but most of them handle a single data mining method. Therefore, if the participating parties are interested in more than one classification method, they have to go through a separate series of distributed protocols for each method, which increases the overhead substantially.

Another critical problem with the existing privacy protection techniques is that they do not take into consideration the data mining tasks that will be performed on the perturbed data, which reduces the utility of the perturbation techniques substantially. In a distributed setting, the parties are aware of the data mining tasks they need to perform collaboratively. For example, the collaborating parties are aware that they are building a classification model or predicting an attribute. Therefore, if the data perturbation methods can be pruned according to the needs of the end user, the utility of the privacy protection techniques can be increased significantly.

In this dissertation, multiple privacy preserving data mining algorithms are proposed for multiple data mining methods. They address all of the above-mentioned issues and provide a utility-aware approach to privacy preserving data mining in centralized as well as distributed scenarios, with a worst case privacy guarantee. These algorithms also allow the end user to perform exploratory data analysis on the perturbed data. Detailed experimental results demonstrate the effectiveness of these techniques.
Keywords/Search Tags:Data mining, Privacy preserving, Distributed, Techniques, Perturbed data, Utility, Information, Parties are aware