
Privacy-enhancing data mining: Issues, techniques and measures

Posted on: 2005-05-21
Degree: Ph.D.
Type: Dissertation
University: University of Illinois at Urbana-Champaign
Candidate: Li, Jingquan
GTID: 1458390008491960
Subject: Business Administration
Abstract/Summary:
Data mining is increasingly recognized as key to analyzing massive volumes of business data and discovering new knowledge and business rules that improve business intelligence. Because enterprise data may include highly sensitive information about individuals, companies are paying growing attention to data privacy when implementing business intelligence solutions. The challenge is to protect privacy while still preparing relevant data for effective mining. The potential of data mining to invade privacy is a complex phenomenon known as the "dossier" effect. This study explores the "dossier" effect and its privacy implications in data mining, and I develop a general framework for attribute analysis and privacy-enhancing data mining. The study concentrates on a formal model of privacy-enhancing data transformation and systematically analyzes privacy-enhancing data transformation techniques. These techniques fall broadly into three categories: data reduction, which restricts the information to be released; data generalization, which hides sensitive data by releasing more general values; and data perturbation, which perturbs the original database before it is released. Based on information theory, I propose theoretically sound privacy and information-loss metrics to quantify the performance of these methods.

The study presents several effective privacy-enhancing transformation techniques that are applicable to various data types. These techniques preserve privacy while retaining access to the information contained in the original data. Specifically, I address privacy protection using data filtering, partitioning, synthetic data, and randomization methods.
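The three transformation categories (reduction, generalization, and perturbation) can be sketched on a few toy records, together with one simple information-theoretic loss measure. This is a minimal illustration, not the dissertation's formal model. The records, the attribute names, the 10-year age bins, the Gaussian noise scale, and the use of the drop in Shannon entropy as the information-loss measure are all assumptions made here.

```python
import math
import random
from collections import Counter

# Hypothetical toy records; attribute names and values are illustrative only.
RECORDS = [
    {"age": 34, "zip": "61801", "income": 52000},
    {"age": 37, "zip": "61820", "income": 61000},
    {"age": 45, "zip": "61801", "income": 75000},
    {"age": 52, "zip": "61822", "income": 88000},
]


def reduce_records(records, drop):
    """Data reduction: release every attribute except those named in `drop`."""
    return [{k: v for k, v in r.items() if k not in drop} for r in records]


def generalize_age(records, width=10):
    """Data generalization: replace each exact age with a fixed-width range label."""
    out = []
    for r in records:
        lo = (r["age"] // width) * width
        out.append({**r, "age": f"{lo}-{lo + width - 1}"})
    return out


def perturb_income(records, scale=5000.0, seed=0):
    """Data perturbation: add zero-mean Gaussian noise to the income attribute."""
    rng = random.Random(seed)
    return [{**r, "income": r["income"] + rng.gauss(0.0, scale)} for r in records]


def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def info_loss(before, after):
    """Assumed loss measure (not the author's): entropy drop of one attribute."""
    return entropy(before) - entropy(after)


if __name__ == "__main__":
    print(reduce_records(RECORDS, {"zip"}))
    generalized = generalize_age(RECORDS)
    ages_before = [r["age"] for r in RECORDS]
    ages_after = [r["age"] for r in generalized]
    print(f"age information loss: {info_loss(ages_before, ages_after):.2f} bits")
    print(perturb_income(RECORDS)[0]["income"])
```

With 10-year bins, the four distinct ages (2 bits of entropy) collapse into three ranges carrying 1.5 bits, so this particular measure reports 0.5 bits of loss. Wider bins would raise both the loss and the protection.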
I give examples of inducing decision-tree classifiers and building fraud-detection models from training data in which the values of sensitive attributes have been modified. I experimentally validate the privacy-enhancing techniques and the measurement methodology on both real-world and synthetic datasets. The experimental results show that these techniques can preserve data privacy with minimal loss of information. They also show that the proposed techniques can achieve comparable performance measures or mining results while preserving data privacy.
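Mining from training data whose sensitive values have been modified can be illustrated with Warner's randomized response. This is a standard randomization technique, swapped in here for illustration; the abstract does not say it is the exact scheme used. Each binary sensitive value, such as a hypothetical fraud flag, is kept with probability p and flipped otherwise. Because p is known, the miner can still estimate the true proportion of 1s. That proportion is the kind of class statistic an entropy-based decision-tree split needs. The 20% base rate and p = 0.8 below are arbitrary assumptions.

```python
import math
import random


def randomize(bits, p, rng):
    """Warner-style randomized response: keep each bit with probability p, else flip it."""
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_true_rate(randomized, p):
    """Unbiased estimate of the true fraction of 1s from randomized bits.

    The observed rate is lam = p*pi + (1 - p)*(1 - pi), so
    pi = (lam - (1 - p)) / (2p - 1). Requires p != 0.5; result clamped to [0, 1].
    """
    if abs(p - 0.5) < 1e-12:
        raise ValueError("p must differ from 0.5")
    lam = sum(randomized) / len(randomized)
    pi = (lam - (1 - p)) / (2 * p - 1)
    return min(1.0, max(0.0, pi))


def binary_entropy(q):
    """Entropy, in bits, of a Bernoulli(q) class label, as used in information-gain splits."""
    if q in (0.0, 1.0):
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical sensitive flag with an assumed 20% base rate.
    true_bits = [1 if rng.random() < 0.2 else 0 for _ in range(100_000)]
    noisy = randomize(true_bits, p=0.8, rng=rng)
    est = estimate_true_rate(noisy, p=0.8)
    print(f"true rate {sum(true_bits) / len(true_bits):.3f}, estimated {est:.3f}")
    print(f"estimated label entropy {binary_entropy(est):.3f} bits")
```

The estimate sharpens as the sample grows and becomes noisier as p approaches 0.5. That is the same privacy versus information-loss trade-off the metrics above are meant to quantify.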
Keywords/Search Tags: Mining, Privacy, Techniques, Business