Font Size: a A A

A hybrid approach to private record matching

Posted on:2011-04-12Degree:Ph.DType:Dissertation
University:The University of Texas at DallasCandidate:Inan, AliFull Text:PDF
GTID:1468390011472116Subject:Computer Science
Abstract/Summary:
Real-world entities are not always represented by the same set of features in different data sets. Therefore matching and linking records corresponding to the same real-world entity distributed across these different data sets is a challenging task. If the data sets contain private information, the problem becomes even more difficult due to privacy concerns. Existing solutions of this problem generally follow two approaches: sanitization techniques and cryptographic techniques. The former achieves privacy by perturbing sensitive data at the expense of degrading matching accuracy. The latter, on the other hand, attains both strong privacy and high accuracy under heavy communication and computational costs. We propose a hybrid technique that combines these two approaches and enables users to trade off between privacy, accuracy and cost. Our main contribution is the use of a blocking phase that operates over sanitized data to filter out in a privacy-preserving manner pairs of records that do not satisfy the matching condition. We investigate effective and efficient blocking methods under two distinct privacy paradigms: data anonymization (with several instantiations such as k-anonymity, ℓ-diversity and t-closeness) and differential privacy. We also provide a formal definition of privacy and prove that the participants of our protocols learn nothing other than their share of the result and what can be inferred from their share of the result, their input and sanitized views of the input data sets (which are considered public information). Experiments conducted on a real-world data set show that our method incurs considerably lower costs than cryptographic techniques and yields significantly more accurate matching results compared to sanitization techniques, even in cases where privacy requirements are high.
Keywords/Search Tags:Matching, Data sets, Privacy, Techniques
Related items