Accurate Analysis of Large Private Datasets

Posted on: 2011-07-17
Degree: Ph.D.
Type: Thesis
University: University of Washington
Candidate: Rastogi, Vibhor
Full Text: PDF
GTID: 2449390002467467
Subject: Computer Science
Abstract/Summary:
Today, no individual has full control over access to his personal information. Private data collected by hospitals and universities, by real-world sensor deployments, and by websites like Google and Facebook contain valuable statistical facts that can be mined for research and analysis, e.g., to analyze disease outbreaks, detect traffic patterns on the road, or understand browsing trends on the web. However, concerns about individual privacy severely restrict the use of such data, e.g., privacy attacks recently led AOL to pull its published search-log data.

To remedy this, much recent work focuses on data analysis with formal privacy guarantees. This has given rise to differential privacy, considered by many to be the gold standard of privacy. However, few practical techniques satisfying differential privacy exist for complex analysis tasks (e.g., analysis involving complex queries) or for new data models (e.g., data having temporal correlations). In this thesis, we discuss techniques that fill this void.

Central to this thesis is an equivalence result between a variant of differential privacy and adversarial privacy, an alternative privacy definition commonly used in the literature. Apart from its obvious theoretical interest in connecting two different privacy paradigms, the result also provides a framework for building new practical privacy applications by combining the theoretical rigor of differential privacy with the flexibility of adversarial privacy.

Based on this result, we propose a query-answering algorithm that can handle joins (previously, no private technique could accurately answer the join queries that arise in many analysis tasks). This algorithm makes several privacy-preserving analyses over social network graphs possible for the first time. Another application of the equivalence result is a query-answering technique over time-series data, which enables private analysis of GPS traces and other temporally correlated data.

This thesis thus makes three key contributions: an equivalence result between two fundamental but seemingly unrelated paradigms of privacy, the application of that result to answering join queries, and its application to querying time-series data. Together, these contributions enable, for the first time, powerful data analysis with formal privacy guarantees.
Keywords/Search Tags:Data, Privacy, Private, Equivalence result