Exploring privacy and personalization in information retrieval applications

Posted on:2014-05-18

Degree:Ph.D

Type:Dissertation

University:University of Massachusetts Amherst

Candidate:Feild, Henry A

GTID:1458390008955706

Subject:Computer Science

Abstract/Summary:

A growing number of information retrieval applications rely on search behavior aggregated over many users. If aggregated data such as search query reformulations is not handled properly, it can allow users to be identified and their privacy compromised. Besides leveraging aggregate data, applications commonly use user-specific behavior to provide a personalized experience. Unlike the use of aggregate data, individual personalization does not raise a privacy issue, since users are the only consumers of their own data. The goal of this work is to explore the effects of personalization and privacy preservation methods on three information retrieval applications: search task identification, task-aware query recommendation, and searcher frustration detection. We pursue this goal by first introducing a novel framework called CrowdLogging for logging and aggregating data privately over a distributed set of users. We then describe several privacy mechanisms for sanitizing global data, including one novel mechanism based on differential privacy. We present a template for describing how local user data and global aggregate data are collected, processed, and used within an application, and apply this template to our three applications. We find that sanitizing feature vectors aggregated across users has a low impact on performance for the classification applications (search task identification and searcher frustration detection). However, sanitizing free-text query reformulations is extremely detrimental to performance for the query recommendation application we consider. Personalization is useful to some degree in all the applications we explore when integrated with global information, achieving gains for search task identification, task-aware query recommendation, and searcher frustration detection. Finally, we introduce an open source system called CrowdLogger that implements the CrowdLogging framework and also serves as a platform for conducting in-situ user studies of search behavior, prototyping and evaluating information retrieval applications, and collecting labeled data.
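
To give a rough sense of what sanitizing aggregate search data can look like, the sketch below counts how many distinct users produced each artifact (for example, a query reformulation), adds Laplace noise to each count, and releases only artifacts whose noisy count clears a threshold. This is a generic illustration in the style of differential privacy mechanisms, not the mechanism described in the dissertation and not CrowdLogger code. The function names, the `epsilon` and `threshold` parameters, and the sample data are all assumptions made for the example, and the sketch does not by itself establish a formal privacy guarantee (a single user may contribute to several artifacts).

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample zero-mean Laplace noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sanitize_counts(user_artifacts, epsilon, threshold, seed=None):
    """Release noisy per-artifact user counts that clear a threshold.

    user_artifacts: one iterable of artifacts per user (hypothetical input shape).
    epsilon: assumed privacy parameter; smaller values mean more noise.
    threshold: minimum noisy count an artifact needs in order to be released.
    """
    rng = random.Random(seed)

    # Count each artifact at most once per user, so one user changes
    # any single artifact's count by at most 1.
    counts = Counter()
    for artifacts in user_artifacts:
        counts.update(set(artifacts))

    released = {}
    for artifact, count in counts.items():
        noisy = count + laplace_noise(1.0 / epsilon, rng)
        if noisy >= threshold:
            released[artifact] = noisy
    return released


if __name__ == "__main__":
    # Hypothetical query reformulations from three users.
    logs = [
        ["weather -> weather boston", "jaguar -> jaguar car"],
        ["weather -> weather boston"],
        ["weather -> weather boston", "my name -> my address"],
    ]
    print(sanitize_counts(logs, epsilon=1.0, threshold=2.0, seed=0))
```

The threshold keeps rare artifacts, which are the ones most likely to identify an individual, out of the released data, while the noise obscures the exact number of contributing users. A free-text artifact shared by few users is therefore likely to be suppressed.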

Keywords/Search Tags:

Information retrieval applications, Data, Search, Privacy, Behavior, Personalization, Users

Related items

1	Rural Users Network Information Retrieval Behavior Research
2	Research And Implementation Of Intelligent Information Retrieval Technology In American Health Care System
3	The Research Of Personal Information Search Technology
4	Privacy Preserved Data Retrieval Over Encrypted Data
5	Study On The Establishment Of Evaluation System For The Information Search Behavior Of University Library Users Under Network Environment
6	Research Of An Information Retrieval Algorithm Based On The Relevance Of Mobile Search Users
7	Research On Anonymity Techniques For Personalization Privacy-preserving Data Publishing
8	Personalized Web Information Retrieval System, Design And Implementation
9	Research Of Web-Based Personalized Information Search System
10	The Status And Factors Associated With Privacy Disclosure Behavior For WeChat Users