Goal-Based Entity Resolution

Posted on: 2012-09-07
Degree: M.S.
Type: Thesis
University: University of California, Irvine
Candidate: Udupa, Kartik
GTID: 2458390011956867
Subject: Computer Science
Abstract/Summary:
Improving data quality is an ongoing challenge, and the challenge has only grown as the number of data sources has increased. We concentrate on one specific part of the data quality problem: entity resolution. This thesis explores the merit of performing entity resolution in the context of a query. Because cleaning operations are expensive, we aim to reduce the cost incurred in data cleaning: given a query, we detect which cleaning operations can be avoided while the query is still answered correctly. To do so, we build an algorithmic framework that reduces the number of cleaning operations performed on the data. Which portion of the data must be cleaned, and to what extent, depends on both the nature of the data and the query itself. We explore this alternative approach to entity resolution and measure the gains achieved as a reduction in cleaning cost.
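The core idea, skipping cleaning work that cannot change a query's answer, can be illustrated with a small sketch. This is not the framework developed in the thesis; it is a hypothetical example under stated assumptions: records are grouped by a complete blocking key (no true match crosses blocks), each pairwise `match()` call counts as one cleaning operation, and merged entities keep the union of their records' values, so an entity satisfies a selection predicate if and only if at least one of its records does. Under those assumptions, any block with no record satisfying the predicate can be left unresolved. All names (`resolve_block`, `answer_query`, `same_person`, the sample records) are illustrative.

```python
from itertools import combinations
from typing import Callable

Record = dict[str, str]


def resolve_block(block: list[Record],
                  match: Callable[[Record, Record], bool]) -> tuple[list[list[Record]], int]:
    """Cluster one block with union-find; each match() call is one cleaning operation."""
    parent = list(range(len(block)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    calls = 0
    for i, j in combinations(range(len(block)), 2):
        calls += 1
        if match(block[i], block[j]):
            parent[find(i)] = find(j)

    clusters: dict[int, list[Record]] = {}
    for i, rec in enumerate(block):
        clusters.setdefault(find(i), []).append(rec)
    return list(clusters.values()), calls


def answer_query(records, block_key, match, predicate, goal_based=True):
    """Return (entities satisfying predicate, number of match() calls).

    Assumes complete blocking and union merge semantics (hypothetical setting)."""
    blocks: dict[str, list[Record]] = {}
    for rec in records:
        blocks.setdefault(block_key(rec), []).append(rec)

    answer, calls = [], 0
    for block in blocks.values():
        if goal_based and not any(predicate(r) for r in block):
            continue  # no entity in this block can reach the answer: skip cleaning it
        clusters, c = resolve_block(block, match)
        calls += c
        answer.extend(cl for cl in clusters if any(predicate(r) for r in cl))
    return answer, calls


# Usage on made-up records: block by last name, match on last name + first initial.
def last_name(r: Record) -> str:
    return r["name"].split()[-1].lower()


def same_person(a: Record, b: Record) -> bool:
    return last_name(a) == last_name(b) and a["name"][0].lower() == b["name"][0].lower()


def in_irvine(r: Record) -> bool:
    return r["city"] == "Irvine"


people = [
    {"name": "Alice Smith", "city": "Irvine"},
    {"name": "A. Smith", "city": "Irvine"},
    {"name": "Bob Jones", "city": "Austin"},
    {"name": "B. Jones", "city": "Austin"},
    {"name": "Bobby Jones", "city": "Austin"},
]

goal_answer, goal_calls = answer_query(people, last_name, same_person, in_irvine)
full_answer, full_calls = answer_query(people, last_name, same_person, in_irvine, goal_based=False)
print(len(goal_answer), goal_calls)  # 1 1
print(len(full_answer), full_calls)  # 1 4
```

Both runs return the same single "Irvine" entity, but the query-driven run skips the three comparisons in the "jones" block, which cannot contribute to the answer. The savings here depend entirely on the assumed blocking and merge semantics; with a predicate that is not monotone under merging, skipping blocks this way would not be safe.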
Keywords: Entity resolution, Data, Cleaning