Human-Centric Debugging of Entity Matching

Posted on: 2018-01-21
Degree: Ph.D
Type: Thesis
University: The University of Wisconsin - Madison
Candidate: Panahi, Fatemah
Full Text: PDF
GTID: 2478390017991149
Subject: Computer Science
Abstract/Summary:
Entity matching (EM) is the problem of finding data records that refer to the same real-world entity. For example, the two records (Matthew Richardson, 206-453-1978) and (Matt W. Richardson, 453 1978) may refer to the same person. It is an important data integration problem with many applications, such as in e-commerce, healthcare, and national security. Recent work on entity matching has focused on using machine learning and/or crowdsourcing to improve the accuracy and/or scale of current matching solutions, even though the task is typically done with a human analyst in the loop. In this thesis we therefore propose solutions that acknowledge that humans are in the loop when completing an entity matching task. We focus on debugging of entity matching, an iterative process by which an analyst improves matching quality. Hence the title, "Human-Centric Debugging of Entity Matching".

We build an end-to-end matching system and experiment with it in an e-commerce setting as well as with students in a graduate data modeling course at UW-Madison. We also develop an abstract model of the entity matching problem to understand what makes such a problem hard for an analyst. The insights from this work lead to the two lines of work in the rest of the thesis. First, we focus on debugging rule-based matchers, and we attempt to make it an interactive process in which an analyst can quickly iterate and find a high-quality matcher. We show that by optimally ordering the rules, and by incrementally running the matcher on top of previous matching output, we can decrease runtime significantly. Second, we focus on debugging entity matching data sets. We develop a framework to help an analyst quickly find and resolve inconsistencies in a data set. We experiment with seven real-world data sets and demonstrate the effectiveness of our framework in finding inconsistencies.
Keywords/Search Tags: Entity matching, Data, Debugging, Problem