
Automated Comparative Table Generation For Facilitating Human Intervention In Multi-Entity Resolution

Posted on: 2020-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: J C Huang
GTID: 2428330575955157
Subject: Computer Science and Technology

Abstract/Summary:
With the rapid growth of knowledge graphs (KGs), billions of entities have been created from diverse sources. Entity resolution (ER) is the process of identifying entities that refer to the same real-world object. ER can discover equivalent entities across different KGs, so that facts described with different entities can be combined to construct a larger KG. ER has therefore long been studied in the KG community, among many others. Although a significant number of automated approaches have been proposed to address the ER problem, they are still challenged by large-scale, heterogeneously described entities. Humans, as a valuable source of background knowledge, are increasingly involved in this loop through crowdsourcing and active learning, where presenting condensed, easily compared information is vital to helping humans intervene in an ER task. However, current methods for single-entity or pairwise summarization do not adequately support humans in observing and comparing multiple entities simultaneously, which impairs the efficiency and accuracy of human intervention. This thesis proposes an automated approach that selects a few important properties and values for a set of entities and assembles them into a comparative table. The main contributions are as follows:

- This thesis trains a logistic regression model that combines 3 similarity measures for property pairs to estimate their matching probabilities, and optimizes holistic property matching under a duplicate-free constraint with an efficient algorithm to obtain property cliques from multiple sources. It also proves that the holistic property matching problem under this constraint is NP-hard.
- This thesis designs 4 goodness measures for the derived property cliques and 1 goodness measure for values, based on intuitive characteristics of ER tasks. The goodness measures are used to judge whether cliques or values are helpful for ER tasks. The thesis formulates the problem of optimal comparative table generation under an entity coverage constraint, proposes an efficient algorithm to obtain approximate solutions, and proves its approximation ratio.
- This thesis conducts extensive experiments, comparisons with state-of-the-art methods, and a user study to verify the accuracy of the matched properties, the effectiveness of the goodness measures, and user satisfaction with the comparative tables for multi-entity resolution (MER).
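The first contribution can be illustrated with a minimal sketch. The abstract does not name the three similarity measures, the learned weights, or the matching algorithm, so everything below is an assumption made for illustration: the three features (label-token overlap, label character-bigram overlap, value overlap), the hand-set `WEIGHTS` and `BIAS` (a trained logistic regression would learn these from labelled pairs), and the simple greedy grouping in `greedy_groups`, which is not the thesis's algorithm and only demonstrates the duplicate-free constraint, read here as: no group may contain two properties from the same source.

```python
import math
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two collections, 0.0 when both are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def bigrams(s):
    """Set of lower-cased character bigrams of a string."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def features(p, q):
    """Three illustrative (assumed) similarity features for a property pair."""
    return [
        jaccard(p["label"].lower().split(), q["label"].lower().split()),
        jaccard(bigrams(p["label"]), bigrams(q["label"])),
        jaccard(p["values"], q["values"]),
    ]


# Hypothetical parameters; a real model would fit these on labelled pairs.
WEIGHTS = [2.0, 3.0, 3.0]
BIAS = -3.0


def match_probability(p, q):
    """Logistic (sigmoid) combination of the similarity features."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features(p, q)))
    return 1.0 / (1.0 + math.exp(-z))


def greedy_groups(props, threshold=0.5):
    """Merge property groups in order of decreasing match probability,
    skipping any merge that would put two properties of one source together."""
    groups = [[i] for i in range(len(props))]
    group_of = list(range(len(props)))
    pairs = sorted(
        ((match_probability(props[i], props[j]), i, j)
         for i, j in combinations(range(len(props)), 2)),
        reverse=True,
    )
    for prob, i, j in pairs:
        gi, gj = group_of[i], group_of[j]
        if prob < threshold or gi == gj:
            continue
        sources_i = {props[k]["source"] for k in groups[gi]}
        sources_j = {props[k]["source"] for k in groups[gj]}
        if sources_i & sources_j:
            continue  # merge would violate the duplicate-free constraint
        groups[gi].extend(groups[gj])
        for k in groups[gj]:
            group_of[k] = gi
        groups[gj] = []
    return [sorted(g) for g in groups if g]


# Toy properties from two hypothetical sources, A and B.
props = [
    {"source": "A", "label": "birth date", "values": {"1990-01-01", "1985-05-05"}},
    {"source": "B", "label": "date of birth", "values": {"1990-01-01", "1985-05-05"}},
    {"source": "A", "label": "birth place", "values": {"Paris"}},
    {"source": "B", "label": "place of birth", "values": {"Paris", "Rome"}},
]
groups = greedy_groups(props)
```

On this toy input, `groups` pairs each property from source A with its counterpart from source B. Because merging is greedy and pairwise, a resulting group is not guaranteed to be a clique in the strict sense (every pair above the threshold); the sketch only shows how pairwise probabilities and a per-source uniqueness check can interact.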
Keywords/Search Tags: entity resolution, knowledge graph, comparative table, multi-entity summarization, holistic property matching