Graph-based approaches to resolve entity ambiguity

Posted on:2017-11-16

Degree:Ph.D

Type:Dissertation

University:New York University

Candidate:Pershina, Maria

GTID:1458390008975435

Subject:Computer Science

Abstract/Summary:

Information extraction is the task of automatically extracting structured information from unstructured or semi-structured machine-readable documents. One of its challenges is to resolve ambiguity between entities, either in a knowledge base or in text documents. The problem has many variations and is known under different names, such as coreference resolution, entity disambiguation, entity linking, and entity matching. For example, coreference resolution decides whether two expressions refer to the same entity; entity disambiguation determines how to map an entity mention to the appropriate entity in a knowledge base (KB); entity linking infers that two entity mentions, in one or more documents, refer to the same real-world entity even if they do not appear in a KB; and entity matching (also called record deduplication, entity resolution, or reference reconciliation) merges database records that refer to the same object.

Resolving ambiguity and finding proper matches between entities is an important step for many downstream applications, such as data integration, question answering, and relation extraction. The Internet has enabled the creation of a growing number of large-scale knowledge bases in a variety of domains, posing a scalability challenge for information extraction systems. Tools for automatically aligning these knowledge bases would make it possible to unify many sources of structured knowledge and to answer complex queries. However, the efficient alignment of large-scale knowledge bases still poses a considerable challenge.

This dissertation studies several aspects of, and settings for, resolving ambiguity between entities. A new scalable, domain-independent, graph-based approach utilizing Personalized PageRank is developed for entity matching across large-scale knowledge bases and evaluated on datasets of 110 million and 203 million entities. A new model for entity disambiguation between a document and a knowledge base is proposed; it utilizes a document graph and effectively filters out noise, and the corresponding datasets are released. The model achieves a competitive micro-accuracy of 91.7% on the AIDA benchmark dataset, outperforming the most recent state-of-the-art models. A new technique based on a paraphrase detection model is proposed to recognize name variations for an entity in a document, and the corresponding training and test datasets are made publicly available. Finally, a new approach that integrates the graph-based entity disambiguation model with this technique is presented for the entity linking task and evaluated on a dataset for the Text Analysis Conference Entity Discovery and Linking task.
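
The abstract names Personalized PageRank as the basis of the graph-based approach but does not describe the models themselves. As a rough illustration only, the following Python sketch shows the generic textbook power-iteration form of Personalized PageRank on a tiny graph. It is not the dissertation's method. The function name, the toy graph, the node labels, and the values of `alpha` and `iterations` are all assumptions made for this example.

```python
def personalized_pagerank(graph, seed, alpha=0.85, iterations=50):
    """Generic Personalized PageRank by power iteration (illustrative sketch).

    graph: dict mapping each node to a list of neighbour nodes; every
           neighbour must itself be a key of the dict (assumed toy format).
    seed:  the node that receives all of the teleport probability.
    alpha: probability of following an edge instead of jumping back to seed.
    """
    scores = {node: 0.0 for node in graph}
    scores[seed] = 1.0  # start with all probability mass on the seed
    for _ in range(iterations):
        updated = {node: 0.0 for node in graph}
        for node, neighbours in graph.items():
            if neighbours:
                # spread this node's mass evenly over its outgoing edges
                share = alpha * scores[node] / len(neighbours)
                for neighbour in neighbours:
                    updated[neighbour] += share
            else:
                # dangling node: send its mass back to the seed
                updated[seed] += alpha * scores[node]
        # teleport step: total mass is 1, so (1 - alpha) returns to the seed
        updated[seed] += 1.0 - alpha
        scores = updated
    return scores


# Hypothetical toy graph: one ambiguous mention and two candidate KB entities.
toy_graph = {
    "mention:Paris": ["kb:Paris_France", "kb:Paris_Texas"],
    "kb:Paris_France": ["mention:Paris", "kb:France"],
    "kb:Paris_Texas": ["mention:Paris"],
    "kb:France": ["kb:Paris_France"],
}

scores = personalized_pagerank(toy_graph, seed="mention:Paris")
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.3f}")
```

In this toy graph the candidate with more supporting context ends up with the higher score relative to the seed mention, which is the intuition behind using such scores to rank candidates. How the dissertation builds its graphs and uses the scores is not stated in the abstract.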

Keywords/Search Tags:

Entity, Task, Large-scale knowledge bases, Graph-based, Ambiguity, Resolve, Document, Extraction

Related items

1	Multi-task Academic Knowledge Graph Construction Method Based On Event Extraction
2	Document-level Entity Relation Extraction Based On Document Structure And External Knowledge
3	Research And Implementation Of Knowledge Extraction For Domain Knowledge Graph Construction
4	Generation And Application Of Entity Descriptions Based On Large-scale Knowledge Base
5	Research On Large-scale Knowledge Graph Embedding Methods
6	Research On Encyclopedic Knowledge Bases Oriented Entity-document Relevance Classification
7	Key Techniques Of Entity Alignment In Knowledge Bases
8	Document-level Framework For Chinese Financial Event Extraction Based On External Knowledge
9	Towards Population of Knowledge Bases From Conversational Source
10	Construction And Application Of Knowledge Graph In Financial Field