Entity Analysis with Weak Supervision: Typing, Linking, and Attribute Extraction

Posted on: 2016-03-21

Degree: Ph.D.

Type: Dissertation

University: University of Washington

Candidate: Ling, Xiao

Full Text: PDF

GTID: 1478390017978026

Subject: Computer Science

Abstract/Summary:

With the advent of the Web, textual information has grown at an explosive rate. Digesting this enormous amount of data requires an automatic solution: Information Extraction (IE), the task of converting unstructured text into structured, machine-readable data. A key first step in a general IE pipeline is often to analyze the entities mentioned in the text before drawing holistic conclusions. Fully understanding an entity requires detecting its mentions, categorizing them into semantic types, linking them to the corresponding knowledge base entries, and identifying the entity's attributes and its relationships with other entities.

In this dissertation, we first present the problem of fine-grained entity recognition. Most traditional named entity recognition systems use a small set of entity classes (e.g., person, organization, location, or miscellaneous). In contrast, we define a novel set of over one hundred fine-grained entity types. To understand text intelligently and extract a wide range of information, it is useful to determine more precisely the semantic classes of the entities mentioned in unstructured text. We formulate the recognition problem as multi-class, multi-label classification, describe an unsupervised method for collecting training data, and present the FIGER implementation.

Next, we demonstrate that fine-grained entity types are closely connected with other entity analysis tasks. We describe an entity linking system whose predictions rely heavily on these types and present a simple yet effective implementation called VINCULUM. An extensive evaluation on nine data sets, comparing VINCULUM with two state-of-the-art systems, elucidates key aspects of the system, including mention extraction, candidate generation, entity type prediction, entity coreference, and coherence.

Finally, we describe an approach to acquiring commonsense knowledge from the massive amount of text on the Web. In particular, we develop a system called SIZEITALL that extracts numerical attribute values for various classes of entities. To resolve the ambiguity of surface-form text, we canonicalize the extractions with respect to WordNet senses and build a knowledge base of physical sizes for thousands of entity classes.

Across all three entity analysis tasks, we show that sophisticated IE systems can be built without a significant investment of human effort in creating sufficient labeled data.
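Fine-grained recognition is framed as multi-class, multi-label classification. A mention may receive several types at once, for example both a coarse type and a finer subtype. The sketch below illustrates that framing with one binary perceptron per type, and it predicts every type whose score is positive. It is a minimal illustration under stated assumptions, not FIGER's actual model or feature set. The feature strings, the type names, and the class `MultiLabelPerceptron` are hypothetical, and the training mentions are hand-written rather than collected automatically.

```python
from collections import defaultdict


class MultiLabelPerceptron:
    """One binary perceptron per type; a mention gets every type whose score > 0."""

    def __init__(self, types):
        self.types = list(types)
        # weights[type][feature] -> learned weight (unseen features count as 0)
        self.weights = {t: defaultdict(float) for t in self.types}

    def score(self, type_, features):
        """Dot product between one type's weights and a sparse feature dict."""
        w = self.weights[type_]
        return sum(w.get(f, 0.0) * v for f, v in features.items())

    def predict(self, features):
        """Multi-label prediction: the set of all types with a positive score."""
        return {t for t in self.types if self.score(t, features) > 0}

    def train(self, examples, epochs=10):
        """examples: list of (feature dict, set of gold types)."""
        for _ in range(epochs):
            for features, gold in examples:
                for t in self.types:
                    y = 1 if t in gold else -1
                    # Standard perceptron update on a mistake (or a zero score).
                    if y * self.score(t, features) <= 0:
                        for f, v in features.items():
                            self.weights[t][f] += y * v


# Hypothetical type inventory and hand-written training mentions.
TYPES = ["/person", "/person/musician", "/person/politician",
         "/location", "/location/city"]
EXAMPLES = [
    ({"head=guitarist": 1, "ctx=album": 1}, {"/person", "/person/musician"}),
    ({"head=senator": 1, "ctx=elected": 1}, {"/person", "/person/politician"}),
    ({"head=city": 1, "ctx=population": 1}, {"/location", "/location/city"}),
]

model = MultiLabelPerceptron(TYPES)
model.train(EXAMPLES)
print(sorted(model.predict({"head=guitarist": 1, "ctx=album": 1})))
# prints ['/person', '/person/musician']
```

Each type has its own scorer, so a mention can receive zero, one, or several labels. A fuller system might also enforce consistency between a subtype and its parent type. This sketch does not do that.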
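The entity linking system's predictions rely heavily on fine-grained types. The sketch below is one simple way to illustrate that dependency. It starts from a candidate's prior probability P(entity | mention) and adds a bonus for each of the candidate's knowledge-base types that agrees with the types predicted for the mention. This is not VINCULUM's actual scoring function, and it omits the coreference and coherence components the evaluation examines. The alias table, the entity types, `link`, and `type_weight` are all hypothetical.

```python
import math

# Hypothetical alias table: mention string -> {candidate entity: P(entity | mention)}.
ALIAS_PRIOR = {
    "Washington": {
        "George_Washington": 0.35,
        "Washington,_D.C.": 0.45,
        "Washington_(state)": 0.20,
    },
}

# Hypothetical knowledge-base types for each candidate entity.
ENTITY_TYPES = {
    "George_Washington": {"/person", "/person/politician"},
    "Washington,_D.C.": {"/location", "/location/city"},
    "Washington_(state)": {"/location", "/location/province"},
}


def link(mention, predicted_types, type_weight=2.0):
    """Return the best candidate for `mention`, or None (NIL) if there is none.

    Score = log prior + type_weight * (number of the candidate's KB types
    that also appear among the types predicted for the mention).
    """
    candidates = ALIAS_PRIOR.get(mention, {})
    if not candidates:
        return None

    def score(entity):
        overlap = len(ENTITY_TYPES.get(entity, set()) & predicted_types)
        return math.log(candidates[entity]) + type_weight * overlap

    return max(candidates, key=score)


print(link("Washington", set()))                              # prints Washington,_D.C.
print(link("Washington", {"/person", "/person/politician"}))  # prints George_Washington
```

With no type evidence, the most frequent sense wins on the prior alone. Predicted person types are enough to override that prior in favor of the politician.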
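SIZEITALL extracts numerical attribute values, such as physical size, for classes of entities. The sketch below shows the general idea under simplifying assumptions:

- a single hypothetical surface pattern ("a giraffe is about 5 m tall");
- a small table of unit conversions to meters;
- the median as the per-noun aggregate.

It is not the dissertation's extractor. It also skips canonicalization. It keys results on the surface noun, whereas the described system maps extractions to WordNet senses. An ambiguous word such as "mouse" (the animal or the computer device) would therefore be conflated here. `PATTERN`, `UNIT_TO_M`, and `extract_sizes` are illustrative names.

```python
import re
from collections import defaultdict
from statistics import median

# Conversion factors to meters for the units the pattern accepts.
UNIT_TO_M = {
    "mm": 0.001, "cm": 0.01, "m": 1.0, "km": 1000.0,
    "meter": 1.0, "meters": 1.0,
    "inch": 0.0254, "inches": 0.0254,
    "foot": 0.3048, "feet": 0.3048,
}

# Hypothetical pattern: "a/an/the <noun> is/are [about|around] <num> <unit> long|tall|wide|high"
PATTERN = re.compile(
    r"\b(?:an?|the)\s+(?P<noun>[a-z]+)\s+(?:is|are)\s+(?:about\s+|around\s+)?"
    r"(?P<num>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>meters?|mm|cm|km|m|inch(?:es)?|feet|foot)\s+"
    r"(?:long|tall|wide|high)\b",
    re.IGNORECASE,
)


def extract_sizes(sentences):
    """Map each surface noun to the median of its extracted sizes, in meters."""
    values = defaultdict(list)
    for sentence in sentences:
        for m in PATTERN.finditer(sentence):
            meters = float(m.group("num")) * UNIT_TO_M[m.group("unit").lower()]
            values[m.group("noun").lower()].append(meters)
    return {noun: median(vals) for noun, vals in values.items()}


SENTENCES = [
    "A giraffe is about 5 m tall.",
    "The giraffe is 18 feet tall.",
    "A giraffe is around 4.8 meters tall.",
    "An ant is 5 mm long.",
    "The report is 40 pages long.",  # "pages" is not a length unit, so no match
]

sizes = extract_sizes(SENTENCES)
print(sizes["giraffe"])  # prints 5.0
```

This sketch uses the median because a single outlier extraction barely moves it. That choice is an assumption of the sketch.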

Keywords/Search Tags:

Entity, Text, Data, Extraction, Information

Related items

1	The Information Leakage Detection Based On Text Information Extraction
2	Research On Financial Entity Relation Discovery For Text Data
3	Research And System Implementation Of Entity Relation Extraction Algorithm Based On Text Generation
4	Research On Text Entity Extraction Combining Related Image Information
5	Research On Named Entity Relation Extraction Based On Web Text Mining
6	Entity Relation Extraction For Free Text
7	Research On Entity Relation Extraction Of Aluminum-silicon Alloy Based On Text Mining
8	English Entity Answer Extraction And Home Find
9	Research Of Entity Knowledge Base System Based On Information Extraction
10	Research Of Entity And Relation Extraction Based On Text