Rapid Training of Information Extraction with Local and Global Data Views

Posted on:2013-07-12

Degree:Ph.D

Type:Dissertation

University:New York University

Candidate:Sun, Ang

GTID:1458390008486412

Subject:Computer Science

Abstract/Summary:

This dissertation focuses on fast system development for Information Extraction (IE). State-of-the-art systems heavily rely on extensively annotated corpora, which are slow to build for a new domain or task. Moreover, previous systems are mostly built with local evidence such as words in a short context window or features that are extracted at the sentence level. They usually generalize poorly on new domains.

This dissertation presents novel approaches for rapidly training an IE system for a new domain or task based on both local and global evidence. Specifically, we present three systems: a relation type extension system based on active learning, a relation type extension system based on semi-supervised learning, and a cross-domain bootstrapping system for domain adaptive named entity extraction.

The active learning procedure adopts features extracted at the sentence level as the local view and distributional similarities between relational phrases as the global view. It builds two classifiers based on these two views to find the most informative contention data points to request human labels so as to reduce annotation cost.

The semi-supervised system aims to learn a large set of accurate patterns for extracting relations between names from only a few seed patterns. It estimates the confidence of a name pair both locally and globally: locally by looking at the patterns that connect the pair in isolation; globally by incorporating the evidence from the clusters of patterns that connect the pair. The use of pattern clusters can prevent semantic drift and contribute to a natural stopping criterion for semi-supervised relation pattern discovery.

For adapting a named entity recognition system to a new domain, we propose a cross-domain bootstrapping algorithm, which iteratively learns a model for the new domain with labeled data from the original domain and unlabeled data from the new domain. We first use word clusters as global evidence to generalize features that are extracted from a local context window. We then select self-learned instances as additional training examples using multiple criteria, including some based on global evidence.
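
The active learning procedure summarized in the abstract asks for human labels on contention points, that is, instances on which the local-view and global-view classifiers disagree. The following is a minimal illustrative sketch of that selection step only, not the dissertation's implementation. The function name `select_contention_points`, the `(label, confidence)` classifier interface, the ranking by the lower of the two confidences, and the toy data are all assumptions made for the example.

```python
from typing import Callable, Hashable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")
# Hypothetical classifier interface: instance -> (label, confidence in [0, 1]).
Prediction = Tuple[Hashable, float]


def select_contention_points(
    pool: Sequence[X],
    local_view: Callable[[X], Prediction],
    global_view: Callable[[X], Prediction],
    budget: int,
) -> List[X]:
    """Return up to `budget` unlabeled instances on which the two views disagree.

    Assumption for this sketch: disagreements are ranked by the lower of the
    two confidences, so cases where both views are confident yet contradict
    each other are sent to the annotator first.
    """
    scored = []
    for x in pool:
        local_label, local_conf = local_view(x)
        global_label, global_conf = global_view(x)
        if local_label != global_label:  # a contention point
            scored.append((min(local_conf, global_conf), x))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in scored[:budget]]


# Toy stand-ins for the two trained classifiers (invented data).
pool = ["A acquired B", "A met B", "A, head of B"]
local_predictions = {
    "A acquired B": ("OWN", 0.9),
    "A met B": ("NONE", 0.6),
    "A, head of B": ("EMPLOY", 0.8),
}
global_predictions = {
    "A acquired B": ("OWN", 0.7),
    "A met B": ("CONTACT", 0.5),
    "A, head of B": ("NONE", 0.7),
}

to_label = select_contention_points(
    pool, local_predictions.__getitem__, global_predictions.__getitem__, budget=2
)
```

In a full loop, the instances in `to_label` would be annotated, added to the training set, and both view classifiers retrained before the next round of selection.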

Keywords/Search Tags:

Global, Extraction, Training, Local, System, Data, New domain

Related items

1	The Global Plus Local Feature Extraction Based On Collaborative Representation And Its Application
2	Study Of Fault Detection Based On Global And Local Structure Feature Extraction
3	Research On Imperfect Bovine Iris Recognition Based On Combination Of Local Features And Global Features
4	Fusion Of Global And Local Feature For Face Recognition
5	The Research On Collaboration-training Algorithm And Its Application
6	Convexification and Deconvexification for Training Artificial Neural Networks
7	Research On Method And Application Of Data Mining Combining Global And Local Analysis
8	Design And Implementation Of Cadre Training Management System Of Local Tax Bureau Based On B/S
9	Construction Of COVID-19 Domain Knowledge Graph Based On Pre-training Language Model
10	Deep Forgery Detection Based On Global And Local Feature Enhancement