Unsupervised discovery of extraction patterns for information extraction

Posted on: 2005-03-07 | Degree: Ph.D. | Type: Dissertation
University: New York University | Candidate: Sudo, Kiyoshi
GTID: 1458390008997325 | Subject: Computer Science
Abstract/Summary:
The task of Information Extraction (IE) is to find specific types of information in natural language text. In particular, event extraction identifies instances of a particular type of event or fact (a particular "scenario"), including the entities involved, and fills a database that has been predefined for the scenario. As the number of documents available online has multiplied, information extraction has grown in importance for various applications, such as tracking terrorist activities from newswire sources and building a database of job postings from the Web.

Linguistic contexts, such as predicate-argument relationships, have been widely used as extraction patterns to identify the items to be extracted from the text. The cost of creating extraction patterns for each scenario has been a bottleneck limiting the portability of information extraction systems to new scenarios, although some research has explored semi-supervised pattern discovery procedures to reduce this cost. The challenge is to develop a fully automatic method for identifying extraction patterns for a scenario specified by the user.

This dissertation presents a novel approach to the unsupervised discovery of extraction patterns for event extraction from raw text. First, we present a framework that gives users a self-customizing information extraction system for their own query: the Query-Driven Information Extraction (QDIE) framework. The input to the QDIE framework is the user's query: either a set of keywords or a narrative description of the event extraction task.

Second, we seek to improve the extraction pattern model. Considering the shortcomings of prior work based on predicate-argument models and their extensions, we propose a novel extraction pattern model based on arbitrary subtrees of dependency trees.

Third, we address the issue of portability across languages. As a case study of the QDIE framework, we implemented a pre-CODIE system, a Cross-Lingual On-Demand Information Extraction system requiring minimal human intervention, which incorporates the QDIE framework as its pattern discovery component. In addition, we assess the role of machine translation in cross-lingual information extraction by comparing translation-based implementations.
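To make the contrast between predicate-argument patterns and subtree-based patterns concrete, here is a minimal Python sketch. It is not the dissertation's implementation: the example sentence, its dependency parse, the relation labels, and the function names (`rooted_subtrees`, `all_subtrees`, `predicate_argument_pattern`) are all hypothetical, and nodes are plain words rather than generalized classes. A predicate-argument pattern is approximated here as a predicate plus its immediate dependents, while the subtree model admits every connected subtree of the parse.

```python
from itertools import product

# Hypothetical dependency parse of "Terrorists bombed the embassy in Lima".
# Each head maps to its (relation, dependent) pairs; labels are illustrative only.
# Words double as node ids, which works here only because no word repeats.
TREE = {
    "bombed": [("nsubj", "Terrorists"), ("dobj", "embassy")],
    "embassy": [("det", "the"), ("prep_in", "Lima")],
    "Terrorists": [],
    "the": [],
    "Lima": [],
}


def rooted_subtrees(node):
    """Return every connected subtree whose topmost node is `node`, as a frozenset of nodes."""
    options = []
    for _rel, child in TREE[node]:
        # Each child is either left out (empty set) or attached via one of its own rooted subtrees.
        options.append([frozenset()] + rooted_subtrees(child))
    # product() over zero option lists yields one empty tuple, so a leaf gives just {node}.
    return [frozenset({node}).union(*choice) for choice in product(*options)]


def all_subtrees():
    """Candidate patterns under the subtree model: connected subtrees rooted at any node."""
    return [t for node in TREE for t in rooted_subtrees(node)]


def predicate_argument_pattern(pred):
    """Simplified predicate-argument pattern: the predicate plus its immediate dependents."""
    return frozenset({pred, *(child for _rel, child in TREE[pred])})


if __name__ == "__main__":
    subtrees = all_subtrees()
    print(len(subtrees))  # 17
    print(sorted(predicate_argument_pattern("bombed")))  # ['Terrorists', 'bombed', 'embassy']
    # A subtree linking the event to its location through the intermediate node "embassy":
    print(frozenset({"bombed", "embassy", "Lima"}) in subtrees)  # True
```

The subtree {bombed, embassy, Lima} connects the attack to its location through an intermediate node, which the simplified predicate-argument pattern (a single predicate with its direct arguments) cannot express. The number of connected subtrees can grow exponentially with the size of the parse, so any practical use of such a model has to restrict or prune the candidate set.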
Keywords/Search Tags: Extraction, Discovery, QDIE framework, Event