Building a reference resolution system using human language processing for inspiration

Posted on: 2011-11-22
Degree: Ph.D.
Type: Thesis
University: University of Minnesota
Candidate: Watters, Shana Kay
Full Text: PDF
GTID: 2448390002968296
Subject: Language
Abstract/Summary:
For over 30 years, reference resolution, the process of determining what a noun phrase (including a pronoun) refers to in written and spoken language, has been an important and ongoing area of research. Most existing pronominal reference resolution algorithms and systems are designed to use syntactic information and surface features (e.g. number and gender). These lines of research have plateaued, with accuracy rates in the vicinity of 80% (+/-10), depending on the domain and techniques used.

This thesis explores how to incorporate multiple theories and algorithms into a single system, i.e. a pipeline of components that each specialize in a certain aspect of reference resolution. Our framework applies this design to the pronoun "it".

The framework contains a total of five subsystems:
(1) Creates a set of prospective antecedents, that is, previous forms such as noun phrases, clauses, and verb phrases that introduce possible referents. Antecedent selection is guided by rules established in our empirical study of the Givenness Hierarchy's claim that the cognitive status of being in focus is necessary for being a referent of "it".
(2) Uses binding theory to disqualify possible antecedents using syntactic information.
(3) Uses number and gender to disqualify possible antecedents.
(4) Creates a framework for semantic reasoning by integrating information from VerbNet, PropBank, and WordNet. This framework allows for reasoning about which semantic restrictions and constraints of a given verb can be enforced on the prospective antecedent of "it".
(5) When two or more forms remain in the set of prospective antecedents, employs a preference-based algorithm to select the best guess from that set.

The framework created by this thesis includes a database and a computer system that implements a portion of the pipelined architecture. The database describes in tabular form all the information used to create the semantic reasoning subsystem, the parts of the Penn Treebank Wall Street Journal corpus used for testing, the information used by the number and gender subsystems, the results of each stage of the pipelined system, and the information used to create the preference-based algorithm for the best guess.

The system integrates research from the fields of linguistics, cognitive science, and computer science to create the next generation of reference resolution systems capable of understanding what we mean when we write or talk.
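The five-stage pipeline described in the abstract can be pictured as a filter-then-rank procedure. The Python sketch below is only an illustration under stated assumptions, not the thesis's implementation: the Candidate fields, the resolve_it function, the "document" semantic type, and the recency-based preference rule are all hypothetical stand-ins for the Givenness Hierarchy rules, the binding-theory check, the VerbNet/PropBank/WordNet reasoning, and the preference algorithm, none of which the abstract specifies in detail.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Candidate:
    """A prospective antecedent of "it". All fields are hypothetical."""
    text: str
    in_focus: bool                  # stand-in for the in-focus cognitive status
    locally_bound: bool             # stand-in for a binding-theory violation
    number: str                     # "sg" or "pl"
    gender: str                     # "neuter", "masc", or "fem"
    semantic_types: FrozenSet[str]  # stand-in for VerbNet/PropBank/WordNet data
    distance: int                   # sentences between antecedent and pronoun


def resolve_it(candidates: List[Candidate], required_type: str) -> Optional[Candidate]:
    """Filter candidates through stages 1-4, then rank survivors in stage 5."""
    stages = [
        # (1) keep only forms assumed to be in focus
        lambda cs: [c for c in cs if c.in_focus],
        # (2) drop forms ruled out on syntactic (binding) grounds
        lambda cs: [c for c in cs if not c.locally_bound],
        # (3) "it" needs a singular, neuter antecedent
        lambda cs: [c for c in cs if c.number == "sg" and c.gender == "neuter"],
        # (4) the verb's assumed semantic restriction must be satisfied
        lambda cs: [c for c in cs if required_type in c.semantic_types],
    ]
    remaining = list(candidates)
    for stage in stages:
        remaining = stage(remaining)
    if not remaining:
        return None
    # (5) assumed preference: the most recent surviving form wins
    return min(remaining, key=lambda c: c.distance)


# Toy input for "... The board rejected it." (invented for illustration)
example_candidates = [
    Candidate("the report", True, False, "sg", "neuter", frozenset({"document"}), 1),
    Candidate("the proposals", True, False, "pl", "neuter", frozenset({"document"}), 1),
    Candidate("the board", True, True, "sg", "neuter", frozenset({"organization"}), 0),
    Candidate("the meeting", False, False, "sg", "neuter", frozenset({"event"}), 1),
    Candidate("the memo", True, False, "sg", "neuter", frozenset({"document"}), 2),
]

best = resolve_it(example_candidates, required_type="document")
print(best.text if best else "no antecedent found")
```

Keeping each stage as an independent filter mirrors the abstract's point that the subsystems specialize in separate aspects of the problem. A real system would replace each lambda with a component backed by parsed text and lexical resources.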
Keywords/Search Tags: Reference resolution, System