Engineering lexical semantics for natural language processing systems

Posted on: 1998-07-25
Degree: Ph.D
Type: Dissertation
University: Lehigh University
Candidate: Kogut, Paul Anthony
Full Text: PDF
GTID: 1468390014478728
Subject: Computer Science
Abstract/Summary:
Educated adults apply a vocabulary of at least 100,000 words when they read a domain-independent text such as a newspaper. Building a lexicon large enough to handle domain-independent text is one of the major engineering problems in Natural Language Processing (NLP). Generating the semantic information for a lexicon, including selectional restrictions on the subjects and objects of verbs, is especially difficult because the information is not readily available from a single source such as a machine-readable dictionary or sample text in a corpus. Selectional restrictions are important for domain-independent text because they can help disambiguate frequently occurring words, which tend to have many word senses. Generating a lexicon with semantics involves a typical engineering tradeoff between computing resources (e.g., processing and memory) and performance on an application (e.g., percent correct word sense disambiguation).
My research focused on three key questions: (1) How does an NLP engineer build a large lexicon that contains semantic information? (2) How does an NLP engineer choose appropriate semantic information for selectional restrictions? (3) How does an NLP engineer automate the acquisition of selectional restrictions?
I implemented a program to automatically convert and merge knowledge from WordNet, Semcor and CELEX into a lexicon that can be used by an efficient NLP system called Register Vector Grammar (RVG). I proposed a general process for choosing appropriate semantic information for selectional restrictions and demonstrated the application of the process to domain-independent text. I developed a system to learn selectional restrictions from samples of text corpora. My approach built on the work of Resnik and others by taking into account the sense of the verb predicate and the noun argument. I applied a novel procedure for testing selectional restrictions for over-restriction and under-restriction.
Experiments were performed to refine the lexicon learning process and to test the resulting selectional restrictions on new sentences that were not used in the learning process. Results show that the learned selectional restrictions (1) achieve a reasonable engineering tradeoff between over-restriction and under-restriction and (2) reach a level of performance on word sense disambiguation that is close to that of state-of-the-art techniques that are not yet implemented in an efficient NLP system.
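The tradeoff between over-restriction and under-restriction can be illustrated with a minimal Python sketch. This is not the procedure used in the dissertation, which the abstract does not describe in detail. The toy hierarchy `HYPERNYMS`, the helper names `is_a` and `error_rates`, and the example words are all hypothetical, chosen only to show how a restriction on the object of a verb such as "eat" can reject valid arguments (over-restriction) or admit invalid ones (under-restriction).

```python
# Hypothetical toy is-a hierarchy; it stands in for a resource such as WordNet.
HYPERNYMS = {
    "apple": "food",
    "bread": "food",
    "food": "entity",
    "idea": "abstraction",
    "abstraction": "entity",
    "rock": "object",
    "object": "entity",
}


def is_a(concept, ancestor):
    """Return True if `concept` equals `ancestor` or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = HYPERNYMS.get(concept)  # None once the root is passed
    return False


def error_rates(restriction, acceptable, unacceptable):
    """Return (over_restriction, under_restriction) rates for one restriction.

    over_restriction: fraction of acceptable arguments the restriction rejects.
    under_restriction: fraction of unacceptable arguments the restriction admits.
    """
    over = sum(not is_a(c, restriction) for c in acceptable) / len(acceptable)
    under = sum(is_a(c, restriction) for c in unacceptable) / len(unacceptable)
    return over, under


# Assumed test data for the object slot of "eat" (illustrative only).
acceptable = ["apple", "bread"]
unacceptable = ["idea", "rock"]

for restriction in ["apple", "food", "entity"]:
    print(restriction, error_rates(restriction, acceptable, unacceptable))
```

In this toy setting, a restriction that is too specific ("apple") rejects a valid object, one that is too general ("entity") admits every invalid object, and an intermediate class ("food") avoids both errors. Real corpora rarely offer such a clean split, which is why the choice is an engineering tradeoff.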
Keywords/Search Tags:Domain independent text, NLP, Selectional restrictions, System, Process, Word, Semantic, Engineering