Engineering lexical semantics for natural language processing systems

Posted on:1998-07-25

Degree:Ph.D

Type:Dissertation

University:Lehigh University

Candidate:Kogut, Paul Anthony

GTID:1468390014478728

Subject:Computer Science

Abstract/Summary:

Educated adults draw on a vocabulary of at least 100,000 words when they read domain-independent text such as a newspaper. Building a lexicon large enough to handle domain-independent text is one of the major engineering problems in Natural Language Processing (NLP). Generating the semantic information for a lexicon, including selectional restrictions on the subjects and objects of verbs, is especially difficult because the information is not readily available from a single source such as a machine-readable dictionary or sample text in a corpus. Selectional restrictions are important for domain-independent text because they can help disambiguate frequently occurring words, which tend to have many word senses. Generating a lexicon with semantics involves a typical engineering tradeoff between computing resources (e.g., processing and memory) and performance on an application (e.g., percent correct word sense disambiguation).

My research focused on three key questions: (1) How does an NLP engineer build a large lexicon that contains semantic information? (2) How does an NLP engineer choose appropriate semantic information for selectional restrictions? (3) How does an NLP engineer automate the acquisition of selectional restrictions?

I implemented a program to automatically convert and merge knowledge from WordNet, SemCor and CELEX into a lexicon that can be used by an efficient NLP system called Register Vector Grammar (RVG). I proposed a general process for choosing appropriate semantic information for selectional restrictions and demonstrated the application of the process to domain-independent text. I developed a system to learn selectional restrictions from samples of text corpora. My approach built on the work of Resnik and others by taking into account the sense of both the verb predicate and the noun argument. I applied a novel procedure for testing selectional restrictions for over-restriction and under-restriction.

Experiments were performed to refine the lexicon learning process and to test the resulting selectional restrictions on new sentences that were not used in the learning process. Results show that the learned selectional restrictions (1) strike a reasonable engineering tradeoff between over-restriction and under-restriction and (2) achieve a level of performance on word sense disambiguation that is close to that of state-of-the-art techniques, which are not yet implemented in an efficient NLP system.
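
As a rough illustration of the ideas in the abstract, the following minimal sketch shows how a selectional restriction on a verb's object can filter the senses of an ambiguous noun, and how a candidate restriction might be checked for over-restriction and under-restriction. It is not the dissertation's implementation. The toy hierarchy, the sense labels (`drink#1`, `port_wine`, `port_harbor`), the function names, and the definitions of the two error rates (fraction of acceptable arguments rejected, fraction of unacceptable arguments accepted) are all assumptions made for this example, and no data from WordNet, SemCor or CELEX is used.

```python
# Hypothetical toy is-a hierarchy (child -> parent); not real WordNet data.
HYPERNYM = {
    "port_wine": "beverage",
    "tea": "beverage",
    "beverage": "substance",
    "substance": "entity",
    "port_harbor": "location",
    "location": "entity",
}

# Hypothetical lexicon entries: noun senses and per-verb-sense object restrictions.
NOUN_SENSES = {"port": ["port_wine", "port_harbor"]}
OBJECT_RESTRICTION = {"drink#1": "beverage"}


def ancestors(concept):
    """Return the concept followed by all of its hypernyms up to the root."""
    chain = [concept]
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain


def allowed_senses(verb_sense, noun):
    """Keep only the noun senses that fall under the verb's required class."""
    required = OBJECT_RESTRICTION[verb_sense]
    return [s for s in NOUN_SENSES[noun] if required in ancestors(s)]


def restriction_errors(required, good_args, bad_args):
    """Return (over_restriction, under_restriction) for a candidate class.

    Assumed definitions for this sketch:
      over_restriction  = share of acceptable arguments the class rejects
      under_restriction = share of unacceptable arguments the class accepts
    """
    rejected_good = [a for a in good_args if required not in ancestors(a)]
    accepted_bad = [a for a in bad_args if required in ancestors(a)]
    return len(rejected_good) / len(good_args), len(accepted_bad) / len(bad_args)


if __name__ == "__main__":
    print(allowed_senses("drink#1", "port"))
    for candidate in ("port_wine", "beverage", "entity"):
        print(candidate, restriction_errors(candidate, ["port_wine", "tea"], ["port_harbor"]))
```

In this toy setting, a class that is too specific (`port_wine`) rejects an acceptable object, while one that is too general (`entity`) accepts an unacceptable one. Choosing a class between the two is one way to picture the tradeoff between over-restriction and under-restriction.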

Keywords/Search Tags:

Domain independent text, NLP, Selectional restrictions, System, Process, Word, Semantic, Engineering

Related items

1	A practical semantic representation for natural language parsing
2	Word Similarity Computing And Its Application In Selectional Preference Acquisition
3	Acquisition Of Chinese Verb Selectional Preference Information
4	Research On The Automatic Acquisition Of Preferred Semantic Classes In Chinese
5	Research Of Chinese Text Preprocessing Based On Semantic
6	A resource independent process representation for enterprise-based engineering integration
7	Research On Similarity Computing Method For Domain Texts
8	Study On Method To Automatically Analyze The Text Structure Based On The Relevancy Computing Of Text Content
9	Research On The Key Technologies Of Text Sentiment Information Extraction
10	Research Of Word Semantic Similarity Based On Domain Knowledge