Linguistic indicators for language understanding: Using machine learning methods to combine corpus-based indicators for aspectual classification of clauses

Posted on:1999-11-02

Degree:Ph.D

Type:Thesis

University:Columbia University

Candidate:Siegel, Eric Victor

GTID:2468390014968962

Subject:Computer Science

Abstract/Summary:

Linguistics as a field has provided enormous insights into how the thoughts behind language are reflected in the structure of sentences. For example, one writes a paper "in" one week, but rides a bicycle "for" one hour. This illustrates how the prepositions "in" and "for" correspond to the type of event: "in" modifies a completed process, while "for" modifies an ongoing process. The question explored by this thesis is how we can best put our understanding of linguistics to use in order to tap into the vast knowledge encoded in texts.

The ability to distinguish stative clauses, e.g., "She resembles her mother," from event clauses, e.g., "She ran down the street," is a fundamental component of natural language understanding. These two high-level categories correspond to primitive distinctions in many domains, for example, the distinction between diagnosis and procedure in the medical domain. Stativity is the first of three high-level distinctions that compose the aspectual class of a clause. These distinctions in meaning have been well motivated by work in linguistics and natural language understanding.

Aspectual classification is a necessary component of applications that perform certain natural language interpretation, natural language generation, summarization, information retrieval, and machine translation tasks, because each of these applications requires the ability to reason about time.

In this thesis, I develop a system to perform aspectual classification with linguistically based, numerical indicators. These linguistic indicators make use of an array of aspectual markers, each of which has an associated constraint on aspectual class. For example, only clauses that describe an event can appear with the progressive marker, e.g., "I was eating breakfast." Therefore, the aspectual category of a verb or phrase is reflected by a numerical indicator that measures how often it occurs in the progressive.
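
An indicator of this kind is essentially a per-verb relative frequency. The sketch below assumes that a hypothetical upstream parser has already reduced each clause to a (verb lemma, is-progressive) pair. The function name `progressive_indicator` and the sample clauses are illustrative only; this is not the thesis's implementation.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def progressive_indicator(clauses: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Map each verb to the fraction of its clause occurrences that are progressive."""
    totals = Counter()       # occurrences of each verb
    progressive = Counter()  # occurrences of each verb in the progressive
    for verb, is_progressive in clauses:
        totals[verb] += 1
        if is_progressive:
            progressive[verb] += 1
    # Counter returns 0 for verbs never seen in the progressive.
    return {verb: progressive[verb] / n for verb, n in totals.items()}


# Hypothetical pre-parsed clauses: (verb lemma, whether the clause is progressive).
clauses = [
    ("eat", True),        # "I was eating breakfast."
    ("eat", False),       # "I ate breakfast."
    ("resemble", False),  # "She resembles her mother."
    ("resemble", False),
]
print(progressive_indicator(clauses))  # {'eat': 0.5, 'resemble': 0.0}
```

In this sketch, a verb that never occurs in the progressive, such as "resemble" here, scores 0.0. That is consistent with the constraint that only event clauses take the progressive marker.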
The values of such linguistic indicators are computed automatically across corpora of text. We develop and evaluate fourteen indicators over unrestricted sets of verbs occurring across two corpora. Our analysis reveals predictive value in several indicators that had not previously been conjectured in the linguistics literature to correlate with aspect.

Machine learning is then used to combine multiple indicators in order to improve classification performance. The models derived automatically by learning are examined manually, revealing several linguistic insights regarding the indicators and their interactions. Three machine learning techniques are compared for this task: decision tree induction, a genetic algorithm, and log-linear regression.

We conclude that linguistic indicators successfully exploit linguistic insights to provide a much-needed method for aspectual classification. Future work will extend this approach to other semantic distinctions in natural language.
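
Log-linear regression is one of the three learners compared for combining indicators. As an illustration only, the sketch below fits the two-class case of a log-linear model (logistic regression) by batch gradient ascent on the log-likelihood. It labels each verb as event (1) or state (0) from a vector of indicator values. The indicator vectors, labels, learning rate, and epoch count are invented for this example. The thesis's actual features, training procedure, and results are not reproduced here.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear score: bias w[0] plus the weighted sum of indicator values."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def train_log_linear(X: Sequence[Sequence[float]], y: Sequence[int],
                     lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit [bias, w_1, ..., w_k] by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(score(w, xi))  # label minus predicted P(event)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict_event(w: Sequence[float], x: Sequence[float]) -> bool:
    """Classify as an event when the modeled P(event) is at least 0.5."""
    return sigmoid(score(w, x)) >= 0.5


# Hypothetical per-verb indicator vectors: [progressive rate, second indicator].
X = [[0.30, 0.10], [0.25, 0.05], [0.40, 0.20],   # event-like verbs
     [0.01, 0.00], [0.02, 0.01], [0.00, 0.02]]   # stative-like verbs
y = [1, 1, 1, 0, 0, 0]

w = train_log_linear(X, y)
print([predict_event(w, x) for x in X])
```

A log-linear model keeps each indicator's contribution as a single inspectable weight. This is one reason learned models of this kind can be examined manually for linguistic insight. A decision tree exposes the same information as explicit thresholds instead.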

Keywords/Search Tags:

Language, Aspectual classification, Linguistic, Indicators, Machine learning, Distinctions, Clauses

Related items

1	A LINGUISTIC ANALYSIS OF ALARYNGEAL SPEECH CONTRASTING TWO SPEECH-AUGMENTING DEVICES IN TERMS OF THE ASPECTUAL THEORY OF STRUCTURAL LINGUISTICS
2	The Research On Language Analysis Of Social E-commerce Based On Machine Learning
3	Research On Automatic Evaluation Of Machine Translation Based On Linguistic Knowledge
4	Machine translation: A tool for understanding linguistic challenges facing the second language student
5	Intelligent Device Text Classification Method Based On Natural Language Processing
6	A Software Reconstruction Method Based On Cross-section Characteristics
7	Semantic Role Labelling For Complicated Chinese Clauses
8	An Aspect-level Sentiment Classification Method Based On Linguistic Constraints And Attention Mechanism
9	Research On Text Classification Based On Natural Language Processing And Machine Learning
10	Learning horn-clauses as classification rules for relations