Linguistic indicators for language understanding: Using machine learning methods to combine corpus-based indicators for aspectual classification of clauses

Posted on: 1999-11-02
Degree: Ph.D.
Type: Thesis
University: Columbia University
Candidate: Siegel, Eric Victor
Full Text: PDF
GTID: 2468390014968962
Subject: Computer Science
Abstract/Summary:
Linguistics as a field has provided enormous insights that describe how the thoughts behind language are reflected by the structure of sentences. For example, one writes a paper in one week, but rides a bicycle for one hour. This illustrates how prepositions ("in" and "for") correspond to the type of event. Specifically, "in" modifies a completed process, while "for" modifies an ongoing process. The question explored by this thesis is: how can we best put our understanding of linguistics to use in order to tap into the vast knowledge encoded in texts?

The ability to distinguish stative clauses, e.g., "She resembles her mother," from event clauses, e.g., "She ran down the street," is a fundamental component of natural language understanding. These two high-level categories correspond to primitive distinctions in many domains, including, for example, the distinction between diagnosis and procedure in the medical domain. Stativity is the first of three high-level distinctions that compose the aspectual class of a clause. These distinctions in meaning have been well motivated by work in linguistics and natural language understanding.

Aspectual classification is a necessary component for applications that perform certain natural language interpretation, natural language generation, summarization, information retrieval, and machine translation tasks. This is because each of these applications requires the ability to reason about time.

In this thesis, I develop a system to perform aspectual classification with linguistically-based, numerical indicators. These linguistic indicators make use of an array of aspectual markers, each of which has an associated constraint on aspectual class. For example, only clauses that describe an event can appear with the progressive marker, e.g., "I was eating breakfast." Therefore, the category of a verb or phrase is reflected by a numerical indicator that measures how often it occurs in the progressive. The values for such linguistic indicators are computed automatically across corpora of text. We develop and evaluate fourteen indicators over unrestricted sets of verbs occurring across two corpora. Our analysis reveals a predictive value for several indicators that have not previously been conjectured to correlate with aspect in the linguistics literature.

Then, machine learning is used to combine multiple indicators in order to improve classification performance. The models automatically derived by learning are manually examined, revealing several linguistic insights regarding the indicators and their interactions. Three machine learning techniques are compared for this task: decision tree induction, a genetic algorithm, and log-linear regression.

We conclude that linguistic indicators successfully exploit linguistic insights to provide a much-needed method for aspectual classification. Future work will extend this approach to other semantic distinctions in natural language.
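
As an illustration only, the progressive indicator mentioned in the abstract can be read as a relative frequency: for each verb, the share of its clauses that appear in the progressive. The sketch below is not the thesis's implementation; the function name `progressive_indicator`, the `(lemma, is_progressive)` input format, and the toy data are all assumptions made for this example, and a real system would first have to parse a corpus to detect the progressive.

```python
from collections import Counter


def progressive_indicator(clauses):
    """Return, for each verb lemma, the fraction of its clauses in the progressive.

    `clauses` is an iterable of (lemma, is_progressive) pairs. This input
    format is a hypothetical simplification chosen for illustration.
    """
    total = Counter()
    progressive = Counter()
    for lemma, is_progressive in clauses:
        total[lemma] += 1
        if is_progressive:
            progressive[lemma] += 1
    return {lemma: progressive[lemma] / total[lemma] for lemma in total}


# Hypothetical toy data, not drawn from the thesis corpora.
toy_clauses = [
    ("eat", True), ("eat", False), ("eat", True), ("eat", False),
    ("resemble", False), ("resemble", False),
]

indicators = progressive_indicator(toy_clauses)
print(indicators)
```

On this toy data the event verb "eat" receives a higher value than the stative verb "resemble", which is the kind of signal a classifier could then combine with other indicators.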
Keywords/Search Tags:Language, Aspectual classification, Linguistic, Indicators, Machine learning, Distinctions, Clauses