Automatic discrimination of genres: The role of adjectives and adverbs as suggested by linguistics and psychology

Posted on: 2008-02-03
Degree: Ph.D.
Type: Dissertation
University: Rutgers The State University of New Jersey - New Brunswick
Candidate: Rittman, Robert John
Full Text: PDF
GTID: 1445390005451096
Subject: Information Science
Abstract/Summary:
Traditionally, information retrieval systems have been designed and evaluated based on the assumption that information is relevant if it is about the topic of the expressed information need. The tremendous growth in the scope and size of information resources, particularly in the World Wide Web, has motivated research on selecting information using non-topical criteria. Non-topical criteria, such as genre, have been recognized for many years. Genre is a socially constructed classification of texts based on external criteria of use, such as academic writing versus news reportage versus fiction.

The proximate goal of this exploratory research is to support information users when non-topical criteria are important. Specifically, we assess the feasibility of automatically classifying documents by genre using only adjectives and adverbs as discriminating features. We distinguish our work by our choice of features and our motivation for choosing these features. In contrast to ad hoc methods that use pure machine learning or pure statistical approaches, we base our a priori selection of features on an understanding of the role of adjectives and adverbs in language, using insights from the diverse fields of linguistics and psychology. Our systematic study of adjectives and adverbs results in more than 300 genre discrimination tests.

Our findings demonstrate that we can build more parsimonious classifiers whose power is equal to or greater than that of classifiers which use statistical or machine learning rules to select features ad hoc from much larger and more diverse sets of features. We find that representing documents as vectors of the frequency of fewer than 20 adverbs, known as speaker-oriented adverbs, is effective for discriminating documents by genre. In some cases, performance improves more than 90% over the most rigorous baseline using a measure we call accuracy gain.

More generally, we (a) present a model for the systematic study of many complex features, (b) demonstrate the principle of selecting features motivated from other disciplines, and (c) calculate a new performance measure (accuracy gain) that compares results across studies. These contributions can be generalized to other classification problems, including tasks defined by other scientific perspectives and disciplines.
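
The document representation described in the abstract, a vector of frequencies over a small fixed set of adverbs, can be illustrated with a minimal sketch. The word list below is a hypothetical handful of commonly cited speaker-oriented adverbs, not the dissertation's actual feature set, and normalizing counts by document length is an assumption; the abstract says only "frequency".

```python
import re
from collections import Counter

# Hypothetical, illustrative vocabulary; the dissertation's actual list
# of fewer than 20 speaker-oriented adverbs is not given in the abstract.
SPEAKER_ORIENTED_ADVERBS = [
    "frankly", "fortunately", "probably", "evidently", "surprisingly",
]

def adverb_vector(text, vocabulary=SPEAKER_ORIENTED_ADVERBS):
    """Return one relative frequency per vocabulary word.

    Assumption: frequency = occurrences / total tokens in the document.
    """
    # Simple lowercase word tokenizer; a real system would likely use
    # a part-of-speech tagger to confirm each token is used as an adverb.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in vocabulary]

if __name__ == "__main__":
    print(adverb_vector("Frankly, this is probably fine."))
```

Vectors of this kind could then be passed to any standard classifier; which classifier the dissertation used is not stated in the abstract.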
Keywords/Search Tags: Adjectives and adverbs, Genre, Information, Features