Portable language technology: A resource-light approach to morpho-syntactic tagging

Posted on: 2007-05-22
Degree: Ph.D.
Type: Thesis
University: The Ohio State University
Candidate: Feldman, Anna
Full Text: PDF
GTID: 2448390005476894
Subject: Language
Abstract/Summary:
Morpho-syntactic tagging is the process of assigning part of speech (POS), case, number, gender, and other morphological information to each word in a corpus. Morpho-syntactic tagging is an important step in natural language processing. Corpora that have been morphologically tagged are very useful both for linguistic research, e.g., finding instances or frequencies of particular constructions in large corpora, and for further computational processing, such as syntactic parsing, speech recognition, stemming, and word-sense disambiguation, among others. Despite the importance of morphological tagging, there are many languages that lack annotated resources. This is almost inevitable because these resources are costly to create. But, as described in this thesis, it is possible to avoid this expense.

This thesis describes a method for transferring annotation from a morphologically annotated corpus of a source language to a corpus of a related target language. Unlike unsupervised approaches that do not require annotated data at all and, as a consequence, lack precision, the approach proposed in this dissertation relies on linguistic knowledge, but avoids large-scale grammar engineering. The approach needs neither a parallel corpus nor a bilingual lexicon, and requires much less linguistic labor than the standard technology.

This dissertation describes experiments with Russian, Czech, Polish, Spanish, Portuguese, and Catalan. However, the general method proposed can be applied to any fusional language.
Keywords/Search Tags:Language, Tagging, Approach