Modeling and learning multilingual inflectional morphology in a minimally supervised framework

Posted on: 2004-08-14
Degree: Ph.D.
Type: Thesis
University: The Johns Hopkins University
Candidate: Wicentowski, Richard Howard
Full Text: PDF
GTID: 2458390011957612
Subject: Computer Science
Abstract/Summary:
Computational morphology is an important component of most natural language processing tasks, including machine translation, information retrieval, word-sense disambiguation, parsing, and text generation. Morphological analysis, the process of finding the root form and part of speech of an inflected word form, and its inverse, morphological generation, can provide fine-grained part-of-speech information and help resolve necessary syntactic agreements. In addition, morphological analysis can reduce the problem of data sparseness through dimensionality reduction.
This thesis presents a successful original paradigm for both morphological analysis and generation by treating both tasks in a competitive linkage model based on a combination of diverse inflection-root similarity measures. Previous approaches to the machine learning of morphology have been essentially limited to string-based transduction models. In contrast, the work presented here integrates several new noise-robust, trie-based supervised methods for learning these transductions with a suite of unsupervised alignment models based on weighted Levenshtein distance, position-weighted contextual similarity, and several models of distributional similarity, including expected relative frequency. Via iterative bootstrapping, the combination of these models yields a full lemmatization analysis competitive with fully supervised approaches but without any direct supervision.
This thesis also presents an original translingual projection model for morphology induction, in which previously learned morphological analyses in a second language can be robustly projected via bilingual corpora to yield successful analyses in the new target language without any monolingual supervision.
Collectively, these methods outperform previously published algorithms for the machine learning of morphology in several languages. They have been applied to a large, representative subset of the world's language families, demonstrating the effectiveness of this new paradigm for both supervised and unsupervised multilingual computational morphology.
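As an illustration of one of the similarity measures named in the abstract, the following is a minimal sketch of a weighted Levenshtein distance used to pick the closest candidate root for an inflected form. The cost values (vowel-for-vowel substitutions at 0.5, all other edits at 1.0) and the function names are assumptions made for this example; the abstract does not specify the weighting scheme the thesis uses.

```python
# Hypothetical cost scheme: the abstract does not state the actual weights.
VOWELS = set("aeiou")


def substitution_cost(a: str, b: str) -> float:
    """Cost of replacing character a with character b (assumed weights)."""
    if a == b:
        return 0.0
    if a in VOWELS and b in VOWELS:
        return 0.5  # assumption: vowel changes are cheap (e.g. sing/sang)
    return 1.0


def weighted_levenshtein(source: str, target: str, indel_cost: float = 1.0) -> float:
    """Edit distance with per-character substitution weights."""
    # prev[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    prev = [j * indel_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [i * indel_cost]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + indel_cost,                # delete s
                curr[j - 1] + indel_cost,            # insert t
                prev[j - 1] + substitution_cost(s, t),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def closest_root(inflection: str, candidate_roots: list[str]) -> str:
    """Return the candidate root with the smallest weighted distance."""
    return min(candidate_roots, key=lambda r: weighted_levenshtein(inflection, r))
```

Under these assumed weights, `closest_root("sang", ["sing", "send"])` selects `"sing"`, because the single vowel change costs less than the edits needed to reach `"send"`. In the framework the abstract describes, a measure of this kind is only one signal, combined with contextual and distributional similarity rather than used alone.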
Keywords/Search Tags:Morphology, Supervised, Language