Toward language-independent morphological segmentation and part-of-speech induction

Posted on: 2008-04-01
Degree: M.S.
Type: Thesis
University: The University of Texas at Dallas
Candidate: Dasgupta, Sajib
Full Text: PDF
GTID: 2448390005455157
Subject: Computer Science
Abstract/Summary:
This thesis addresses two fundamental tasks in natural language processing: morphological segmentation and part-of-speech (POS) induction. In contrast to existing algorithms for these problems, we propose a learning system in which a morphological analyzer and a part-of-speech lexicon are built automatically from a text corpus alone, without any additional language-specific grammatical knowledge. We also provide empirical evidence that the system is language independent, i.e., that it can be extended to many different languages. Our morphological segmentation algorithm outperforms Goldsmith's Linguistica and Creutz and Lagus's Morfessor on English and Bengali, and achieves performance comparable to the best results on all three PASCAL evaluation datasets (English, Finnish, and Turkish). Our unsupervised part-of-speech acquisition system differs from existing bootstrapping algorithms in that it integrates morphological information more tightly into the distributional POS induction framework and adapts well to languages where distributional features are not reliable enough. Experimental results show that the approach works well for English and Bengali, providing suggestive evidence that it is applicable to both morphologically impoverished and highly inflectional languages.
Keywords/Search Tags: Morphological, Language, Part-of-speech