Maximum entropy model for Korean word sense disambiguation

Posted on: 2010-06-03
Degree: M.S.
Type: Thesis
University: University of Colorado at Boulder
Candidate: Shin, Donghun
Full Text: PDF
GTID: 2448390002481478
Subject: Computer Science
Abstract/Summary:
Word sense disambiguation is the task of identifying the correct sense of a word in a given context from a finite set of possible senses. It plays an important role in many natural language processing applications, such as machine translation, document classification, and information retrieval. Although many researchers have studied language ambiguity and proposed various methodologies, word sense disambiguation remains a great challenge.

Corpus-based word sense disambiguation is a statistical, or empirical, approach that uses a machine learning algorithm to induce a classification model for deciding the correct sense in a given context. It has generally shown state-of-the-art performance in experimental work and in evaluation exercises such as SENSEVAL [Snyder and Palmer, 2004]. As in other machine learning-based systems, performance depends crucially on the size and quality of the training set. Korean word sense disambiguation was therefore unsuccessful until recently, owing to the lack of a widely available, semantically annotated Korean corpus.

This thesis presents a supervised corpus-based method for Korean word sense disambiguation. The method is based on the Conditional Maximum Entropy Model, and the recently released Sejong Morph-Sense Tagged corpus¹ was used for training. The system presented in this work achieved 81.6% precision, which is 12% better than the baseline system, on ten pre-selected ambiguous Korean words.

¹ Part of the Korean National Corpus, constructed by the Ministry of Culture and Tourism of Korea since 1998.
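To make the modeling idea concrete, the following is a minimal sketch of a conditional maximum entropy classifier for sense disambiguation. It is not the thesis's implementation. A conditional maximum entropy model scores each sense s for a context c as p(s | c) = exp(sum_i w_i f_i(c, s)) / Z(c), and the sketch fits the weights by plain gradient ascent on the log-likelihood. The feature choice (bag of context words), the learning settings, the function names, and the four training examples for the ambiguous word 배 ("ship" vs. "pear") are all assumptions made up for illustration; they are not taken from the Sejong corpus.

```python
import math
from collections import defaultdict


def predict_proba(weights, feats, senses):
    """Return p(sense | context) under a conditional maximum entropy model."""
    # Score of a sense = sum of weights of the active (feature, sense) pairs.
    scores = {s: sum(weights.get((f, s), 0.0) for f in feats) for s in senses}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(exps.values())      # normalizer Z(c)
    return {s: e / z for s, e in exps.items()}


def train_maxent(examples, senses, epochs=200, lr=0.5):
    """Fit weights by gradient ascent on the conditional log-likelihood.

    examples: list of (set_of_context_features, gold_sense) pairs.
    The gradient for each (feature, sense) weight is the empirical count
    minus the count expected under the current model.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in examples:
            probs = predict_proba(weights, feats, senses)
            for f in feats:
                grad[(f, gold)] += 1.0        # empirical count
                for s in senses:
                    grad[(f, s)] -= probs[s]  # expected count
        for key, g in grad.items():
            weights[key] += lr * g / len(examples)
    return weights


# Hypothetical toy data: context words around the ambiguous word "배".
SENSES = ["ship", "pear"]
TRAIN = [
    ({"바다", "타다"}, "ship"),  # sea, to ride
    ({"항구", "타다"}, "ship"),  # harbor, to ride
    ({"과일", "먹다"}, "pear"),  # fruit, to eat
    ({"달다", "먹다"}, "pear"),  # sweet, to eat
]

weights = train_maxent(TRAIN, SENSES)
probs = predict_proba(weights, {"바다"}, SENSES)
best_sense = max(probs, key=probs.get)
```

A real system would use richer features (for example, morphological tags and collocations) and a regularized optimizer; the sketch only shows the shape of the model and of its training signal.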
Keywords/Search Tags: Sense disambiguation, Korean, Model