Research On Chinese Word Sense Disambiguation

Posted on: 2008-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: M Shang
Full Text: PDF
GTID: 2178360242467599
Subject: Computer application technology
Abstract/Summary:
Word Sense Disambiguation (WSD) is a core problem in natural language processing; its quality directly affects machine translation, information retrieval, sentence analysis, speech recognition, and other applications. Research on WSD therefore has both theoretical and practical significance. The main task of this thesis is to study effective statistical WSD methods: first, a supervised learning algorithm based on a sense-tagged corpus, and second, a minimally supervised learning algorithm.

An approach to WSD based on the supervised DR-AdaBoost learning algorithm is presented. DR-AdaBoost learns WSD knowledge by boosting the precision of the weak stump rules and the subordinate weak stump rules, repeatedly calling a learner to produce a more accurate final rule. When the weight of the subordinate weak stump rules is zero, DR-AdaBoost reduces to AdaBoost. Experimental results show that the precision of DR-AdaBoost is 2.61% higher than that of AdaBoost.

Although supervised learning gives good results for WSD, those results depend strongly on the amount of sense-tagged corpus available. This thesis proposes a Bootstrapping algorithm that combines a small sense-tagged corpus with an untagged corpus, reducing the need for the large sense-tagged corpus on which supervised algorithms rely. Experimental results show that the Bootstrapping algorithm outperforms a Bayes classifier given the same sense-tagged corpus.

Within the Bootstrapping algorithm, a grouping strategy is proposed to select the most confident examples. The newly labeled examples are ordered by the number of their features that appear in the training set, and examples with the same or similar feature counts are placed in the same group; the examples in each group are then compared, and those with higher probability are selected. Experimental results show that the Bootstrapping algorithm with the grouping strategy improves precision over the basic classifier by 3.5 percentage points on six ambiguous words.
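The grouping strategy described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not say how "close" feature counts are defined or how many examples are kept per group, so the names `Labeled`, `select_confident`, `group_width`, and `top_k` are hypothetical, and fixed-width bucketing of the counts is an assumed grouping rule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Labeled:
    """A newly labeled example (hypothetical structure for illustration)."""
    features: frozenset  # context features of the ambiguous word
    sense: str           # sense assigned by the current classifier
    prob: float          # classifier probability for that sense


def select_confident(candidates, train_features, group_width=2, top_k=1):
    """Group candidates by how many of their features occur in the training
    set, then keep the top_k most probable examples from each group.

    group_width and top_k are assumptions; the abstract does not fix them.
    """
    groups = defaultdict(list)
    for ex in candidates:
        known = len(ex.features & train_features)
        # Examples with the same or close counts land in the same bucket.
        groups[known // group_width].append(ex)

    selected = []
    # Visit groups from most to least overlap with the training set.
    for key in sorted(groups, reverse=True):
        ranked = sorted(groups[key], key=lambda e: e.prob, reverse=True)
        selected.extend(ranked[:top_k])
    return selected
```

In a bootstrapping loop, the selected examples would be added to the tagged set and the classifier retrained before labeling the remaining untagged examples again. Comparing probabilities only within a group avoids ranking an example with many known features directly against one with very few.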
Keywords/Search Tags:Word Sense Disambiguation, DR-AdaBoost Algorithm, Bootstrapping Algorithm, Grouping Strategy