Chinese Word Sense Disambiguation With AdaBoost.MH Algorithm

Posted on:2007-01-12

Degree:Master

Type:Thesis

Country:China

Candidate:F C Liu

GTID:2178360182960718

Subject:Computer software and theory

Abstract/Summary:

Word sense disambiguation (WSD) plays an important role in many areas of natural language processing, such as machine translation, information retrieval, sentence analysis, and speech recognition, so research on WSD has both theoretical and practical significance. The main work of this thesis is to study supervised learning algorithms that acquire WSD knowledge from multiple knowledge sources, using a large sense-tagged Chinese corpus.

An approach to Chinese word sense disambiguation based on the supervised AdaBoost.MH learning algorithm is presented. AdaBoost.MH is used to learn WSD knowledge from multiple knowledge sources: it repeatedly calls a weak learner, here decision stumps (one-level decision trees), and combines the resulting weak rules into a single, more accurate rule. A simple stopping criterion is also presented, with a view to the efficiency of learning and the practical usefulness of the system.

In a comparison between AdaBoost.MH and Naive Bayes, AdaBoost.MH shows the higher learning capability. In open tests on the SENSEVAL3 Chinese corpus, its accuracy is 8 percentage points higher than that of Naive Bayes.

To extract more contextual information for Chinese WSD, the thesis introduces a new knowledge source, semantic categorization, alongside two classical knowledge sources: the parts of speech of neighboring words and local collocations. Experimental results show that semantic categorization improves both the learning efficiency of the algorithm and the accuracy of disambiguation.

In open tests, AdaBoost.MH achieves a disambiguation accuracy of 85.75% on 6 typical polysemous Chinese words and 75.84% on 20 polysemous words from the SENSEVAL3 Chinese corpus.

Because building a broad-coverage, semantically annotated corpus is complex, an approach that uses WWW search engines to obtain an annotated corpus for Chinese WSD automatically is also presented. Experimental results show that the approach is feasible.
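The boosting procedure described in the abstract can be illustrated with a minimal sketch. It assumes the confidence-rated form of AdaBoost.MH with one stump per binary context feature; the abstract does not say which variant, feature encoding, or stopping criterion the thesis uses, so the fixed round count, the function names `train_adaboost_mh` and `predict`, and the English feature strings in the toy data are all hypothetical placeholders rather than the author's implementation.

```python
import math

def train_adaboost_mh(X, y, n_labels, n_rounds=5):
    """Sketch of confidence-rated AdaBoost.MH with decision stumps.

    X: list of sets of active binary features (hypothetical encoding).
    y: list of integer sense labels in range(n_labels).
    """
    n = len(X)
    eps = 1.0 / (n * n_labels)  # smoothing to avoid log(0)
    features = sorted(set().union(*X))
    # Y[i][l] is +1 if l is the correct sense of example i, else -1.
    Y = [[1 if y[i] == l else -1 for l in range(n_labels)] for i in range(n)]
    # Start from a uniform distribution over (example, label) pairs.
    D = [[1.0 / (n * n_labels)] * n_labels for _ in range(n)]
    model = []
    for _ in range(n_rounds):
        best = None
        for f in features:
            # W[j][l][s]: weight of pairs in block j (feature absent=0,
            # present=1) for label l with sign s (0 for -1, 1 for +1).
            W = [[[0.0, 0.0] for _ in range(n_labels)] for _ in range(2)]
            for i in range(n):
                j = 1 if f in X[i] else 0
                for l in range(n_labels):
                    W[j][l][1 if Y[i][l] > 0 else 0] += D[i][l]
            # Normalization factor Z; the stump with the smallest Z is chosen.
            Z = 2 * sum(math.sqrt(W[j][l][0] * W[j][l][1])
                        for j in range(2) for l in range(n_labels))
            if best is None or Z < best[0]:
                # Real-valued vote of the stump for each block and label.
                c = [[0.5 * math.log((W[j][l][1] + eps) / (W[j][l][0] + eps))
                      for l in range(n_labels)] for j in range(2)]
                best = (Z, f, c)
        _, f, c = best
        model.append((f, c))
        # Reweight: pairs the stump gets wrong gain weight, then renormalize.
        total = 0.0
        for i in range(n):
            j = 1 if f in X[i] else 0
            for l in range(n_labels):
                D[i][l] *= math.exp(-Y[i][l] * c[j][l])
                total += D[i][l]
        for i in range(n):
            for l in range(n_labels):
                D[i][l] /= total
    return model

def predict(model, x, n_labels):
    """Sum the stump votes per sense and return the highest-scoring sense."""
    scores = [0.0] * n_labels
    for f, c in model:
        j = 1 if f in x else 0
        for l in range(n_labels):
            scores[l] += c[j][l]
    return max(range(n_labels), key=scores.__getitem__)

# Toy, made-up training data: a neighboring-word feature and a
# part-of-speech feature for each occurrence of an ambiguous word.
X = [
    {"w-1=river", "pos+1=n"},
    {"w-1=river", "pos+1=v"},
    {"w-1=savings", "pos+1=n"},
    {"w-1=savings", "pos+1=v"},
]
y = [0, 0, 1, 1]
model = train_adaboost_mh(X, y, n_labels=2)
```

Calling `predict(model, {"w-1=savings", "pos+1=adj"}, 2)` scores each sense by summing the stump votes. A stopping criterion such as the one the abstract mentions would replace the fixed `n_rounds` loop bound in this sketch.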

Keywords/Search Tags:

Natural Language Processing, Word Sense Disambiguation, AdaBoost.MH Algorithm, Multiple Knowledge Sources

Related items

1	Research On Word Sense Disambiguation Based On The Strategy Of Field Priority Selection
2	Word Sense Disambiguation Corpus Automatic Acquisition
3	Based On Semi-supervised Method Of Chinese Word Sense Disambiguation
4	Chinese Word Sense Disambiguation Based On Parsing Tree
5	Research And Application Of Word Sense Disambiguation Method Based On Contextual Semantic
6	Research On Word-level Ambiguity Resolution Method
7	Chinese Word Sense Disambiguation Based On Semantic
8	Research On Word Sense Disambiguation Method Based On Word Embedding
9	An Approach For Word Sense Disambiguation Based On WordNet
10	Research On Chinese Word Sense Disambiguation Model Based On Bidirectional Recurrent Neural Network