
The Research On Chinese Word Sense Disambiguation Based On Corpus

Posted on: 2006-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Quan
Full Text: PDF
GTID: 2168360152495234
Subject: Computer software and theory
Abstract/Summary:
Word Sense Disambiguation (WSD) is of great theoretical and practical significance in many areas of Natural Language Processing (NLP). It is an "intermediate task" for many NLP applications, such as machine translation and information retrieval. This thesis studies WSD algorithms that learn disambiguation knowledge from corpora. The main work is as follows.
1. Research methods for WSD are described, including the knowledge resources they draw on and how the methods are classified. WSD algorithms that learn knowledge from corpora are analyzed and compared in detail.
2. A word sense representation method based on multi-classifier decision is put forward. Labeled and unlabeled corpora are combined to construct the disambiguation classifiers. This reduces the need for a large-scale labeled corpus and so extends the applicability of supervised WSD methods.
3. To relieve the two main disadvantages of supervised WSD methods, namely the heavy manual labor of labeling corpora and the data sparseness problem, word sense indicators are selected as the main knowledge source. Word sense indicators express the combination relationships between words. A statistical machine-learning algorithm (a method for obtaining word sense indicators based on selecting the best seeds) is put forward to acquire, from a corpus, the word sense indicators that represent each sense of a polysemous word. This relieves, to a certain extent, the subjectivity caused by selecting the initial knowledge by hand and the knowledge acquisition bottleneck.
4. Based on word sense indicators, a corpus-based semi-supervised WSD method is designed. It effectively relieves the heavy manual labor of labeling corpora and the data sparseness problem. The influence of word sense indicators on WSD is also analyzed, which provides a credible reference for multi-feature Chinese WSD.
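The seed-based acquisition of word sense indicators summarized in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not state how seeds are scored or selected, so the sketch assumes a simple rule in which a context word is promoted to an indicator when it co-occurs with exactly one sense at least `min_count` times. The function names, the toy English data, and the `min_count` parameter are all hypothetical, and Chinese input would first need word segmentation.

```python
from collections import Counter

def bootstrap_indicators(contexts, seeds, rounds=3, min_count=2):
    """Grow per-sense indicator sets from a few seed words (illustrative only).

    contexts: list of token lists surrounding the ambiguous word.
    seeds: dict mapping each sense to a set of hand-picked seed indicators.
    """
    indicators = {sense: set(words) for sense, words in seeds.items()}
    for _ in range(rounds):
        # Label every context that matches indicators of exactly one sense.
        labeled = {}
        for i, tokens in enumerate(contexts):
            hits = [s for s, words in indicators.items() if words & set(tokens)]
            if len(hits) == 1:
                labeled[i] = hits[0]
        # Count, per sense, the contexts in which each word appears.
        counts = {s: Counter() for s in indicators}
        for i, sense in labeled.items():
            counts[sense].update(set(contexts[i]))
        # Assumed promotion rule: frequent with one sense, absent from the others.
        changed = False
        for sense, counter in counts.items():
            for word, n in counter.items():
                others = sum(counts[o][word] for o in counts if o != sense)
                if n >= min_count and others == 0 and word not in indicators[sense]:
                    indicators[sense].add(word)
                    changed = True
        if not changed:
            break
    return indicators

def disambiguate(tokens, indicators):
    """Return the sense with the most matching indicators, or None if unclear."""
    scores = {s: len(words & set(tokens)) for s, words in indicators.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if ranked[0][1] == 0:
        return None  # no indicator matched
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between senses
    return ranked[0][0]

# Toy, pre-tokenized contexts for the ambiguous word "bank" (made-up data).
contexts = [
    ["deposit", "money", "account"],
    ["money", "loan", "account"],
    ["river", "water", "shore"],
    ["water", "shore", "fishing"],
    ["account", "loan", "interest"],
    ["shore", "fishing", "boat"],
]
seeds = {"finance": {"money"}, "river": {"water"}}
indicators = bootstrap_indicators(contexts, seeds)
```

In this sketch only the two seed words are supplied by hand; the remaining indicators come from unlabeled contexts, which is the sense in which such a method reduces manual labeling. Returning `None` on ties or misses is a design choice of the sketch, made so that the data sparseness case stays visible rather than being hidden behind a guess.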
Keywords/Search Tags: Natural Language Processing, Word Sense Disambiguation, Corpus, Supervised method, Semi-supervised method