Research on Word Sense Disambiguation Based on Hybrid Features and Rules

Posted on: 2015-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: N N Gao
GTID: 2268330428990977
Subject: Computer software and theory
Abstract/Summary:
Word sense disambiguation (WSD) is an intermediate task in natural language processing. Its accuracy directly affects whether downstream applications such as machine translation and information retrieval can be carried out smoothly, so it has important theoretical and practical significance. Many WSD techniques have been proposed. Supervised methods achieve the best results, but they require a large, manually sense-tagged corpus to train a classifier for each ambiguous word, and existing sense-tagged corpora are far too small to train a practical disambiguation system. Unsupervised methods are expected to overcome this knowledge acquisition bottleneck, but their precision still needs to be improved.

Building on an in-depth study of the existing literature, this thesis proposes a new method for extracting hybrid context features that addresses problems in existing WSD systems. The method simulates how a human reader judges the sense of an ambiguous word while reading an article. The study found that humans disambiguate a word by applying knowledge from point to surface and from near to far. To make full use of the context and extract knowledge comprehensively, the method represents different levels of knowledge with context neighbor features, local features, and global features.

Based on this hybrid context feature extraction method, WordNet, and WordNet Domains, and combined with rule-based classification techniques, this thesis presents a WSD method based on hybrid features and rules. The method uses different strategies to calculate the gloss similarity, domain similarity, and topic similarity between each sense of an ambiguous word and its context.
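The abstract does not give the exact formulas, so the following Python sketch only illustrates the general idea under stated assumptions. It builds three context levels around a target word (neighbor, local, and global windows, with arbitrary window sizes) and computes one score per candidate sense for each kind of similarity. The gloss similarity is a simplified-Lesk word overlap. The domain similarity is the share of local-context domain labels that match the sense's domains. The topic similarity is a bag-of-words cosine against the whole document. `SENSES` and `WORD_DOMAINS` are hypothetical toy stand-ins for WordNet and WordNet Domains, and all function names are illustrative, not the thesis's code.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy resources standing in for WordNet glosses and WordNet Domains labels.
SENSES = {
    "bank": [
        {"id": "bank#1", "gloss": "sloping land beside a body of water such as a river",
         "domains": {"geography"}},
        {"id": "bank#2", "gloss": "financial institution that accepts deposits and lends money",
         "domains": {"economy", "banking"}},
    ]
}
WORD_DOMAINS = {
    "river": {"geography"}, "water": {"geography"},
    "money": {"economy"}, "loan": {"economy", "banking"}, "deposit": {"economy", "banking"},
}
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "is", "was", "by", "at"}


def content_words(tokens):
    """Lowercase alphabetic tokens, minus a tiny stopword list."""
    return [t.lower() for t in tokens if t.isalpha() and t.lower() not in STOPWORDS]


def hybrid_context(doc_tokens, target_index, neighbor=2, local=8):
    """Return the context at three levels, from near to far (window sizes are assumptions)."""
    def window(k):
        lo, hi = max(0, target_index - k), target_index + k + 1
        return content_words(doc_tokens[lo:target_index] + doc_tokens[target_index + 1:hi])
    rest = doc_tokens[:target_index] + doc_tokens[target_index + 1:]
    return {"neighbor": window(neighbor), "local": window(local), "global": content_words(rest)}


def gloss_similarity(sense, ctx):
    """Simplified Lesk: fraction of gloss words that occur in the neighbor/local context."""
    gloss = set(content_words(sense["gloss"].split()))
    near = set(ctx["neighbor"]) | set(ctx["local"])
    return len(gloss & near) / len(gloss) if gloss else 0.0


def domain_similarity(sense, ctx):
    """Share of domain labels found in the local context that belong to the sense's domains."""
    counts = Counter(d for w in ctx["local"] for d in WORD_DOMAINS.get(w, ()))
    total = sum(counts.values())
    return sum(counts[d] for d in sense["domains"]) / total if total else 0.0


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na, nb = sqrt(sum(v * v for v in a.values())), sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def topic_similarity(sense, ctx):
    """Bag-of-words cosine between the sense gloss and the whole-document context."""
    return cosine(Counter(content_words(sense["gloss"].split())), Counter(ctx["global"]))


def sense_features(word, doc_tokens, target_index):
    """Map each candidate sense id to its (gloss, domain, topic) similarity tuple."""
    ctx = hybrid_context(doc_tokens, target_index)
    return {s["id"]: (gloss_similarity(s, ctx), domain_similarity(s, ctx), topic_similarity(s, ctx))
            for s in SENSES[word]}
```

For example, with `doc = "We sat on the bank of the river and watched the water flow past".split()`, calling `sense_features("bank", doc, doc.index("bank"))` gives `bank#1` a gloss similarity of 0.25, a domain similarity of 1.0, and a topic similarity of about 0.27, while `bank#2` scores zero on all three.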
Using the gloss similarity, domain similarity, and topic similarity, the method extracts disambiguation rules with RIPPER and builds a single classifier for all ambiguous words to complete disambiguation.

To verify the effectiveness of the proposed method, we use the Senseval-3 English all-words task, an international standard data set, and evaluate with cross-validation. Experimental results demonstrate the effectiveness of the method: WSD based on hybrid features and rules avoids the problems that supervised methods face from their reliance on manually sense-tagged corpora, and it improves disambiguation precision.

During this research we also found problems in WSD that remain to be solved. For example, the sense granularity of the knowledge source WordNet is too fine, and the extraction of disambiguation rules can be further optimized. The thesis concludes by indicating directions for further research.
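RIPPER grows rules by adding conditions that maximize FOIL information gain, then prunes them and optimizes the rule set with an MDL-based criterion. The Python sketch below is a heavily simplified sequential-covering learner from that family: it uses FOIL-gain rule growing with `feature >= threshold` conditions and does no pruning or optimization. It is therefore not RIPPER and not the thesis's implementation. It trains one rule set on (similarity features, is-correct-sense) pairs pooled from all ambiguous words, which mirrors the idea of a single uniform classifier, and it estimates accuracy with k-fold cross-validation. The tie-breaking rule (choose the highest similarity sum when zero or several senses fire) and all function names are assumptions.

```python
from math import log2

# One training pair = (similarity feature tuple, True if that sense is the gold sense).


def covers(rule, x):
    """A rule is a list of (feature_index, threshold) conditions joined by AND."""
    return all(x[i] >= t for i, t in rule)


def foil_gain(p0, n0, p1, n1):
    """FOIL information gain of narrowing coverage from (p0 pos, n0 neg) to (p1, n1)."""
    if p1 == 0:
        return float("-inf")
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))


def grow_rule(data):
    """Greedily add `feature >= threshold` conditions until no negatives are covered."""
    rule, covered = [], data
    while any(not y for _, y in covered):
        p0 = sum(y for _, y in covered)
        n0 = len(covered) - p0
        best, best_gain = None, 0.0
        for i in range(len(covered[0][0])):
            for t in sorted({x[i] for x, _ in covered}):
                p1 = sum(y for x, y in covered if x[i] >= t)
                n1 = sum(not y for x, y in covered if x[i] >= t)
                gain = foil_gain(p0, n0, p1, n1)
                if gain > best_gain:
                    best, best_gain = (i, t), gain
        if best is None:  # no single condition improves precision any further
            break
        rule.append(best)
        covered = [(x, y) for x, y in covered if covers([best], x)]
    return rule


def learn_rules(data, max_rules=10):
    """Sequential covering: learn a rule, drop the examples it covers, repeat."""
    rules, remaining = [], list(data)
    while any(y for _, y in remaining) and len(rules) < max_rules:
        rule = grow_rule(remaining)
        if not rule:
            break
        rules.append(rule)
        remaining = [(x, y) for x, y in remaining if not covers(rule, x)]
    return rules


def disambiguate(rules, candidates):
    """Uniform classifier for any word: candidates maps sense id -> feature tuple."""
    firing = [s for s, x in candidates.items() if any(covers(r, x) for r in rules)]
    pool = firing or list(candidates)  # assumption: fall back to all senses if none fire
    return max(pool, key=lambda s: sum(candidates[s]))


def cross_validate(items, k=5):
    """Mean k-fold accuracy; items are (candidates, gold_sense_id) pairs, len(items) >= k."""
    folds = [items[i::k] for i in range(k)]
    accuracies = []
    for i, test in enumerate(folds):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        pairs = [(x, s == gold) for cands, gold in train for s, x in cands.items()]
        rules = learn_rules(pairs)
        correct = sum(disambiguate(rules, cands) == gold for cands, gold in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k
```

Pooling the training pairs from every ambiguous word lets one rule set serve all words. By contrast, the supervised approaches mentioned in the abstract need a separate classifier trained for each word.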
Keywords/Search Tags: Word sense disambiguation, Feature extraction, Semantic similarity, Rule learning