A Chinese Unsupervised Word Sense Disambiguation Method Based On Semantic Vector

Posted on: 2013-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: L Cui
Full Text: PDF
GTID: 2298330362964414
Subject: Computer technology
Abstract/Summary:
Word sense disambiguation is an important research topic in computational linguistics and natural language processing, with both theoretical and practical significance in many application fields. Accurate and practical word sense disambiguation methods improve results in machine translation, text classification, automatic summarization, information retrieval and text mining.

Supervised machine learning methods for word sense disambiguation require the words in the training corpus to be annotated. To overcome the data sparseness problem and disambiguate well, they need a large-scale annotated corpus, and building such a corpus carries a high manual cost. To solve this problem, this paper proposes an unsupervised learning method. Because it does not require semantic categories to be annotated for each word in the training samples, it needs no manually annotated corpus and effectively overcomes the data sparseness problem.

This paper combines PMI and the Z test to select feature words within a window of three words around the polysemous word, and uses sense words to define each sense of the polysemous word. Borrowing the idea of matching a natural language query against documents in traditional information retrieval, it treats the context of the polysemous word as the query and the sense words as the documents. A semantic vector and a feature word vector are constructed for each polysemous word, and the similarity between the semantic vector and the query vector is then calculated to determine the correct sense of the polysemous word. The experiments use 150 typical polysemous words as data, and the results show that the method is effective.
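
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes PMI means pointwise mutual information estimated from raw token counts, it omits the Z test (whose exact form the abstract does not give), and it assumes cosine similarity as the similarity measure. The function names `pmi_scores`, `cosine` and `disambiguate`, the English toy data and the unit weights are all hypothetical.

```python
import math
from collections import Counter


def pmi_scores(sentences, target, window=3):
    """Score words seen within `window` tokens of `target` by PMI.

    Assumption: probabilities are plain relative frequencies over all
    tokens; the thesis may estimate them differently.
    """
    word_count = Counter()
    pair_count = Counter()
    total = 0
    for tokens in sentences:
        total += len(tokens)
        word_count.update(tokens)
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_count[tokens[j]] += 1
    p_target = word_count[target] / total
    scores = {}
    for word, count in pair_count.items():
        p_pair = count / total
        p_word = word_count[word] / total
        scores[word] = math.log2(p_pair / (p_word * p_target))
    return scores


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(query_vec, sense_vecs):
    """Return the sense whose vector is most similar to the query vector."""
    return max(sense_vecs, key=lambda s: cosine(query_vec, sense_vecs[s]))


# Toy data (hypothetical): the context acts as the query,
# each sense's sense words act as a document.
sense_vecs = {
    "finance": {"money": 1.0, "loan": 1.0},
    "river": {"water": 1.0, "shore": 1.0},
}
query_vec = {"money": 1.0, "deposit": 1.0}
chosen = disambiguate(query_vec, sense_vecs)
```

In this sketch the feature words kept after PMI filtering would supply the dimensions and weights of the query vector; how the thesis weights the semantic vector is not stated in the abstract, so unit weights are used here.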
Keywords/Search Tags: word sense disambiguation, unsupervised learning, PMI, similarity