
The Research Of Several Key Technologies Of Word Sense Disambiguation

Posted on: 2013-01-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J P Chen
GTID: 1228330395475957
Subject: Computer application technology
Abstract/Summary:
Word sense disambiguation (WSD), the task of computationally identifying the intended sense of a word in context, is the most difficult problem at the lexical level of natural language processing (NLP). It is an important technique for many NLP applications that require broad-coverage language understanding, such as knowledge acquisition, text mining, text summarization, machine translation, and information retrieval. Improving the performance of WSD has therefore become an increasingly urgent problem. To address it, this dissertation investigates several key technologies of WSD in depth. The main contents are as follows:

1. To exploit the knowledge contained in knowledge resources, especially heterogeneous ones, an unsupervised WSD method based on lexical stability and an improved lexical chain is proposed. The method first constructs a semantic relationship graph that represents all the semantic relations contained in the document, and then disambiguates the words on the basis of this graph. Because the semantic relationship graph can integrate heterogeneous knowledge resources, the method can effectively exploit the knowledge they contain. In addition, a word sense weighting scheme, namely lexical stability, is proposed to measure the reliability of a word sense. This weighting scheme effectively reduces the noise introduced by unreliable word senses.

2. WSD systems rely heavily on knowledge. However, knowledge for WSD is hard to construct, and the knowledge available is not sufficient to support high-performance WSD systems. This dissertation therefore first proposes a novel method for automatically disambiguating a large-scale common sense knowledge base, namely ConceptNet, and then uses the disambiguated ConceptNet to enrich WordNet. Our experiments show that enriching WordNet with the disambiguated ConceptNet can significantly improve the performance of knowledge-based WSD methods.

3. A word sense dictionary is a precondition for WSD. Without a good word sense dictionary, WSD systems cannot achieve high performance. However, for WSD in special fields, such as tag disambiguation in social tagging systems, the word sense dictionaries predefined by experts cannot effectively cover the senses of ambiguous words. We therefore present a model based on non-negative matrix factorization that induces a word sense dictionary from social tagging systems. The automatically constructed dictionary can effectively cover the senses of ambiguous tags in social tagging systems and can be used to disambiguate them. In addition, we propose an automatic evaluation method for measuring the quality of the word sense dictionary induced from social tagging systems, which avoids laborious and error-prone human evaluation.
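
The sense-induction idea in the third part can be illustrated with a small sketch. The abstract does not say how the matrix is built or which factorization algorithm is used, so everything below is an assumption made for illustration: each row is one resource carrying the ambiguous tag "apple", each column counts a co-occurring tag, and the factorization uses the standard Lee-Seung multiplicative updates for squared error. The toy data and the helper names (`matmul`, `nmf`, `induce_senses`, `argmax`) are hypothetical. Plain Python is used to keep the sketch dependency-free; a numerical library would normally be used instead.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def sq_error(v, w, h):
    """Squared reconstruction error between v and the product w * h."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, k, iters=300, seed=0, eps=1e-9):
    """Factorize non-negative v (n x m) as w (n x k) times h (k x m).

    Uses Lee-Seung multiplicative updates (an assumed choice), which keep
    every entry of w and h non-negative.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.01 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.01 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


def induce_senses(v, k, restarts=5):
    """Run NMF from several random starts and keep the best-fitting result."""
    best = None
    for seed in range(restarts):
        w, h = nmf(v, k, seed=seed)
        err = sq_error(v, w, h)
        if best is None or err < best[0]:
            best = (err, w, h)
    return best[1], best[2]


def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)


# Hypothetical toy data: six resources tagged "apple" and the tags that co-occur.
tags = ["fruit", "recipe", "pie", "mac", "iphone", "software"]
contexts = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]

w, h = induce_senses(contexts, k=2)
labels = [argmax(row) for row in w]              # induced sense of each resource
top_tags = [tags[argmax(sense)] for sense in h]  # most characteristic tag per sense
print(labels, top_tags)
```

In this reading, each row of `h` plays the role of one induced dictionary entry (a weighting over co-occurring tags), and each row of `w` says how strongly a resource uses each induced sense. The number of senses `k` is fixed by hand here; how it would be chosen in practice is not covered by the abstract.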
Keywords/Search Tags: word sense disambiguation, lexical chain, lexical stability, ConceptNet, non-negative matrix factorization, social tagging system