
Research on Chinese Word Sense Disambiguation

Posted on: 2010-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: H H Dan
Full Text: PDF
GTID: 2178360275974346
Subject: Computer system architecture
Abstract/Summary:
Word Sense Disambiguation (WSD) is an important research topic in computational linguistics and natural language processing (NLP), and has been one of the most active research problems in NLP in recent years. The quality of WSD directly affects machine translation, information retrieval, sentence analysis, speech recognition and other applications, so research on WSD has great theoretical and practical significance. There are many WSD methods, but comparatively few of them are based on a knowledge base. Unlike statistical disambiguation methods that depend on supervised learning from a corpus, knowledge-based methods perform well on open-domain text.

An ontology is an explicit, formal specification of a shared conceptual model. It describes knowledge of the world in terms of concepts and the relationships between them, so it can serve as the world-knowledge base of a natural language processing system. In recent years ontology research has developed rapidly, covering ontology content, ontology representation, construction rules, automatic ontology construction and other aspects. As more mature ontologies have appeared, applications of ontology have received increasing attention.

This thesis follows the knowledge-based direction of disambiguation. It carries out disambiguation using the structural characteristics of the Chinese ontology knowledge base HowNet together with context information, and achieves WSD by calculating the relevance and similarity between words. The research focuses on the following aspects.

At present, many WSD studies take only a few representative ambiguous words as the object of research and testing, which limits their use in practical applications. This thesis therefore targets text from real applications and studies WSD for large-scale documents.
The thesis proposes that, after word segmentation and part-of-speech tagging of arbitrary text, a commonly used polysemy thesaurus be used to identify the ambiguous words, which addresses the problem of real applications. At the same time, using an ontology as the knowledge base for WSD avoids the complex process of obtaining word senses from a manually annotated training corpus, and provides accurate senses.

When feature words for an ambiguous word are extracted from a context window of a given size, the correlation between the ambiguous word and its context words is calculated based on the idea of "three times" (cubic) mutual information. This measure effectively distinguishes high-frequency words from low-frequency words, and, by ranking context words by their degree of correlation, the feature words that carry the most information about the ambiguous word can be extracted.

Based on the structure of the ontology (in this thesis, that of the Chinese HowNet) and the relationships between its concepts, the thesis proposes using similarity calculation to determine the precise sense of an ambiguous word in its context, thereby achieving WSD.

Experiments show that the accuracy of this WSD method is greatly improved over previous methods, which further indicates that the method is feasible and efficient.
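The two calculation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: "three times mutual information" is read here as cubic mutual information, log2(P(x,y)^3 / (P(x)P(y))); HowNet concepts are stood in for by plain sets of sememe labels; and a Jaccard overlap replaces whatever HowNet-based similarity formula the thesis actually uses. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def cubic_mi(p_xy, p_x, p_y):
    """Cubic mutual information: log2(P(x,y)^3 / (P(x) * P(y))).

    Assumed reading of the abstract's "three times mutual information".
    """
    if p_xy == 0:
        return float("-inf")
    return math.log2(p_xy ** 3 / (p_x * p_y))


def top_feature_words(sentences, target, window=2, k=3):
    """Rank words found within +/- `window` tokens of `target` by cubic MI."""
    word_count = Counter()
    pair_count = Counter()
    total = 0
    for tokens in sentences:
        total += len(tokens)
        word_count.update(tokens)
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_count[tokens[j]] += 1
    scores = {
        w: cubic_mi(c / total, word_count[target] / total, word_count[w] / total)
        for w, c in pair_count.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def jaccard(a, b):
    """Set overlap, used here as a stand-in for a HowNet concept similarity."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def disambiguate(senses, feature_sememes):
    """Pick the sense whose sememe set is most similar to the feature words.

    `senses` maps a sense label to a set of sememes (hypothetical toy data);
    `feature_sememes` is a list of sememe sets, one per feature word.
    """
    def total_similarity(sense):
        return sum(jaccard(senses[sense], f) for f in feature_sememes)

    return max(senses, key=total_similarity)
```

For example, given tokenized sentences, `top_feature_words(sentences, "bank", window=1, k=2)` returns the two context words most strongly associated with "bank", and `disambiguate` then compares each candidate sense against the sememe sets of those words. Cubing the joint probability is one common way to keep rare words from dominating the ranking, which matches the abstract's remark about separating high-frequency and low-frequency words.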
Keywords/Search Tags: Word Sense Disambiguation, Ontology, HowNet, Context, Correlation calculation, Similarity calculation