
Research On Chinese Word Sense Disambiguation Based On Semantic Analysis

Posted on: 2007-10-28 | Degree: Master | Type: Thesis
Country: China | Candidate: R Yan | Full Text: PDF
GTID: 2178360182994721 | Subject: Computer application technology
Abstract/Summary:
Word Sense Disambiguation (WSD) plays an important role in many areas of Natural Language Processing (NLP) and has become both a focus of research and a difficult problem. WSD is not an end in itself but an intermediate task and a prerequisite for many NLP applications: it benefits machine translation (MT), information retrieval (IR), syntactic parsing, speech synthesis and so on. Research on WSD, including an understanding of its current state and development trends, therefore has great theoretical and practical significance.

The main work of this dissertation is to study how to acquire the knowledge that supports WSD from different language resources. In addition, a WSD system for real Chinese text has been built. The main work and innovative results of the dissertation are organized as follows.

Firstly, the research status at home and abroad is introduced, and the object and goal of the work are explained. The dissertation focuses on disambiguating word senses in real Chinese text.

Secondly, the classes and characteristics of Chinese polysemous words are analyzed in detail, and the effect of semantic knowledge on WSD is investigated thoroughly. Acquiring the knowledge needed for WSD is the key to the work. Two machine-readable dictionaries, 《HowNet》 and 《XianDai HanYu CiHai》, are used as the semantic resources. Several knowledge databases have also been designed: a dynamic preference combination library, a word library (comprising a multi-word library and a single-word library) and a filter library.

Thirdly, the WSD system is designed and implemented, and a WSD model is given. Five modules are applied to disambiguate word senses in the system: pretreatment, similarity calculation, relevance calculation, middle-WSD and knowledge database management. The pretreatment module picks out the polysemous words through part-of-speech (POS) tagging and POS filtering.
Pretreatment eliminates pseudo-polysemous words and partially disambiguates the senses of ambiguous Chinese words. Middle-WSD is the core of the whole system and is divided into two parts: similarity calculation and relevance calculation. Similarity calculation is based on the hypernym-hyponym (up-down) relation among the sememes that are used to define concepts in HowNet. Four other relations among the sememes have been extracted and are used for relevance calculation. Formulas are then given that calculate not only the relevance between words but also the relevance between a word and its context. Knowledge database management queries and updates the data in the three knowledge databases. An example is also given in the implementation of the system to explain and validate the WSD process.

Lastly, the WSD system is evaluated. News corpora from China Daily in 1998, covering politics, economy, science and agriculture, are used in the experiments. The experimental results show that the model is effective for the WSD task, with precision reaching about 83%.
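
The abstract does not state the similarity formula, so the following is only a minimal sketch of how a hierarchy-based similarity could drive sense selection. It assumes a path-distance measure of the form alpha / (alpha + distance), a choice often seen in HowNet-based work but not confirmed as the one used in this dissertation. The sememe names, the toy hierarchy, the value of `ALPHA` and the helper `pick_sense` are all hypothetical.

```python
# Toy hypernym tree (child -> parent). The sememe names are invented
# for illustration and are not taken from HowNet.
PARENT = {
    "entity": None,
    "thing": "entity",
    "animate": "thing",
    "human": "animate",
    "animal": "animate",
    "artifact": "thing",
    "tool": "artifact",
    "publication": "artifact",
}

ALPHA = 1.6  # assumed smoothing parameter, not a value from the dissertation


def path_to_root(sememe):
    """Return the list of sememes from `sememe` up to the root."""
    path = []
    while sememe is not None:
        path.append(sememe)
        sememe = PARENT[sememe]
    return path


def distance(a, b):
    """Number of edges between two sememes via their lowest common ancestor."""
    depth_in_b = {s: i for i, s in enumerate(path_to_root(b))}
    for i, s in enumerate(path_to_root(a)):
        if s in depth_in_b:
            return i + depth_in_b[s]
    raise ValueError("sememes share no common ancestor")


def similarity(a, b, alpha=ALPHA):
    """Assumed path-based similarity: 1.0 for identical sememes, lower when farther apart."""
    return alpha / (alpha + distance(a, b))


def pick_sense(candidate_senses, context_sememes):
    """Choose the sense whose sememe is, on average, most similar to the context."""
    def score(sememe):
        total = sum(similarity(sememe, c) for c in context_sememes)
        return total / len(context_sememes)
    return max(candidate_senses, key=lambda label: score(candidate_senses[label]))


if __name__ == "__main__":
    senses = {"person-sense": "human", "instrument-sense": "tool"}
    print(pick_sense(senses, ["animal", "animate"]))
```

In this sketch each candidate sense is reduced to a single sememe and the context to a list of sememes. A real HowNet definition combines several sememes, and the system described here also uses a separate relevance calculation, neither of which is modelled.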
Keywords/Search Tags:Natural Language Processing (NLP), Word Sense Disambiguation (WSD), Similarity, Relevance, Combination, Semantic Analysis