
Chinese Word Sense Disambiguation

Posted on: 2010-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: B Deng
GTID: 2208330332976555
Subject: Computer software and theory
Abstract/Summary:
Word sense disambiguation (WSD) has long been a key problem in natural language processing (NLP), and research on WSD underpins much of NLP research. It can be applied in many NLP systems, such as machine translation, text categorization, automatic summarization, information retrieval, text mining, speech recognition, and text-to-speech. The study of WSD therefore has important theoretical and practical significance for NLP. This thesis discusses several techniques for Chinese WSD, namely how to avoid an oversized context scope in the Naive Bayesian Model (NBM) and how to differentiate the influence that individual context words have on an ambiguous word. Its main contributions are as follows.
(1) A training corpus of about 5,000 sentences and 150,000 words was constructed by manual tagging. The sentences were selected from the "People's Daily" corpus; each contains about 30 words, and the context of each ambiguous word exceeds 10 words. The senses of the ambiguous words are taken from HowNet.
(2) A WSD method based on the Bayesian model and information gain is proposed. The method determines the scope and the weight of the context in the NBM, restricting how far the context extends and differentiating how strongly each context word influences the decision. Experiments show that its average precision is 94.39% in the closed test and 87.13% in the open test.
(3) An improved Bayesian WSD model that combines information gain with the Grammatical Knowledge-base of Contemporary Chinese is proposed. The method uses the Grammatical Knowledge-base as a knowledge source: for ambiguous words that are absent or rare in the corpus, the disambiguation knowledge comes from the attribute information in the Grammatical Knowledge-base, so the method combines statistics with rules. Experiments show that its average precision is 95.13% in the closed test and 90.87% in the open test.
(4) Based on the above results, three prototype systems have been designed and implemented: an NBM WSD system, a WSD system based on the Bayesian model and information gain, and an improved Bayesian WSD system that combines information gain with the Grammatical Knowledge-base.
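The idea behind contribution (2) can be illustrated with a small sketch. It assumes a naive Bayes scorer in which the context is cut to a fixed window of k words on each side of the ambiguous word, and each context word's log-likelihood is multiplied by its information gain with respect to the sense label. The abstract does not give the exact formulation, so the window size `k`, the add-alpha smoothing, the multiplicative IG weighting, and the optional `rule` fallback (a hypothetical stand-in for the knowledge-base step in contribution (3)) are all assumptions; the toy sentences and sense labels are invented for illustration and are not from the thesis corpus.

```python
import math
from collections import Counter, defaultdict


def context_window(tokens, target_index, k):
    """Return up to k tokens on each side of the ambiguous word, excluding the word itself."""
    left = tokens[max(0, target_index - k):target_index]
    right = tokens[target_index + 1:target_index + 1 + k]
    return left + right


def entropy(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(contexts, word):
    """IG of the binary feature 'word occurs in the context' with respect to the sense."""
    with_w = Counter(sense for ctx, sense in contexts if word in ctx)
    without_w = Counter(sense for ctx, sense in contexts if word not in ctx)
    n = len(contexts)
    n_with = sum(with_w.values())
    n_without = n - n_with
    conditional = 0.0
    if n_with:
        conditional += n_with / n * entropy(list(with_w.values()))
    if n_without:
        conditional += n_without / n * entropy(list(without_w.values()))
    return entropy(list((with_w + without_w).values())) - conditional


class WeightedNaiveBayesWSD:
    """Naive Bayes sense classifier for ONE ambiguous word, with IG-weighted context (sketch)."""

    def __init__(self, k=5, alpha=1.0):
        self.k = k          # hypothetical window size on each side
        self.alpha = alpha  # add-alpha (Laplace) smoothing

    def fit(self, examples):
        """examples: list of (tokens, target_index, sense) for the same ambiguous word."""
        contexts = [(context_window(t, i, self.k), s) for t, i, s in examples]
        self.n = len(contexts)
        self.sense_counts = Counter(s for _, s in contexts)
        self.word_counts = defaultdict(Counter)
        for ctx, s in contexts:
            self.word_counts[s].update(ctx)
        self.vocab = {w for ctx, _ in contexts for w in ctx}
        self.ig = {w: information_gain(contexts, w) for w in self.vocab}
        return self

    def score(self, ctx, sense):
        """log P(sense) + sum of IG(w) * log P(w | sense) over known context words."""
        total = sum(self.word_counts[sense].values())
        logp = math.log(self.sense_counts[sense] / self.n)
        for w in ctx:
            if w in self.vocab:  # unseen words are skipped: they have no IG estimate
                p = (self.word_counts[sense][w] + self.alpha) / (total + self.alpha * len(self.vocab))
                logp += self.ig[w] * math.log(p)
        return logp

    def predict(self, tokens, target_index, rule=None, min_count=5):
        """Pick the highest-scoring sense; defer to an optional rule when training data is scarce."""
        if rule is not None and self.n < min_count:
            sense = rule(tokens, target_index)  # hypothetical knowledge-base rule
            if sense is not None:
                return sense
        ctx = context_window(tokens, target_index, self.k)
        return max(self.sense_counts, key=lambda s: self.score(ctx, s))


# Toy training data for the ambiguous word 打 (invented; labels are not HowNet senses).
train = [
    (["他", "喜欢", "打", "篮球"], 2, "play"),
    (["我们", "打", "篮球", "吧"], 1, "play"),
    (["孩子", "打", "乒乓球"], 1, "play"),
    (["他", "打", "了", "人"], 1, "hit"),
    (["不要", "打", "人"], 1, "hit"),
    (["他", "被", "打", "了"], 2, "hit"),
]
model = WeightedNaiveBayesWSD(k=5).fit(train)
print(model.predict(["我", "想", "打", "篮球"], 2))  # play
print(model.predict(["别", "打", "人"], 1))          # hit
```

In this toy setup, 篮球 and 人 each occur with only one sense and receive a higher information gain (about 0.46 bits) than 他, which occurs with both senses (about 0.08 bits), so weighting by IG lets sense-bearing words dominate while a long context of uninformative words contributes little. Words outside the training vocabulary, such as 我 and 想, are simply ignored.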
Keywords/Search Tags: Word sense disambiguation, HowNet, Bayesian Model, Information Gain, The Grammatical Knowledge-base of Contemporary Chinese