Construction of a Word Sense Disambiguation Knowledge Base Based on Multiple Knowledge Sources

Posted on: 2013-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: L Ju
GTID: 2208330362466055
Subject: Computer application technology
Abstract/Summary:
Word Sense Disambiguation (WSD) is an important part of Natural Language Processing (NLP). It affects how computers understand and interpret language. There are now many ways to disambiguate word senses, such as probability models and rules. These methods aim to let computers process and understand language more as human beings do. WSD knowledge can be obtained from dictionaries and corpora by various methods, so it is necessary to integrate this knowledge and build knowledge bases specialized for WSD work.

Drawing on several knowledge sources, namely The Grammatical Knowledge-base of Contemporary Chinese (GKB), The Semantic Knowledge-base of Contemporary Chinese (SKCC or CSD), The Word-Sense Tagging Corpus (STC) and HowNet, three knowledge bases were built by innovating on and improving existing disambiguation methods. The three knowledge bases are the CRF Model Base, the Scene Word Base and the Distinguish Attributes Base. They can be used in WSD experiments. The specific work covers the following aspects:

1. Established a Short Sentence Base using STC, storing sentences in different files according to the polysemous word each contains. This corpus provided a foundation for building the CRF Model Base, constructing rules and running disambiguation experiments.

2. Built the CRF Model Base to store models trained with CRF. CRF models were trained on the full text, on a corpus of high-frequency polysemous word senses, and on a corpus of low-frequency polysemous word senses, and the resulting models were evaluated on a test corpus. Across several experiments, the data showed that models trained on the low-frequency polysemous word sense corpus performed better. Finally, a threshold can be determined to help judge the result.

3. Established the Scene Word Base using co-occurrence words, collocation words and key words. These words can be extracted from the corpus by means of bag-of-words, dependency syntax trees and best seeds, respectively. The three kinds of words were used in a closed test, and impact factors were determined with reference to the results. In the end, ordered disambiguation and comprehensive disambiguation were adopted in the open test.

4. Constructed the Distinguish Attributes Base using knowledge from SKCC and GKB. As both were designed by the Institute of Computational Linguistics of Peking University, they can easily be integrated, and distinguishing attributes can be extracted among the different meanings of a polysemous word. In addition, the Example field can be used to design templates in combination with HowNet, the Short Sentence Base and Corpus Online, and these templates can be used to disambiguate sparse polysemous words.

5. Finally, three systems were designed to extract knowledge and run experiments showing how this knowledge performed.

To sum up, this paper analyzed knowledge from different sources and integrated it into a single WSD Knowledge Base. It can not only be used in WSD work, but will also promote other relevant NLP work.
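The bag-of-words extraction of co-occurrence words mentioned in item 3 can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the sample format (token list, index of the polysemous word, sense label), the function names, the window size and the English toy data are all assumptions made for illustration, and the scoring is a plain count sum rather than the impact factors the abstract refers to.

```python
from collections import Counter, defaultdict


def context_window(tokens, index, window):
    """Return the tokens within `window` positions of tokens[index], excluding it."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return left + right


def build_cooccurrence(samples, window=3):
    """Count context words per sense from sense-tagged samples.

    Each sample is a hypothetical (tokens, target_index, sense) triple.
    """
    counts = defaultdict(Counter)
    for tokens, index, sense in samples:
        counts[sense].update(context_window(tokens, index, window))
    return counts


def disambiguate(tokens, index, counts, window=3):
    """Pick the sense whose co-occurrence counts best match the context."""
    context = context_window(tokens, index, window)
    scores = {
        sense: sum(counter[word] for word in context)
        for sense, counter in counts.items()
    }
    return max(scores, key=scores.get)


# Invented toy data standing in for a sense-tagged corpus.
samples = [
    (["he", "deposited", "money", "at", "the", "bank"], 5, "finance"),
    (["the", "bank", "approved", "the", "loan"], 1, "finance"),
    (["they", "sat", "on", "the", "river", "bank"], 5, "river"),
    (["the", "bank", "of", "the", "river", "flooded"], 1, "river"),
]

counts = build_cooccurrence(samples)
predicted = disambiguate(["a", "loan", "from", "the", "bank"], 4, counts)
print(predicted)
```

In this sketch, function words such as "the" contribute equally to every sense; a practical system would likely filter or down-weight them, which is one role a weighting scheme could play.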
Keywords/Search Tags: Word Sense Disambiguation, model base, scene word, definite distinguish attributes, template