Construction Of Word Sense Disambiguation Knowledge Base Based On Multi-Knowledge Source

Posted on:2013-02-05

Degree:Master

Type:Thesis

Country:China

Candidate:L Ju

Full Text:PDF

GTID:2208330362466055

Subject:Computer application technology

Abstract/Summary:


Word Sense Disambiguation (WSD) is an important part of Natural Language Processing (NLP). It affects how computers understand and interpret language. There are many approaches to disambiguating word senses, such as probability models and rules, and these methods help computers understand language the way humans do. WSD knowledge can be obtained from dictionaries and corpora by various methods, so it is necessary to integrate this knowledge and build knowledge bases specialized for WSD. Drawing on knowledge sources such as The Grammatical Knowledge-base of Contemporary Chinese (GKB), The Semantic Knowledge-base of Contemporary Chinese (SKCC or CSD), The Word-Sense Tagging Corpus (STC) and HowNet, and by improving on existing disambiguation methods, three knowledge bases were built: the CRF Model Base, the Scene Word Base and the Distinguish Attributes Base. All three can be used in WSD experiments. The specific work covers the following aspects:

1. Established a Short Sentence Base from STC, storing sentences in separate files according to the polysemous word each one contains. This corpus provides the foundation for building the CRF Model Base, constructing rules and running disambiguation experiments.

2. Built a CRF Model Base to store models trained with CRF (Conditional Random Fields). CRF was trained on the full text, on a corpus of high-frequency polysemous word senses and on a corpus of low-frequency polysemous word senses, and the resulting models were evaluated on the test corpus. Across several experiments, the models trained on the low-frequency sense corpus performed better. Finally, a threshold can be determined to help judge the results.

3. Established a Scene Word Base using co-occurrence words, collocation words and key words. These words can be extracted from the corpus by means of a bag of words, dependency syntax trees and best seeds. The three kinds of words were used in a closed test, and their impact factors were determined from the results.
   In the end, order disambiguation and comprehensive disambiguation were adopted in the open test.

4. Constructed a Distinguish Attributes Base using knowledge from SKCC and GKB. Since both were designed by the Institute of Computational Linguistics of Peking University, they can easily be integrated, and distinguishing attributes can be extracted among the different senses of a polysemous word. In addition, the Example field can be used to design templates in combination with HowNet, the Short Sentence Base and Corpus Online, which can be used to disambiguate sparse polysemous words.

5. Finally, three systems were designed to extract the knowledge and run experiments showing how it performs.

To sum up, this thesis analyzed knowledge from different sources and integrated it into a unified WSD Knowledge Base. It can be used not only in WSD work but also to support other related NLP tasks.
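The abstract names two ways of applying the Scene Word Base in the open test, order disambiguation and comprehensive disambiguation, without defining either. The following is a minimal sketch under stated assumptions, not the thesis's method: each sense has three hypothetical sets of scene words (key, collocation, co-occurrence); order disambiguation is assumed to consult the word types one at a time in a fixed priority order and stop at the first type that favors a single sense; comprehensive disambiguation is assumed to score every sense by a weighted sum of matches, with the weights standing in for the impact factors from the closed test. The priority order, the weights, the example data and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical layout: sense -> word type -> set of scene words.
SceneWords = Dict[str, Dict[str, Set[str]]]

# Assumed priority for order disambiguation (most specific evidence first).
ORDER = ["key", "collocation", "cooccurrence"]


def match_counts(context: List[str], scene: SceneWords, word_type: str) -> Dict[str, int]:
    """Count how many context tokens appear in each sense's scene words of one type."""
    tokens = set(context)
    return {sense: len(tokens & types.get(word_type, set())) for sense, types in scene.items()}


def order_disambiguate(context: List[str], scene: SceneWords) -> Optional[str]:
    """Try each word type in priority order; return the first uniquely best sense."""
    for word_type in ORDER:
        counts = match_counts(context, scene, word_type)
        best = max(counts.values(), default=0)
        if best == 0:
            continue  # no evidence from this word type, fall through to the next one
        winners = [s for s, c in counts.items() if c == best]
        if len(winners) == 1:
            return winners[0]
    return None  # undecided: no word type singled out one sense


def comprehensive_disambiguate(context: List[str], scene: SceneWords,
                               weights: Dict[str, float]) -> Optional[str]:
    """Score every sense by a weighted sum over all word types (weights = impact factors)."""
    scores = {sense: 0.0 for sense in scene}
    for word_type, weight in weights.items():
        for sense, count in match_counts(context, scene, word_type).items():
            scores[sense] += weight * count
    best = max(scores.values(), default=0.0)
    winners = [s for s, v in scores.items() if v == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


# Hypothetical example data for an ambiguous word with two senses.
SCENE: SceneWords = {
    "finance": {"key": {"loan", "deposit"}, "collocation": {"account"},
                "cooccurrence": {"money", "interest"}},
    "river": {"key": {"shore"}, "collocation": {"steep"},
              "cooccurrence": {"water", "fish", "money"}},
}
CONTEXT = ["fish", "water", "money", "account"]
WEIGHTS = {"key": 3.0, "collocation": 1.0, "cooccurrence": 1.0}

if __name__ == "__main__":
    print(order_disambiguate(CONTEXT, SCENE))                   # finance
    print(comprehensive_disambiguate(CONTEXT, SCENE, WEIGHTS))  # river
```

On this toy context the two strategies disagree: the ordered pass stops at the single collocation match, while the weighted sum lets three co-occurrence matches outvote it. That tension is one reason the impact factors would need to be tuned on a closed test before an open test.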
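Item 4 states that once SKCC and GKB are integrated, distinguishing attributes can be extracted among the senses of a polysemous word. A minimal sketch of one possible reading, assuming each sense has already been merged into a flat record mapping attribute name to value: an attribute counts as distinguishing when its value is not shared by all senses, and for each sense the sketch keeps the attribute values no other sense has. The attribute names and values are hypothetical placeholders, not actual SKCC or GKB fields.

```python
from typing import Dict, List

# Hypothetical merged record per sense: attribute name -> value.
SenseRecords = Dict[str, Dict[str, str]]


def distinguishing_attributes(records: SenseRecords) -> List[str]:
    """Return attributes whose values are not identical across all senses."""
    names = sorted({name for rec in records.values() for name in rec})
    result = []
    for name in names:
        values = {rec.get(name) for rec in records.values()}  # a missing attribute counts as None
        if len(values) > 1:
            result.append(name)
    return result


def unique_values(records: SenseRecords) -> Dict[str, Dict[str, str]]:
    """For each sense, return the attribute values that no other sense shares."""
    out = {}
    for sense, rec in records.items():
        others = [r for s, r in records.items() if s != sense]
        out[sense] = {
            name: value for name, value in rec.items()
            if all(o.get(name) != value for o in others)
        }
    return out


# Hypothetical records for three senses of one polysemous word.
RECORDS: SenseRecords = {
    "sense_1": {"pinyin": "da3", "pos": "v", "semantic_class": "action"},
    "sense_2": {"pinyin": "da3", "pos": "v", "semantic_class": "communication"},
    "sense_3": {"pinyin": "da3", "pos": "n", "semantic_class": "communication"},
}

if __name__ == "__main__":
    print(distinguishing_attributes(RECORDS))  # ['pos', 'semantic_class']
    print(unique_values(RECORDS))
```

Here `sense_2` has no single unique value and is only separated from the others by the combination of `pos` and `semantic_class`, which illustrates why templates built from the Example field might be needed for senses that attributes alone do not separate.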

Keywords/Search Tags:

Word Sense Disambiguation, model base, scene word, definite distinguish attributes, template


Related items

1	Research On Chinese Word Sense Disambiguation Method Based On Graph Model
2	Research Of Word Sense Disambiguation Based On Word-sense Category Extending
3	Research On Word Sense Disambiguation Based On The Strategy Of Field Priority Selection
4	Research On Word Sense Disambiguation Method Based On Word Embedding
5	Research On Chinese Word Sense Disambiguation Method Based On Deep Learning
6	Research On Chinese Word Sense Disambiguation Based On Knowledge
7	Research On Word Sense Disambiguation Based On DBN
8	Word Sense Disambiguation Corpus Automatic Acquisition
9	Context Computing Applications, Word Disambiguation
10	Research Of Word Sense Disambiguation Based On Hybrid Features And Rules