Cross-lingual Link Discovery Based On Wiki

Posted on: 2015-06-09  Degree: Master  Type: Thesis
Country: China  Candidate: J X Zheng  Full Text: PDF
GTID: 2298330467967032  Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Wikipedia is an open, online, multilingual knowledge base that is collaboratively edited by users around the world. It contains rich semantic information, especially in its anchor links, through which users can acquire further knowledge. Wiki entries on the same topic in different languages are in fact complementary, so to fully understand the content of an entry, users often need to acquire knowledge across languages. However, such link relationships generally exist only within monolingual Wikipedia articles, while cross-lingual links exist only between a small number of article titles.

To address this, Cross-Lingual Link Discovery (CLLD) was proposed. It is mainly concerned with automatically identifying anchor texts related to the topic of a Wikipedia article in the source language and recommending a set of related target-language links for each anchor text. It plays an important role in breaking down language barriers and sharing knowledge.

CLLD involves three key problems: anchor text identification, anchor text translation, and target link discovery. This thesis mainly studies anchor text translation and target link discovery.

In anchor text translation, an anchor text may have multiple target translations. If the wrong translation is selected, the accuracy of link recommendation in target link discovery is directly affected. This thesis therefore proposes a context-based method for selecting anchor text translations. In contrast to previous methods, the proposed method takes the context of the anchor text into account and uses a voting method based on point-wise mutual information (PMI) to determine the translation. It was tested on translation selection for person names, terminology, and abbreviations in Chinese and English Wikipedia articles, and the experiments show that the method performs well.

In previous work on target link discovery, users could not directly obtain the definition of an anchor text. This thesis uses Anchor-to-BEPs and definition identification techniques to study definitional link discovery. The experiments show that the method based on syntactic templates is more effective than the method based on rule templates at identifying definitional links.

Lastly, this thesis designs and implements a CLLD system based on Chinese and English Wikipedia articles. The system was evaluated on the NTCIR Chinese-to-English CLLD task. The evaluation shows that the system achieves the highest LMAP and R-prec values under Anchor-to-File with Manual Assessment.
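
As an illustration only, PMI-based voting for translation selection might be sketched as below. The abstract does not give the exact scoring or voting scheme, so everything here is an assumption: the function names `pmi` and `select_translation`, the count dictionaries, and the rule that each context word casts one vote for the candidate translation with which it has the highest positive PMI.

```python
import math
from collections import Counter


def pmi(pair_count, x_count, y_count, total):
    """Point-wise mutual information, log(p(x,y) / (p(x) p(y))), from raw counts."""
    if pair_count == 0 or x_count == 0 or y_count == 0:
        return float("-inf")  # no evidence of association
    return math.log((pair_count * total) / (x_count * y_count))


def select_translation(candidates, context_words, unigram, cooc, total):
    """Pick a translation by letting each context word vote (hypothetical scheme).

    candidates    -- candidate target-language translations of the anchor text
    context_words -- words surrounding the anchor text (assumed already translated)
    unigram       -- dict: term -> occurrence count
    cooc          -- dict: (candidate, word) -> co-occurrence count
    total         -- total number of observation windows in the corpus
    """
    votes = Counter()
    for word in context_words:
        scores = {
            c: pmi(cooc.get((c, word), 0), unigram.get(c, 0),
                   unigram.get(word, 0), total)
            for c in candidates
        }
        best = max(scores, key=scores.get)
        if scores[best] > 0:  # assumption: vote only on positive association
            votes[best] += 1
    if not votes:
        return None  # context gave no usable evidence
    return votes.most_common(1)[0][0]


# Toy counts, invented purely for demonstration.
unigram = {"Apple Inc.": 100, "apple (fruit)": 100,
           "iphone": 50, "company": 200, "juice": 40}
cooc = {("Apple Inc.", "iphone"): 30, ("Apple Inc.", "company"): 40,
        ("apple (fruit)", "company"): 1, ("apple (fruit)", "juice"): 20}
chosen = select_translation(["Apple Inc.", "apple (fruit)"],
                            ["iphone", "company", "juice"],
                            unigram, cooc, total=10000)
```

With these toy counts, two of the three context words favor the first candidate, so it wins the vote. A real system would also need a tie-breaking rule and a source for the co-occurrence statistics, neither of which the abstract specifies.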
Keywords/Search Tags: Wikipedia, CLLD, anchor text translation, Anchor-to-BEPs, definitional link