A Study Of Semantic-Disambiguation Approach On Name Entities

Posted on: 2015-09-09
Degree: Master
Type: Thesis
Country: China
Candidate: J J Xu
Full Text: PDF
GTID: 2298330452464017
Subject: Computer Science and Technology
Abstract/Summary:
Word sense disambiguation (WSD) is the task of automatically determining the meaning of a word from its textual distance to other words and from its surrounding context. Word sense ambiguity refers to polysemy: a word expresses different senses in different contexts, and such a word is called an ambiguous term. The aim of word sense disambiguation is therefore to determine the exact meaning of each ambiguous term in a text. Word sense disambiguation has long been a fundamental and pivotal research subject in computational linguistics. As an intermediate task, it directly affects the effectiveness and efficiency of text classification, information retrieval, machine translation, speech recognition, and other natural language processing systems. Since the 1990s, with the growing popularity of corpus-based approaches, corpus-based word sense disambiguation has been predominant, and most work performs disambiguation through supervised or unsupervised machine learning on corpora. Because hand-labeling a training corpus for supervised learning is laborious and time-consuming, unsupervised machine learning is the most popular way to approach the disambiguation task.

Named entity ambiguity is the problem that one named entity may correspond to multiple real-world entities, which we call entity concepts. Named entity disambiguation and word sense disambiguation have a lot in common, but each has its own difficulties. For named entity disambiguation, the target entities are hard to obtain, and both name variation and name ambiguity must be handled. The task can be divided into two settings, monolingual and multilingual. Although monolingual disambiguation has been studied for a long time, research on multilingual disambiguation, especially based on Wikipedia, is just starting; it is much more complex than the monolingual case, and results so far are unsatisfactory. It is therefore considered a promising research direction.

In this thesis, we obtain backups of the Chinese and English Wikipedia data through the official APIs and set up a local Wikipedia database in MySQL. On the algorithm side, we use three features to evaluate the text to be disambiguated: text similarity, entity relational degree, and category relational degree. We also add English Wikipedia pages to compensate for the lack of Chinese pages.

After introducing the methods, we perform experiments on the Chinese personal name disambiguation test corpus provided by the Second CLP-SIGHAN Joint Conference on Chinese Language Processing (CLP-2012) and on a news corpus downloaded from the Internet. The experimental results show that the proposed methods are feasible and effective for named entity disambiguation.
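
One way the three features named in the abstract could be combined into a single ranking score is sketched below in Python. This is not the thesis's implementation: the abstract does not give the formulas, so the bag-of-words cosine similarity used for text similarity, the Jaccard overlap used for the entity and category relational degrees, the weights, the dictionary layout, and all function names are assumptions made purely for illustration. The toy data is in English for readability, although the thesis evaluates on Chinese personal names.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between whitespace-tokenized bag-of-words vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def jaccard(set_a, set_b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def score_candidate(mention, candidate, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three features (weights are arbitrary assumptions)."""
    w_text, w_entity, w_category = weights
    return (
        # Text similarity: mention context vs. candidate page text.
        w_text * cosine_similarity(mention["text"], candidate["text"])
        # Entity relational degree (assumed): overlap between entities in the
        # mention context and entities linked from the candidate page.
        + w_entity * jaccard(mention["entities"], candidate["links"])
        # Category relational degree (assumed): overlap between categories of
        # the context entities and categories of the candidate page.
        + w_category * jaccard(mention["context_categories"],
                               candidate["categories"])
    )


def disambiguate(mention, candidates):
    """Return the candidate entity concept with the highest combined score."""
    return max(candidates, key=lambda c: score_candidate(mention, c))


# Hypothetical toy data; a real system would read these from the local database.
mention = {
    "text": "Michael Jordan scored 30 points for the Chicago Bulls",
    "entities": {"Chicago Bulls", "NBA"},
    "context_categories": {"Basketball"},
}
candidates = [
    {
        "title": "Michael Jordan (basketball)",
        "text": "American basketball player who played for the Chicago Bulls",
        "links": {"Chicago Bulls", "NBA", "Washington Wizards"},
        "categories": {"Basketball", "American athletes"},
    },
    {
        "title": "Michael I. Jordan",
        "text": "American scientist working on machine learning and statistics",
        "links": {"University of California, Berkeley", "Machine learning"},
        "categories": {"Computer scientists"},
    },
]

if __name__ == "__main__":
    print(disambiguate(mention, candidates)["title"])
```

A linear combination keeps each feature's contribution easy to inspect. In practice the weights would need to be tuned on held-out data, and Chinese text would need a word segmenter in place of whitespace splitting.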
Keywords/Search Tags: Word Sense Disambiguation, Name Entity, Entity Disambiguation, Wikipedia