
Research On Chinese Microblogs Oriented Entity Linking Method

Posted on: 2014-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: S S Guan
Full Text: PDF
GTID: 2298330422990866
Subject: Computer Science and Technology
Abstract/Summary:
As web resources expand and the volume of information grows, it is becoming increasingly difficult to find valuable information. At the same time, although microblogging is popular, its short messages convey little knowledge on their own. To address this, our team developed a knowledge expansion and recommendation system, which aims to retrieve more valuable knowledge about the entries a user may be interested in. The ambiguity of the entries to be expanded is the bottleneck of this system. Entity linking addresses the problem: it lets the program automatically determine which real-world referent entity a name mention in a microblog should be linked to. The work focuses on the following aspects.

To obtain enough microblogs, a microblog web crawler was developed first. Compared with API-based collection, the crawler is considerably more efficient and gives access to a large microblog corpus. The corpus was then pre-processed.

Candidate entity acquisition is the key step of entity linking. For each name mention in a microblog, several methods were proposed for obtaining candidate entities, and different weights were assigned to different entities to improve precision. Traditional methods draw candidate information primarily from Wikipedia or Baidu Encyclopedia. When neither contains the entry, a meta-search is invoked to integrate information from the web and make the candidate information more complete. To address the feature sparseness of microblog text, the short text is first expanded with the user's profile information, user tags, and recent microblogs. Keywords extracted from the microblogs are then expanded using search results from Google, Baidu, and Bing.

Entity linking algorithms were implemented based on multi-channel candidate entities and a domain-based thesaurus. Compared with other methods, the algorithms achieve good precision on the NLP&CC 2013 conference dataset. An application system for knowledge discovery and recommendation was developed on the Sina Weibo Open Platform, and the algorithms achieve the expected results in this system.
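The multi-channel candidate acquisition described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the channel names, the weights, the additive scoring rule, and the in-memory dictionaries that stand in for Wikipedia, Baidu Encyclopedia, and the meta-search are all assumptions chosen for illustration. It only shows the general idea of querying the encyclopedias first, weighting each channel differently, and falling back to a meta-search when neither encyclopedia has the entry.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory stand-ins for the real knowledge sources.
WIKIPEDIA: Dict[str, List[str]] = {"apple": ["Apple Inc.", "Apple (fruit)"]}
BAIDU_ENCYCLOPEDIA: Dict[str, List[str]] = {"apple": ["Apple Inc."]}
META_SEARCH: Dict[str, List[str]] = {
    "apple": ["Apple Inc."],
    "xiaomi": ["Xiaomi Corporation"],
}

# A channel is (name, weight, lookup function); the weights are assumed values.
Channel = Tuple[str, float, Callable[[str], List[str]]]


def lookup(source: Dict[str, List[str]]) -> Callable[[str], List[str]]:
    """Build a case-insensitive lookup function over a toy source."""
    return lambda mention: source.get(mention.lower(), [])


ENCYCLOPEDIA_CHANNELS: List[Channel] = [
    ("wikipedia", 1.0, lookup(WIKIPEDIA)),
    ("baidu_encyclopedia", 0.5, lookup(BAIDU_ENCYCLOPEDIA)),
]
FALLBACK_CHANNEL: Channel = ("meta_search", 0.25, lookup(META_SEARCH))


def candidate_entities(mention: str) -> List[Tuple[str, float]]:
    """Return candidate entities for a mention, highest score first."""
    scores: Dict[str, float] = defaultdict(float)

    # Query every encyclopedia channel and add its weight to each hit.
    for _name, weight, fetch in ENCYCLOPEDIA_CHANNELS:
        for entity in fetch(mention):
            scores[entity] += weight

    # Fall back to the meta-search only when no encyclopedia has the entry.
    if not scores:
        _name, weight, fetch = FALLBACK_CHANNEL
        for entity in fetch(mention):
            scores[entity] += weight

    # Sort by descending score, then by name so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for mention in ("Apple", "Xiaomi", "Unknown"):
        print(mention, candidate_entities(mention))
```

In this sketch an entity found by several channels accumulates their weights, so it ranks above an entity found by only one. A real system would replace the dictionaries with calls to the actual sources and would likely tune the weights on labelled data.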
Keywords/Search Tags: microblogging, entity linking, network information integration, semantic expansion