BTM Topic Modeling Approach To Named Entity Linking

Posted on: 2018-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: C Li
Full Text: PDF
GTID: 2348330512977223
Subject: Computer Science and Technology
Abstract/Summary:
As network resources continue to expand, the growing volume of information makes it increasingly difficult for people to find what is valuable. Short texts such as tweets and micro-blog posts are becoming ever more popular, which makes it even harder to find content of interest. Resolving the ambiguity of a growing number of named entity entries has become an important and difficult research problem, and named entity linking is an important method for solving it. Named entity linking is the process of linking a given named entity in a document to an unambiguous entity in a knowledge base; it includes merging synonymous entities and disambiguating ambiguous ones. The technology can improve the information filtering capabilities of online recommendation systems, Internet search engines and other practical applications. Because short texts are brief and use nonstandard language, this thesis presents a named entity linking method based on the Biterm Topic Model (BTM).

First, we construct the named entity knowledge base, including a synonym list and an ambiguous-word list, from an offline version of Wikipedia. We use rule-based and statistical methods to identify named entities in short text. Because named entities appear in diverse forms in short text, we normalize each one against the synonym list in the knowledge base, obtain its candidate entity set from the ambiguous-word list, and prune the candidate set according to the contextual properties of the named entity, which reduces its size and improves the efficiency of candidate entity ranking. We also improve the MPM co-word measure, which considers only co-occurrence frequency and ignores the frequency of individual words, so that the word co-occurrence coefficient takes into account both how often words co-occur and how often each word occurs on its own.

Second, under the assumption that a named entity and the words in the same document have similar topic distributions, we model documents and disambiguate entities at the semantic level, and propose a BTM-based named entity linking method. The method uses a BTM that incorporates the word co-occurrence coefficient to model named entity semantics and estimates the model parameters with Gibbs sampling, which makes the model simpler and more accurate and provides a theoretical basis for subsequent data processing. Finally, according to the cosine similarity between the topic-space vector of the named entity and that of each candidate entity, the named entity in the given text is linked to an unambiguous named entity in the knowledge base.
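
The final linking step described in the abstract, ranking candidate entities by cosine similarity in topic space, can be sketched as follows. This is a minimal illustration rather than the thesis implementation: the function names, the three-topic vectors and the candidate entities are all hypothetical, and the topic vectors are assumed to have already been inferred by a trained topic model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector carries no topic information; treat it as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


def link_entity(mention_topics, candidates):
    """Return (name, score) for the candidate closest to the mention.

    `candidates` maps a knowledge-base entity name to its topic vector.
    Returns None when the candidate set is empty.
    """
    best = None
    for name, topics in candidates.items():
        score = cosine_similarity(mention_topics, topics)
        if best is None or score > best[1]:
            best = (name, score)
    return best


# Hypothetical topic distributions (assumed output of a trained topic model).
mention = [0.70, 0.20, 0.10]
candidates = {
    "Apple Inc.": [0.65, 0.25, 0.10],
    "Apple (fruit)": [0.05, 0.15, 0.80],
}

best_name, best_score = link_entity(mention, candidates)
print(best_name, round(best_score, 3))
```

In this toy example the mention is linked to the candidate whose topic distribution points in nearly the same direction as its own. Pruning the candidate set beforehand, as the abstract describes, would simply shrink the `candidates` mapping before this ranking step runs.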
Keywords/Search Tags: Named Entity Linking, Topic Models, Biterm Topic Model, Wikipedia, Word Co-occurrence Measure