Research On Methods Of Entity Linking In Microblog Based On Greedy Forest

Posted on: 2015-12-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Zou
GTID: 2348330422490878
Subject: Computer Science and Technology

Abstract/Summary:
Entity linking has received increasing attention. Its purpose is to link the mentions in a text to the corresponding entities in a knowledge base. Most work on entity linking targets long texts, such as BBS posts or blogs. Microblogs are a newer kind of social platform, and entity linking on them faces many problems. Linking mentions to the knowledge base quickly and accurately is important work. For scientific research, it can improve the accuracy of machine translation, the relevance of Web search results, the click-through rate of search advertising, and the accuracy of domain knowledge base construction. To link named entities from microblogs to the Wikipedia knowledge base without ambiguity, this paper is divided into the following three main parts.

The first part is named entity recognition in microblogs. Because English does not require a separate word segmentation step, this paper uses Twitter as the microblog platform. Methods for named entity recognition in long text are often based on rules and on statistical machine learning algorithms such as the Conditional Random Field. When these methods are applied to named entity recognition in microblogs, they do not perform well. This paper uses a Latent Dirichlet Allocation topic model to generate the prior distribution of entity mentions over entity categories. Bayes' rule then gives the probability that an entity mention belongs to a named entity category. The experimental results show that an ensemble model combining the predictions of the Conditional Random Field model with labeled Latent Dirichlet Allocation performs well on named entity recognition in microblogs.

The second part is entity candidate generation and feature extraction. Entity candidates are often generated with a query expansion method based on Wikipedia. The disadvantage of this method is that generating too many entity candidates introduces more ambiguous candidates. Filtering the entity candidates with a Support Vector Machine (SVM) model gives higher coverage with a smaller number of entity candidates. For feature extraction, this paper describes the features between entity candidates and entity mentions from global and local aspects, according to the characteristics of microblogs. A general model is adopted for a comprehensive analysis of the two categories of features.

The third part is entity candidate ranking, which is the most important part of entity linking. This paper analyzes and compares models based on the pairwise and listwise approaches. When the ranking of non-targets is not considered, traditional learning-to-rank models do not obtain better results. This paper uses Regularized Greedy Forest to solve this problem. The experimental results show that this modified gradient boosting decision tree can effectively improve the performance of entity linking.
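The Bayes' rule step in the first part can be illustrated with a minimal sketch. It assumes a prior P(c) over entity categories and a per-category likelihood P(m | c) for one mention m; the function name, the category labels, and all numbers are hypothetical and are not taken from the thesis, which derives its distributions from a Latent Dirichlet Allocation topic model.

```python
def posterior_over_categories(prior, likelihood):
    """Bayes' rule: P(c | m) = P(m | c) * P(c) / sum_k P(m | k) * P(k)."""
    # Unnormalized posterior for every category that has a prior.
    unnormalized = {c: likelihood.get(c, 0.0) * p for c, p in prior.items()}
    evidence = sum(unnormalized.values())
    if evidence == 0.0:
        raise ValueError("mention has zero probability under every category")
    return {c: v / evidence for c, v in unnormalized.items()}


# Hypothetical values for a single mention (illustration only).
prior = {"PERSON": 0.5, "LOCATION": 0.3, "ORGANIZATION": 0.2}
likelihood = {"PERSON": 0.1, "LOCATION": 0.6, "ORGANIZATION": 0.3}

posterior = posterior_over_categories(prior, likelihood)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

With these made-up inputs the most probable category is LOCATION, even though PERSON has the larger prior, because the likelihood outweighs it.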
Keywords/Search Tags: entity candidates, topic model, global feature, Regularized Greedy Forest