Research On Methods Of Entity Linking In Microblog Based On Greedy Forest

Posted on:2015-12-19

Degree:Master

Type:Thesis

Country:China

Candidate:X Q Zou

Full Text:PDF

GTID:2348330422490878

Subject:Computer Science and Technology

Abstract/Summary:

Entity linking has received growing attention. Its purpose is to link the mentions in a text to the corresponding entities in a knowledge base. Most work on entity linking targets long texts, such as BBS posts or blogs. Microblogs are a newer kind of social platform, and entity linking in them faces many problems. Linking mentions to the knowledge base quickly and accurately is important work. For scientific research, it can improve the accuracy of machine translation, the relevance of Web search results, the click-through rate of search advertising, and the accuracy of domain knowledge base construction. To link named entities from microblogs to the Wikipedia knowledge base without ambiguity, this thesis is divided into the following three main parts.

The first part is named entity recognition in microblogs. Because English does not need a separate word segmentation step, this thesis uses Twitter as the microblog platform. Methods of named entity recognition in long text are often based on rules and on statistical machine learning algorithms such as the Conditional Random Field. When these methods are applied to named entity recognition in microblogs, they do not perform well. This thesis uses the Latent Dirichlet Allocation topic model to generate the prior distribution of an entity mention over the entity categories. Then, through Bayes' rule, we obtain the probability that an entity mention belongs to a named entity category. The predictions of the Conditional Random Field model are combined with those of labeled Latent Dirichlet Allocation, and the experimental results show that this ensemble model achieves good results for named entity recognition in microblogs.

The second part is the generation of entity candidates and their feature extraction. Entity candidates are often generated with a query expansion method based on Wikipedia. The disadvantage of this method is that generating too many entity candidates introduces more ambiguous candidates. Filtering the entity candidates with a Support Vector Machine (SVM) model yields higher coverage with a smaller number of entity candidates. For feature extraction, based on the characteristics of microblogs, this thesis describes the features between entity candidates and entity mentions from both global and local aspects. A general model is adopted to analyze the two categories of features together.

The third part is the ranking of entity candidates, which is the most important part of entity linking. This thesis analyzes and compares two kinds of models, based on the pairwise and listwise approaches. Traditional learning-to-rank models, which do not consider the ranking of non-target candidates, do not obtain better results. This thesis uses Regularized Greedy Forest to solve this problem. The experimental results show that this modified gradient boosting decision tree can effectively improve the performance of entity linking.
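The Bayes' rule step in the first part can be illustrated with a minimal sketch. The abstract does not say which quantity serves as the likelihood, so the sketch assumes a hypothetical per-category likelihood of the mention's context. The function name, the category labels, and all numbers are illustrative assumptions and are not taken from the thesis.

```python
def posterior_over_categories(prior, likelihood):
    """Apply Bayes' rule: P(c | m) = P(m | c) * P(c) / sum over c' of P(m | c') * P(c').

    prior:      dict mapping category -> P(c), e.g. derived from a topic model (assumed)
    likelihood: dict mapping category -> P(m | c), a hypothetical context likelihood
    """
    # Unnormalized posterior for each category; a missing likelihood counts as 0.
    unnormalized = {c: prior[c] * likelihood.get(c, 0.0) for c in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        # No evidence supports any category, so fall back to the prior.
        return dict(prior)
    return {c: value / total for c, value in unnormalized.items()}


# Illustrative numbers only (not results from the thesis).
prior = {"PERSON": 0.5, "LOCATION": 0.3, "ORGANIZATION": 0.2}
likelihood = {"PERSON": 0.1, "LOCATION": 0.6, "ORGANIZATION": 0.3}

posterior = posterior_over_categories(prior, likelihood)
best_category = max(posterior, key=posterior.get)
```

With these made-up inputs the category with the largest posterior is chosen for the mention. In an ensemble such as the one the abstract describes, this probability would be one input alongside the sequence model's prediction, but how the two are combined is not specified in the abstract.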

Keywords/Search Tags:

entity candidates, topic model, global feature, Regularized Greedy Forest

Related items

1	A Parallel Regularized Greedy Forest Implementation Based On Message Passing Interface
2	Research For Algorithm Of Chinese Entity Linking Technology Based On Topic Relation Graph
3	Theory And Key Techniques Of Entity Retrieval
4	BTM Topic Modeling Approach To Named Entity Linking
5	A Method Of Tracing The Topic Of Microblogs Based On Random Forest
6	Research And Implemention Of Name Entity Disambiguation
7	Forest Pests And Disease Entity Recognition Based On Initial Clustering
8	Conditional Random Fields Based English Name Entity Recognition
9	Saliency Detection Based On Manifold Regularized SVM Model
10	Research On Stock Market Hotspot Concept Mining Based On Topic Model And Entity Recognition