
Research And Implementation Of English Entity Linking System

Posted on: 2017-06-25 | Degree: Master | Type: Thesis
Country: China | Candidate: M L Li | Full Text: PDF
GTID: 2348330518995372 | Subject: Computer technology
Abstract/Summary:
In recent years, the volume of text data on the Internet in many fields (such as news, social media, and companies) has grown rapidly, and this text often contains valuable information. How to mine that information has therefore become particularly important. In natural language processing, a text analysis task called entity linking has attracted increasing attention from researchers. Entity linking is the process of linking name mentions in text to their referent entities in a knowledge base.

Entity linking faces two main problems. First, the large amount of unstructured text on the Internet exhibits both ambiguity and polysemy: ambiguity means that one entity can be represented by more than one name mention, and polysemy means that one name mention can represent multiple entities. Second, in the candidate entity ranking stage, previous work did not consider the topic information of the mention and its surrounding background text.

Based on an analysis of these problems and of the shortcomings of previous work, this paper proposes an approach based on topic-sensitive random walk with restart. First, the content information of mentions is used to expand the mentions and to search the Wikipedia knowledge base for their candidate entities. Second, a graph is constructed from the intermediate results of the previous step. Finally, the topic-sensitive random walk with restart model ranks the candidate entities, and the top-ranked candidate is chosen as the linked entity. Experimental results show that the approach achieves an F score of 0.623 on the KBP2014 data set, higher than that of every other system mentioned in this paper, and an F score of 0.661 on the KBP2015 data set, which ranks first among all participating teams. The proposed approach can therefore improve the performance of an entity linking system.

The main contributions of this paper are as follows:
1. This paper proposes a candidate generation method based on an entity diversity word table. The table helps the system cope with the entity ambiguity phenomenon, and integrating it into the candidate generation module improves the overall performance of the entity linking system.
2. This paper proposes a text-based technique for graph construction and a method for calculating semantic relevance between mentions and entities and between entities. Thanks to the entity-to-entity relevance calculation, the graph in this system can accommodate polysemy and carries more semantic information.
3. This paper proposes an approach based on topic-sensitive random walk with restart to improve the overall performance of the entity linking system.
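
To make the ranking step more concrete, the following is a minimal sketch of a plain random walk with restart over a small mention-entity graph. It is not the thesis's topic-sensitive model, whose details the abstract does not give. The graph, the edge weights, the restart vector, the value of `alpha`, and all names (`random_walk_with_restart`, the node labels) are assumptions chosen for illustration. Putting the restart mass on nodes taken from the document is only one hypothetical way to bias the walk toward the document's context.

```python
def random_walk_with_restart(adj, restart, alpha=0.15, tol=1e-10, max_iter=1000):
    """Generic random walk with restart (illustrative, not the thesis's model).

    adj:     dict mapping node -> {neighbor: non-negative edge weight}
    restart: dict mapping node -> restart probability (should sum to 1)
    alpha:   probability of jumping back to the restart distribution
    """
    # Collect every node that appears as a source or as a neighbor.
    nodes = sorted(set(adj) | {v for nbrs in adj.values() for v in nbrs})

    # Row-normalise edge weights into transition probabilities.
    trans = {}
    for u in nodes:
        nbrs = adj.get(u, {})
        total = sum(nbrs.values())
        trans[u] = {v: w / total for v, w in nbrs.items()} if total > 0 else {}

    scores = {u: restart.get(u, 0.0) for u in nodes}
    for _ in range(max_iter):
        new = {u: alpha * restart.get(u, 0.0) for u in nodes}
        for u in nodes:
            if trans[u]:
                for v, p in trans[u].items():
                    new[v] += (1 - alpha) * scores[u] * p
            else:
                # Node without outgoing edges: send its mass to the restart nodes.
                for v in nodes:
                    new[v] += (1 - alpha) * scores[u] * restart.get(v, 0.0)
        delta = sum(abs(new[u] - scores[u]) for u in nodes)
        scores = new
        if delta < tol:
            break
    return scores


# Hypothetical toy graph: the mention "Jordan" has two candidate entities,
# and a context entity from the same document is related to only one of them.
graph = {
    "mention:Jordan": {"Michael Jordan": 1.0, "Jordan (country)": 1.0},
    "context:Chicago Bulls": {"Michael Jordan": 1.0},
    "Michael Jordan": {"mention:Jordan": 1.0, "context:Chicago Bulls": 1.0},
    "Jordan (country)": {"mention:Jordan": 1.0},
}
# Assumed restart vector: all restart mass sits on nodes taken from the document.
restart = {"mention:Jordan": 0.5, "context:Chicago Bulls": 0.5}

scores = random_walk_with_restart(graph, restart)
candidates = ["Michael Jordan", "Jordan (country)"]
best = max(candidates, key=lambda c: scores[c])
print(best)
```

In this toy graph the candidate connected to the context entity receives more probability mass than the other one, which is the intuition behind choosing the top-ranked candidate as the linked entity.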
Keywords/Search Tags: entity linking, random walk, Wikipedia