Research on Context-Aware Entity Linking

Posted on: 2018-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: H L Dai
GTID: 2348330515459761
Subject: Computer Science and Technology
Abstract/Summary:
Extracting structured information from unstructured text allows it to be used efficiently by computers, and is crucial for improving the performance of retrieval systems, question answering, and machine reading. Entity linking plays a key role in this process: it aims to map the proper nouns (entity names) that occur in text to the entities they refer to in a knowledge base, by resolving the ambiguity caused by linguistic phenomena such as aliases and polysemy. The central problem of entity linking research is finding the best candidate among the multiple candidate entities that correspond to one entity name. This thesis conducts a series of studies on this problem.

First, we propose a novel distributed document vector representation learning model. In addition to the word information that traditional methods embed in document vectors, it also captures information about the entities mentioned in the same context and the co-occurrence of different entities, which is particularly beneficial for entity linking. The proposed model is hard to train directly, so we developed a training method that randomly samples training examples and is based on Hierarchical Softmax or Negative Sampling. This training method makes it possible to incorporate more information and also speeds up training. Next, we built a model of the semantic similarity between a candidate entity and the current input document, based on the learned document vectors. Finally, the estimated semantic similarities are combined with some other properties of the candidate entities themselves to find the best candidate, and a complete entity linking system is built on this approach.

Unlike traditional methods, our entity linking system based on distributed representations does not require hand-crafted features. The assumption that different entities mentioned in the same context should be related is exploited automatically. Moreover, training needs only a small amount of entity linking training data and takes a short time, which is a major advantage over recently proposed approaches based on deep neural networks. The results of a series of experiments on the frequently used TAC KBP entity linking dataset verify the strong performance of our system: its accuracy can be more than 2 percentage points higher than that of existing approaches. In the English Entity Discovery and Linking task of the 2016 TAC Knowledge Base Population competition organized by NIST, the entity linking system based on this work ranked 1st in overall performance. Of the 8 metrics, it ranked 1st in 6 and 2nd in 2. Thirteen teams from different organizations participated in this task, including CMU, IBM, and USTC. The system has also been applied to several national projects, such as the China Knowledge Centre for Engineering Sciences and Technology project, where it supports the knowledge base population and structured information extraction components.
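The candidate-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity model or how it is combined with the other candidate properties, so everything here is an assumption made for illustration: cosine similarity stands in for the learned semantic similarity, a single prior score stands in for the "other properties" of a candidate, and the names `cosine`, `rank_candidates`, and `weight`, as well as the toy vectors, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(doc_vec, candidates, weight=0.7):
    """Rank candidate entities for one entity name.

    candidates: list of (entity_id, entity_vec, prior) tuples, where `prior`
    is a hypothetical stand-in for candidate properties such as popularity.
    The linear combination below is an assumption, not the thesis's model.
    """
    scored = []
    for entity_id, entity_vec, prior in candidates:
        similarity = cosine(doc_vec, entity_vec)
        score = weight * similarity + (1.0 - weight) * prior
        scored.append((entity_id, score))
    # Highest combined score first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Toy data: a document vector and two candidates for the name "Jordan".
doc_vec = [0.9, 0.1, 0.0]
candidates = [
    ("Jordan_(country)", [0.0, 0.2, 0.9], 0.6),
    ("Michael_Jordan", [0.8, 0.3, 0.1], 0.4),
]
ranking = rank_candidates(doc_vec, candidates)
best_entity = ranking[0][0]
print(best_entity)
```

In this toy case the candidate whose vector is closer to the document vector wins even though its prior is lower, which is the behaviour a context-aware linker is meant to show.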
Keywords/Search Tags:Entity Linking, Document Vector, Information Extraction, Distributed Representation