Design and Implementation of an Entity Linking System in a Specific Domain

Posted on: 2019-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: G P Zhang
Full Text: PDF
GTID: 2428330566998539
Subject: Computer Science and Technology
Abstract/Summary:
The main task of entity linking is to link an entity mention that appears in text to a standard entity in a knowledge base. Because an entity may appear in text in many forms, its standard form in the knowledge base cannot be found by entity extraction techniques alone, so research on entity linking has become an urgent matter. Existing entity linking methods can be divided into unsupervised methods, which rely on dictionaries or similarity measures, and supervised methods. When the data contains only the entity itself, with no additional context, the problem reduces to linking a single entity, and the traditional approach is to link by matching or text similarity. When the data appears in complex non-standard forms, additional information must be combined and machine learning methods used to perform the linking.

Because the medical-domain data is small in scale and its entities have a single kind of non-standard form, the entity linking task is placed in a retrieval framework: index entries built over multiple fields are used to normalize the entities and thereby complete the linking. Film and television data, by contrast, comes from the web, has complex non-standard forms, and has a huge knowledge base. For such complex entities, the linking task is divided into two steps. First, a multi-level sieve is used to select the candidate entity set for each entity to be linked, filtering out unrelated entities in the knowledge base. Then an algorithm based on a convolutional neural network ranks the candidates, and the most relevant result is selected as the final link.

To validate the effectiveness of the proposed methods, we selected a urology clinical surgery dataset from a hospital, a clinical dataset from a "top three" (Grade III, Class A) hospital in China, and the dataset from CCKS 2016 (Entity Discovery and Links in Specific Domains). Experiments with the multi-field index retrieval method proposed in this thesis were performed on the hospital's surgical dataset and on the outpatient dataset of the "top three" hospital, where entity linking achieved accuracies of 66.2% and 91.0%, respectively. Normalization accuracy on the clinical datasets reaches 67.6%, clearly higher than the traditional method based on edit distance. On the CCKS dataset, the multi-level sieve combined with the convolutional neural network model also outperforms the traditional machine learning method, achieving an accuracy of 73.6%.
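
The two-step pipeline described in the abstract (multi-level sieve for candidate selection, then ranking) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy knowledge base, the three sieve levels (exact match, alias match, token overlap), and all function names are hypothetical, and the ranking step uses a simple character-similarity score from Python's `difflib` as a stand-in for the convolutional neural network ranker, which the abstract does not specify in detail.

```python
from difflib import SequenceMatcher

# Toy knowledge base (hypothetical): standard entity name -> known aliases.
KNOWLEDGE_BASE = {
    "The Shawshank Redemption": ["Shawshank", "Shawshank Redemption 1994"],
    "The Godfather": ["Godfather"],
    "The Godfather Part II": ["Godfather 2", "Godfather II"],
}


def normalize(text):
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(text.lower().split())


def sieve_exact(mention, kb):
    """Level 1 (assumed): the mention equals a standard name."""
    m = normalize(mention)
    return [name for name in kb if normalize(name) == m]


def sieve_alias(mention, kb):
    """Level 2 (assumed): the mention equals a known alias."""
    m = normalize(mention)
    return [name for name, aliases in kb.items()
            if m in {normalize(a) for a in aliases}]


def sieve_token_overlap(mention, kb):
    """Level 3 (assumed): the mention shares a token with a name or alias.

    A real system would likely drop stop words such as "the" first.
    """
    tokens = set(normalize(mention).split())
    return [name for name, aliases in kb.items()
            if any(tokens & set(normalize(s).split()) for s in [name, *aliases])]


# Sieves run from strictest to loosest; the first non-empty result wins.
SIEVES = [sieve_exact, sieve_alias, sieve_token_overlap]


def generate_candidates(mention, kb=KNOWLEDGE_BASE):
    """Step 1: filter the knowledge base down to a small candidate set."""
    for sieve in SIEVES:
        candidates = sieve(mention, kb)
        if candidates:
            return candidates
    return []


def score(mention, candidate):
    """Stand-in ranker: character similarity, NOT the CNN from the thesis."""
    return SequenceMatcher(None, normalize(mention), normalize(candidate)).ratio()


def link(mention, kb=KNOWLEDGE_BASE):
    """Step 2: rank the candidates and return the best one, or None."""
    candidates = generate_candidates(mention, kb)
    if not candidates:
        return None
    return max(candidates, key=lambda c: score(mention, c))
```

For example, `link("Godfather 2")` is resolved at the alias level, while `generate_candidates("godfather movie")` falls through to the token-overlap level and returns both Godfather entries for the ranker to order. Ordering the sieves from strict to loose keeps the candidate set small when a high-precision match exists, which is the point of filtering before ranking.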
Keywords/Search Tags:limited domain, entity linking, retrieval, sieve mode, deep learning, learning to rank