Research On Key Technologies Of Named Entity Recognition And Linking Based On Representation Learning

Posted on: 2022-01-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B J Jia
Full Text: PDF
GTID: 1488306326979439
Subject: Computer Science and Technology
Abstract/Summary:
In the era of big data, a large amount of unstructured text data is produced every day. Named entity recognition and linking identify entities in text and map them to things that exist objectively in the real world, helping computers understand semantics correctly. Natural language contains units of multiple granularities, from fine-grained words, phrases and sentences to coarse-grained documents. Representation learning can extract semantic features at these different granularities to improve entity recognition and linking.

Because text data comes from varied sources, is large in scale and contains non-standard expressions, entity recognition and linking face the following challenges: (1) how to improve word representation in the entity recognition process; (2) how to resolve context conflicts among mentions in the same document during entity linking; (3) how to integrate multi-granularity information to represent mentions and candidate entities; (4) how to make full use of the relationships between candidate entities to improve linking. In view of these challenges, this dissertation proposes solutions from four aspects: entity recognition based on enhanced word representation, entity linking based on interactive sentence representation, entity linking based on hierarchical semantic representation of documents, and entity linking based on graph representation. The main contributions are as follows:

(1) Entity recognition based on enhanced word representation. Existing Chinese entity recognition methods focus only on modern texts and are strongly affected by word segmentation and dictionaries: wrong segmentations, rare words and unknown words reduce recognition performance. This dissertation proposes a Chinese entity recognition algorithm based on enhanced word representation (ECEM). Since Chinese characters convey both form and meaning, the algorithm combines the morphological features contained in the structure of Chinese characters with contextual semantic information to improve the representation of Chinese character vectors. The contextual semantics compensate for the useful sequence information missing from the stroke vectors. Experiments on ancient Chinese and modern resume data show that enhanced word representation improves the effectiveness of entity recognition.

(2) Entity linking based on interactive sentence representation. To solve the problem of context conflicts caused by an incomplete knowledge base and multiple occurrences of the same mention in a document, a new entity linking algorithm based on interactive sentence representation (ELSR) is proposed. A Siamese network is used to reduce the differences between the inputs and between the representations of sentence pairs. Soft attention is used to align sentences and select the key semantic features that are useful for entity linking. The differences and similarities between sentence pairs are integrated into the sentence interaction representation model, so the final sentence representation carries deeper semantic features. The experimental results show that this algorithm achieves better linking results than the benchmark algorithm while using fewer features.

(3) Entity linking based on hierarchical semantic representation of documents. To solve the problem that existing entity linking methods cannot extract key semantic features from multi-granularity information to represent mentions and candidate entities, an entity linking algorithm based on hierarchical document semantic representation (HSSMGF) is proposed. A multi-level attention network screens, fuses and jointly reasons over multi-source, multi-level information, which reduces the semantic gap between different features. A supervised learning method is used to shrink the candidate entity set and filter out noisy candidates, improving the execution efficiency of the model while preserving candidate entity recall. A global semantic feature is computed from the unambiguous candidate entities; combined with local features, it is used to predict the final score of each candidate entity. The experimental results show that this algorithm effectively improves linking by capturing semantic features from multiple perspectives and at different levels.

(4) Entity linking based on graph representation. When a graph-based entity linking algorithm encounters a large number of disconnected, isolated nodes, the ranking of candidate entities is not unique. To solve this problem, an entity linking algorithm based on LeaderRank (LEPC) is proposed. The algorithm first adds global nodes to mediate the probability distribution among nodes, and then uses global semantic coherence to adjust the ranking result. To address the insufficient use of neighbor node information and the excessive noise in the graph, an entity linking algorithm based on graph convolution and contextual semantic association (GBEL) is proposed. The semantic vector of a candidate entity is obtained by iteratively aggregating the information of its neighbor nodes, and a context-association model is designed to measure the deep semantic relatedness between mentions and candidate entities. The experimental results show that the proposed algorithms fully exploit the topology of the graph and improve entity linking compared with the baselines.

On the basis of representation learning, this dissertation studies the key technical problems in entity recognition and linking. Experimental verification shows that the proposed methods and models achieve good results and improve the accuracy of entity acquisition.
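To make the graph-ranking idea in contribution (4) more concrete, the following is a minimal sketch of the standard LeaderRank procedure, not the LEPC algorithm itself. It assumes a single extra "ground" node linked to and from every candidate node, which keeps isolated nodes reachable. The function name `leader_rank`, its parameters and the toy graph are hypothetical, and the global semantic coherence correction described above is not modeled.

```python
def leader_rank(nodes, edges, tol=1e-10, max_iter=1000):
    """Score the nodes of a directed graph with standard LeaderRank.

    nodes: list of hashable node ids (assumed non-empty).
    edges: iterable of (src, dst) pairs between those nodes.
    Assumption: one ground node is added and linked bidirectionally
    to every node, as in the original LeaderRank formulation.
    """
    ground = object()  # unique sentinel acting as the ground node
    out_links = {n: [ground] for n in nodes}
    out_links[ground] = list(nodes)
    for src, dst in edges:
        out_links[src].append(dst)

    # Every real node starts with one unit of score, the ground node with none.
    score = {n: 1.0 for n in nodes}
    score[ground] = 0.0

    for _ in range(max_iter):
        nxt = dict.fromkeys(out_links, 0.0)
        # Each node splits its current score evenly among its out-links.
        for src, targets in out_links.items():
            share = score[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        delta = max(abs(nxt[n] - score[n]) for n in nxt)
        score = nxt
        if delta < tol:
            break

    # Redistribute the ground node's final score evenly to the real nodes.
    bonus = score.pop(ground) / len(nodes)
    return {n: s + bonus for n, s in score.items()}


# Toy candidate graph: "d" is isolated, "b" is pointed to by "a" and "c".
scores = leader_rank(["a", "b", "c", "d"], [("a", "b"), ("c", "b")])
for node, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(value, 4))
```

In this toy graph the isolated node still receives a well-defined score through the ground node, and the node with two incoming links ranks highest. Nodes that receive score only from the ground node remain tied, which is the kind of case where an additional signal such as semantic coherence would be needed to separate candidates.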
Keywords/Search Tags: Named Entity Recognition, Entity Linking, Enhanced Word Representation, Interactive Sentence Representation, Hierarchical Semantic Representation of Documents