Rapidly growing network data contains a large amount of information and knowledge, and natural language is one of its most important forms. However, language is ambiguous: a single word may have several meanings (polysemy), and several words may share one meaning (synonymy). The goal of entity linking is to correctly link the ambiguous mentions in a document to the corresponding entities in a knowledge base, helping both humans and computers understand the exact meaning of the text. This paper studies the complete entity linking process, focusing on three research tasks:

(1) Entity extraction based on boundary enhancement. The first subtask of entity linking is candidate entity generation. This paper targets the effective extraction of nested entities and proposes a nested entity recognition method based on Star-Transformer. In the input layer, the method builds on word-granularity fusion, uses a graph model of dependency relations to enhance the word vector representation, and at the same time enhances boundaries by strengthening the connections within each word. In the encoding layer, LSTM, Transformer, and Star-Transformer are used as encoders and analyzed in detail through comparative experiments. In the decoding layer, boundary information is explicitly enhanced: boundary classification is performed first to identify entity boundaries, the predicted boundaries are then used to generate a set of candidate entities, and finally category classification is applied to each candidate to determine its type. The experimental results demonstrate that this method effectively improves the performance of nested entity recognition.

(2) Subgraph-based collaborative entity linking. Following the idea of collaborative linking, an attention mechanism is used to select the relevant mentions in the document and form a subgraph, and document-level entity disambiguation is then carried out on that subgraph. When the subgraph is constructed, the current mention and its related entities are used as nodes. To reason jointly over semantics and the knowledge base, the context representation is introduced as a virtual node. In the encoding layer, the subgraph is updated iteratively through multiple rounds of message passing with a graph attention network (GAT). The experimental results on public corpora demonstrate that the proposed model effectively improves the performance of entity disambiguation.

(3) Entity linking based on augmented graphs. The subgraph collaboration strategy above only assembles related entities into a subgraph and ignores the specific relations between them. The goal of the augmented graph is to enrich the relation information of the subgraph by incorporating existing knowledge from the knowledge graph, that is, by constructing relation-type features for its edges. Experiments show that the incorporated edge features further improve the performance of entity disambiguation.
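The boundary-first decoding in task (1) can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `candidate_spans` and `classify_spans`, the probability threshold, the maximum span length, and the toy lookup classifier are all assumptions made for illustration. The sketch takes per-token start and end boundary probabilities, pairs them into candidate spans (which may nest), and then assigns a category to each candidate.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # inclusive token indices (start, end)


def candidate_spans(
    start_prob: Sequence[float],
    end_prob: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off for accepting a boundary
    max_len: int = 8,        # assumed maximum entity length in tokens
) -> List[Span]:
    """Pair every predicted start with every predicted end to its right."""
    starts = [i for i, p in enumerate(start_prob) if p >= threshold]
    ends = [j for j, p in enumerate(end_prob) if p >= threshold]
    return [(i, j) for i in starts for j in ends if i <= j < i + max_len]


def classify_spans(
    spans: Sequence[Span],
    classify: Callable[[Span], Optional[str]],
) -> List[Tuple[int, int, str]]:
    """Keep only the candidates that the classifier assigns a category to."""
    results = []
    for start, end in spans:
        label = classify((start, end))
        if label is not None:
            results.append((start, end, label))
    return results


# Toy example: "University of Washington campus" with a nested entity.
start_prob = [0.9, 0.1, 0.8, 0.1]
end_prob = [0.1, 0.1, 0.9, 0.2]
toy_labels = {(0, 2): "ORG", (2, 2): "LOC"}  # stands in for a trained classifier

spans = candidate_spans(start_prob, end_prob)
print(classify_spans(spans, toy_labels.get))
# [(0, 2, 'ORG'), (2, 2, 'LOC')]
```

Because every start is paired with every compatible end, overlapping and nested spans such as "University of Washington" and "Washington" survive to the classification step, which is what lets a boundary-first decoder handle nested entities.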