
Research On Joint Extraction Of Entities And Relations For Complex Text Structure

Posted on: 2022-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: W C Niu
Full Text: PDF
GTID: 2518306779996539
Subject: Internet Technology
Abstract/Summary:
As the most important underlying task in the field of natural language processing, information extraction aims to extract high-density knowledge elements from unstructured text. Its two most important subtasks are entity recognition and relation extraction. Knowledge graphs have powerful semantic reasoning capabilities, and building high-quality industrial knowledge graphs can promote the transformation of industry informatization. However, unstructured data on the Internet usually contains complex text structures, and traditional information extraction algorithms cannot cope with such complex semantic information. By improving existing algorithms, this thesis focuses on how to deal effectively with overlapping triples. The main research contents are as follows:

(1) To address single-entity overlapping triplet extraction in complex text structures, this thesis proposes a graph convolutional neural network joint extraction model (Joint-GNAA) with a node-aware attention mechanism. The model captures a multi-granular representation of each word through a contextual feature extraction layer, combining contextual, character, and part-of-speech embeddings. To extract the regional feature representation of words, the dependency tree generated by a syntactic parsing tool is used as the adjacency matrix input of the first-stage GCN, and the convolution operation is performed on this graph structure. A relation-aware attention mechanism then extracts word association information under the different relation spaces and generates a word association matrix for each relation. The second-stage GCN aggregates the dependency information of all words, thereby establishing the interaction between triples. Finally, the GCN outputs from the two stages are concatenated for relation and entity prediction.

(2) To address double-entity overlapping triplet extraction in complex text structures, this thesis proposes a joint extraction model (Joint-RGA) with a relation-oriented attention mechanism. The model first captures the sequence features and regional features of sentences through a contextual feature extraction layer. Then, the original sentence is rebuilt into a new sentence representation for each relation through the relation-oriented attention mechanism and the relation gating mechanism. The relation-oriented attention mechanism calculates the weight coefficient of each word in the relation space, while the relation gating mechanism filters out useless information and retains the information that is helpful for entity labeling under the current relation. Finally, the hidden state information between words in each relation space is captured through the multi-head attention mechanism of the feature classification layer and the BiGRU network, and the output of the model is mapped to entity labels using a CRF and normalization, which yields the entity tag of each word.

In this thesis, a joint extraction experiment is carried out on an open-source English dataset to verify the triple extraction effect of the proposed models under complex text structures. In addition, this thesis verifies the application effect of Joint-GNAA and Joint-RGA on a marine text dataset, and constructs a marine industry knowledge graph based on a large-scale unstructured corpus.
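
The first-stage GCN described in (1), which takes the dependency tree as its adjacency matrix, can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `dependency_adjacency` and `gcn_layer`, the added self-loops, the row (mean) normalization, the ReLU activation, and the toy sentence with its head indices are all assumptions chosen for illustration.

```python
def dependency_adjacency(heads):
    """Build a symmetric adjacency matrix with self-loops from head indices.

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    Treating edges as undirected and adding self-loops are assumptions.
    """
    n = len(heads)
    adj = [[0.0] * n for _ in range(n)]
    for i, head in enumerate(heads):
        adj[i][i] = 1.0  # self-loop so each word keeps its own features
        if head >= 0:
            adj[i][head] = 1.0
            adj[head][i] = 1.0
    return adj


def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(D^-1 A X W), with D the row-degree matrix."""
    n = len(adj)
    in_dim = len(features[0])
    out_dim = len(weight[0])
    output = []
    for i in range(n):
        degree = sum(adj[i])
        # Average the feature vectors of token i and its tree neighbors.
        agg = [
            sum(adj[i][j] * features[j][k] for j in range(n)) / degree
            for k in range(in_dim)
        ]
        # Linear projection followed by ReLU.
        output.append([
            max(0.0, sum(agg[k] * weight[k][o] for k in range(in_dim)))
            for o in range(out_dim)
        ])
    return output


# Hypothetical toy parse: "Obama visited Paris", with "visited" as the root.
heads = [1, -1, 1]
adj = dependency_adjacency(heads)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
hidden = gcn_layer(adj, identity, identity)
```

With one-hot features and an identity weight matrix, each row of `hidden` is simply the normalized adjacency row for that word, which makes it easy to see that a word's new representation mixes in only its neighbors in the dependency tree.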
Keywords/Search Tags: Knowledge Graph, Joint Extraction of Entities and Relations, Attention Mechanism, Graph Convolution Neural Network, Overlapping Triplet