Algorithmic Studies On Entity Recognition And Relation Extraction Based On Deep Learning

Posted on: 2023-07-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Wan
Full Text: PDF
GTID: 1528307169476754
Subject: Software engineering
Abstract/Summary:
Entity recognition and relation extraction are two core technologies in the field of information extraction. They aim to extract key information from unstructured sequence text and play important roles in intelligent systems such as recommendation systems, search engines, and chatbots. As the development of artificial intelligence reaches a bottleneck in the transition to cognitive intelligence, building large-scale knowledge graphs and introducing human prior knowledge into machines has become the key to breaking through it. Constructing a large-scale knowledge graph requires obtaining key information from massive data and generating structured knowledge representations. Among the various kinds of data, unstructured sequence text is the most complex source of key information, because the semantics, context, and syntax of the text must be considered at the same time. Traditional information extraction methods are limited by their modeling capabilities, which makes it difficult for them to reason over and interpret unstructured sequence text. In recent years, deep learning has developed rapidly and been widely applied. Deep neural networks have gradually replaced traditional methods in textual information extraction owing to their powerful learning, computing, and representation capabilities, and intelligent algorithms for extracting key information from unstructured sequence text have become an important topic in deep learning. Information extraction algorithms based on deep neural networks have achieved some success, but in some scenarios problems remain, such as missing features, insufficient modeling, poor domain adaptability, and insufficient feature relationships. To address these problems, this thesis studies entity recognition and relation extraction algorithms based on deep learning. The major work and innovations of this thesis are as follows:

(1) A Chinese medical entity recognition algorithm based on a dictionary-enhanced attention network and a multi-scale loss function is proposed. The algorithm designs a dictionary-enhanced attention network built on the self-attention mechanism. By introducing external dictionary information, it jointly models character-level and word-level features of the text without introducing word segmentation errors. In the training stage, a co-training method based on a multi-scale loss function is designed, and the model parameters are optimized using both the sequence labeling error and the span classification error. To improve the adaptability of the model to Chinese medical texts, the general pre-trained language model ELMo is transfer-trained on a purpose-built medical corpus, which effectively alleviates the OOV (Out-of-Vocabulary) phenomenon while improving the text embedding effect. The experimental results show that, compared with various entity recognition methods, the proposed algorithm has a stronger embedding effect and higher entity recognition accuracy on Chinese medical texts.

(2) A document-level relation extraction algorithm based on a hierarchical dependency tree and bridge paths is proposed. The algorithm designs a fine-grained document reconstruction method, which extracts hierarchical dependency features from the chain structure to build a tree graph model of the document. The graph contains four levels of node information and three types of edge information. The model uses graph convolution operations to perform fair and deep fine-grained modeling of each node, obtaining a structural feature representation of the document graph. In the relation extraction stage, the third-party entities that co-occur with the head and tail entities of an entity pair are defined as relation bridge entities. A relation bridge entity provides an implicit bridge path for entity pairs in the document. The algorithm models bridge path features with a long short-term memory network and a self-attention mechanism to enhance the performance of document-level relation extraction. The experimental results show that the proposed algorithm achieves competitive relation extraction performance on the DocRED dataset compared with multiple document-level relation extraction algorithms. Its Ign-F1 score shows a clear advantage in extracting relational facts that did not appear in the training set.

(3) A joint entity-relation extraction algorithm based on a regional relational hypergraph network is proposed. The algorithm introduces the concept of text region supernodes. A graph convolutional network and a long short-term memory network cooperate to generate feature representations for the regional supernodes, so that local regional features are fully captured while the global linear features of the text are preserved. The algorithm builds a relational hypergraph network on the regional supernodes, in which a sequence-enhanced graph module applies an attention mechanism to the edges between supernodes. Each supernode aggregates the key information of its neighborhood to update the hypergraph network. The experimental results show that the proposed algorithm outperforms other models in F1 score on three public datasets and performs well on nested entity recognition.

(4) A joint entity-relation extraction algorithm based on a multi-modal attention network is proposed. The algorithm mines the potential multi-modal data features of sequence text under the joint extraction task and, at a macro level, defines and classifies them as entity features, context features, and label features. To obtain high-quality contextual features, a contextual modeling method based on a cloze mechanism is proposed. This method imitates the way humans solve cloze problems: it masks the central word of the text so that the model's attention focuses on the context information, which helps it complete the context modeling task. In the entity recognition and relation extraction stages, a modal-enhanced attention module with two modes is proposed. This module captures the context dependencies within single-modal data and the fine-grained interaction features between multi-modal data while preserving word order features. The experimental results show that the proposed algorithm achieves state-of-the-art results on the SciERC and ADE datasets and improves the F1 score for relation extraction on the CoNLL04 dataset.
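
The relation bridge entity defined in contribution (2) can be illustrated with a small sketch. It is not the thesis implementation: it assumes that "co-occur" means "mentioned in the same sentence", and the function names `bridge_entities` and `bridge_paths`, the set-per-sentence input format, and the toy document are all hypothetical. The LSTM and self-attention modeling of the resulting paths is not shown.

```python
from typing import List, Set, Tuple


def bridge_entities(sentences: List[Set[str]], head: str, tail: str) -> Set[str]:
    """Return third-party entities that co-occur with both head and tail.

    Assumption: two entities co-occur when some sentence mentions both.
    Each element of `sentences` is the set of entities in one sentence.
    """
    with_head: Set[str] = set()
    with_tail: Set[str] = set()
    for mentioned in sentences:
        if head in mentioned:
            with_head |= mentioned  # everything seen alongside the head
        if tail in mentioned:
            with_tail |= mentioned  # everything seen alongside the tail
    # A bridge entity is linked to both ends but is neither end itself.
    return (with_head & with_tail) - {head, tail}


def bridge_paths(
    sentences: List[Set[str]], head: str, tail: str
) -> List[Tuple[str, str, str]]:
    """List implicit head -> bridge -> tail paths in a stable order."""
    return [(head, b, tail) for b in sorted(bridge_entities(sentences, head, tail))]


# Hypothetical toy document: one entity set per sentence.
doc = [
    {"Marie Curie", "Warsaw"},
    {"Warsaw", "Poland"},
    {"Marie Curie", "Paris"},
]
paths = bridge_paths(doc, "Marie Curie", "Poland")
print(paths)
```

In this toy document the head and tail never share a sentence, yet "Warsaw" appears with each of them, so it yields one candidate bridge path that a downstream model could encode.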
Keywords/Search Tags: Named entity recognition, Relation extraction, Joint entity-relation extraction, Deep learning, Deep neural network, Graph network, Self-attention