
Research On The Distance Supervise Relation Extraction Based On Deep Learning

Posted on: 2022-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
GTID: 2518306563975049
Subject: Electronic Science and Technology
Abstract/Summary:
With the arrival of the big-data era, storing and using massive amounts of unstructured data has become a major challenge in information extraction. Relation extraction is one of the most important tasks in information extraction: it extracts the relation between two or more entities marked in a text to form a triplet, which supports downstream tasks such as knowledge graphs, retrieval, and question answering. Because traditional supervised datasets are too small to meet the data requirements of deep learning, distant supervision has been proposed to label large-scale data automatically. However, several vital issues in this task still need to be solved. First, the strong assumption used to build the dataset (that every sentence mentioning an entity pair expresses the relation the knowledge base records for that pair) introduces noisy labels. Second, natural-language text contains many redundant words and sentences that have nothing to do with the target relation. Third, bags carry insufficient information, because the data in multi-instance bags are sparse and affected by noise. To address these three issues, this thesis proposes the following solutions:

(1) This thesis adopts a relation extraction model that combines semantic and structural features. BERT extracts the semantic features, a GCN captures the structural features of the sentence's dependency tree, and the two are concatenated into the complete sentence representation, so that the model fully captures the feature information within the sentence. A sentence-level attention mechanism then selects valid instances, reducing the influence of noisy data.

(2) Because the same meaning can be expressed in many different ways, sentences contain a great deal of redundant information. This thesis designs a soft pruning strategy based on multi-head self-attention, which automatically learns to selectively attend to the sub-structures that are relevant to the relation extraction task. The strategy reduces the influence of irrelevant, noisy words in the sentence on the model and improves the model's robustness.

(3) This thesis designs a bag relation-entity pair graph model. The relation between the two entities of a bag is treated as a graph node, and two nodes are connected if their bags share an entity. Taking the bag-level feature vectors described above as input, a GCN captures the information between bags, so that each bag's representation fuses information from related bags. For bags containing only one sentence, this makes it possible to capture more useful information and to reduce noise between bags without relying on external information.

The thesis compares the proposed model with several baseline models; the experiments show that it performs best, reaching an AUC of 0.472. Ablation experiments further show that fusing semantic and structural features enhances the model's ability to represent sentences; that the multi-head self-attention mechanism reduces the weight of redundant information in sentences, letting the model focus on information related to the target relation; and that the bag relation-entity pair graph convolution model effectively uses the information between bags and alleviates data imbalance.
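As a minimal sketch of the structural branch in (1), the code below builds an undirected adjacency matrix with self-loops from a dependency parse and applies one degree-normalized GCN layer, ReLU(D^-1 A H W). The head-index encoding, the normalization, the function names, and the toy parse of "Obama was born in Hawaii" are assumptions for illustration; the abstract does not state the thesis's exact GCN formulation, and in a real model H would hold learned token representations and W would be trained.

```python
from typing import List

Matrix = List[List[float]]


def dependency_adjacency(heads: List[int]) -> Matrix:
    """Symmetric adjacency matrix with self-loops built from a dependency parse.

    heads[i] is the index of token i's head word, or -1 for the root (assumed encoding).
    """
    n = len(heads)
    adj = [[0.0] * n for _ in range(n)]
    for i, h in enumerate(heads):
        adj[i][i] = 1.0                    # self-loop keeps each token's own features
        if h >= 0:
            adj[i][h] = adj[h][i] = 1.0    # treat the dependency tree as undirected
    return adj


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN layer, ReLU(D^-1 A H W): each token averages its neighbors' projected features."""
    ahw = matmul(matmul(adj, h), w)
    out = []
    for i, row in enumerate(ahw):
        degree = sum(adj[i])               # >= 1 because of the self-loop
        out.append([max(0.0, v / degree) for v in row])
    return out


if __name__ == "__main__":
    # Hypothetical parse of "Obama was born in Hawaii": "born" (index 2) is the root.
    heads = [2, 2, -1, 4, 2]
    adj = dependency_adjacency(heads)
    h = [[1.0], [0.0], [0.0], [0.0], [0.0]]  # toy one-dimensional feature set only on "Obama"
    w = [[1.0]]                              # identity projection for readability
    print(gcn_layer(adj, h, w))  # [[0.5], [0.0], [0.25], [0.0], [0.0]]
```

The feature on "Obama" reaches only its direct syntactic neighbor "born" after one layer; stacking layers widens the receptive field along the dependency tree.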
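The sentence representation and instance selection in (1) can be sketched as follows: each sentence vector is the concatenation of a semantic vector and a structural vector, and a softmax over sentence scores weights the sentences of a bag, so that poorly matching (likely noisy) sentences receive small weights. This minimal sketch assumes a plain dot-product score against a relation query vector (selective-attention models often use a bilinear score instead); the toy vectors stand in for BERT and GCN outputs, and all names are hypothetical rather than taken from the thesis.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sentence_representation(semantic: Vector, structural: Vector) -> Vector:
    """Complete sentence vector: semantic features concatenated with structural features."""
    return semantic + structural


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sentence_attention(bag: List[Vector], relation_query: Vector) -> Tuple[Vector, List[float]]:
    """Weight the sentences of one bag by their dot-product score against a relation query
    vector (assumed scoring function); return the weighted bag vector and the weights."""
    scores = [sum(x * r for x, r in zip(sentence, relation_query)) for sentence in bag]
    weights = softmax(scores)
    dim = len(bag[0])
    bag_vector = [sum(w * sentence[d] for w, sentence in zip(weights, bag)) for d in range(dim)]
    return bag_vector, weights


if __name__ == "__main__":
    # Toy 2-d "semantic" and 2-d "structural" vectors for the three sentences of one bag.
    bag = [
        sentence_representation([1.0, 0.0], [1.0, 0.0]),
        sentence_representation([0.0, 1.0], [0.0, 1.0]),  # plays the role of a noisy sentence
        sentence_representation([1.0, 0.0], [0.0, 0.0]),
    ]
    query = [1.0, 0.0, 1.0, 0.0]            # hypothetical relation query vector
    bag_vector, weights = sentence_attention(bag, query)
    print([round(w, 3) for w in weights])   # [0.665, 0.09, 0.245]
```

Because the weights are a softmax, every sentence still contributes a little; the bag vector is dominated by the sentences that best match the relation rather than by any single hard-selected instance.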
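The soft pruning strategy in (2) can be illustrated by replacing the hard 0/1 dependency edges with dense, weighted graphs: each attention head computes softmax(Q K^T / sqrt(d_k)) over all token pairs, and that row-normalized matrix serves as the adjacency for a GCN aggregation. The sketch below assumes scaled dot-product attention, a shared layer weight W, and concatenation of the per-head outputs, in the spirit of attention-guided GCNs; the function names and toy matrices are hypothetical and not taken from the thesis.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        top = max(row)                     # subtract the row max for numerical stability
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def soft_adjacency(h: Matrix, wq: Matrix, wk: Matrix) -> Matrix:
    """One head's dense graph, softmax(Q K^T / sqrt(d_k)): every token pair gets a weight."""
    q, k = matmul(h, wq), matmul(h, wk)
    scale = math.sqrt(len(wq[0]))
    scores = [[v / scale for v in row] for row in matmul(q, transpose(k))]
    return softmax_rows(scores)            # rows sum to 1, so no extra degree normalization


def soft_pruned_gcn(h: Matrix, heads: List[Tuple[Matrix, Matrix]], w: Matrix) -> Matrix:
    """For each (W_q, W_k) head compute ReLU(A_head H W); concatenate head outputs per token."""
    per_head = []
    for wq, wk in heads:
        adj = soft_adjacency(h, wq, wk)
        per_head.append([[max(0.0, v) for v in row] for row in matmul(matmul(adj, h), w)])
    return [sum((out[i] for out in per_head), []) for i in range(len(h))]


if __name__ == "__main__":
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three toy token vectors
    eye = [[1.0, 0.0], [0.0, 1.0]]
    zero = [[0.0, 0.0], [0.0, 0.0]]            # a key projection that attends uniformly
    out = soft_pruned_gcn(h, [(eye, eye), (eye, zero)], eye)
    print(len(out), len(out[0]))               # 3 4  (3 tokens, 2 heads x 2 features)
```

Unlike a hard pruning rule that deletes tree edges beyond some distance, the attention weights are learned, so an irrelevant word keeps a small nonzero weight instead of being cut off outright.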
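The bag relation-entity pair graph in (3) can be sketched as follows, assuming each bag is identified by its (head, tail) entity pair: bags become nodes, an edge joins two bags that share an entity, and one neighborhood-averaging step mixes each bag vector with those of its neighbors. The learned weight matrix and nonlinearity of a full GCN layer are omitted for brevity, and the entity names, vectors, and function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

EntityPair = Tuple[str, str]
Matrix = List[List[float]]


def build_bag_graph(pairs: List[EntityPair]) -> Matrix:
    """Nodes are bags (one per entity pair); two bags are linked if they share an entity."""
    n = len(pairs)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for i, j in combinations(range(n), 2):
        if set(pairs[i]) & set(pairs[j]):
            adj[i][j] = adj[j][i] = 1.0
    return adj


def propagate(adj: Matrix, bag_vectors: Matrix) -> Matrix:
    """One mean-aggregation step D^-1 A X (learned weights and nonlinearity omitted)."""
    dim = len(bag_vectors[0])
    out = []
    for row in adj:
        degree = sum(row)
        out.append([sum(a * x[d] for a, x in zip(row, bag_vectors)) / degree for d in range(dim)])
    return out


if __name__ == "__main__":
    pairs = [("Obama", "Hawaii"), ("Obama", "USA"), ("Paris", "France")]  # hypothetical bags
    bag_vectors = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]                    # toy bag-level vectors
    print(propagate(build_bag_graph(pairs), bag_vectors))
    # [[0.5, 0.5], [0.5, 0.5], [5.0, 5.0]]: the two "Obama" bags share information
```

Comparing every pair of bags is quadratic in the number of bags; on a large corpus one would instead index bags by entity and link the bags listed under each entity.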
Keywords/Search Tags: Natural Language Processing, Deep Learning, Relation Extraction, Distant Supervision, Graph Convolutional Neural Network