Faced with the ever-increasing volume of text data, relational triple extraction, a core task in information extraction and a key step in building large-scale knowledge graphs, plays an important role in efficiently extracting valuable information and applying it to intelligent systems. However, the common and complex problem of overlapping relational triples in text, inaccurate semantic feature extraction by the encoder, and the scarcity of labeled data all reduce the precision of relational triple extraction models. To address these problems, this thesis first mines and exploits the interaction information between subjects and relations in overlapping relational triples, and constructs a model that automatically learns the relevance between them, so as to improve the precision of overlapping relational triple extraction. Second, this thesis introduces contrastive learning and proposes a heuristic data augmentation method that generates positive samples retaining more of the semantic information of the original sentence for model training, so as to strengthen the encoder's semantic feature extraction and improve the precision of relational triple extraction without requiring additional labels. The main research content of this thesis covers the following two aspects:

To address the problem that subjects differ in their relevance to specific relations in overlapping triples, this thesis proposes a relational triple extraction method based on a subject relation attention network. By introducing a subject relation attention module that automatically learns the relevance between the subject and all relations, the model pays more attention to the relations that have correct objects, which reduces the probability of extracting wrong objects from irrelevant relations and improves the precision of overlapping relational triple extraction. Compared with the baseline, the proposed method increases F1 scores by 2.3%, 2.6% and 1.2% on the three datasets, respectively. Meanwhile, comparative object extraction experiments on the three datasets further confirm that introducing the subject relation attention module effectively reduces the number of wrong objects extracted from irrelevant relations.

To address the problem that the precision of relational triple extraction is reduced by inaccurate semantic feature extraction in the encoder and by the scarcity of labeled data, this thesis proposes a relational triple extraction method based on heuristic contrastive learning. First, contrastive learning is introduced: positive sample pairs are generated through data augmentation and pulled closer together in the representation space while being pushed away from negative samples, which reshapes the representation space and enhances semantic feature extraction without requiring additional labels. Second, this thesis proposes a data augmentation method based on a genetic algorithm, which addresses the problem that random data augmentation destroys the semantic information of the original sentence and generates low-quality positive samples. By using heuristic search to generate positive samples that retain more of the semantic information of the original sentence, this method improves feature extraction capability and the precision of relational triple extraction. Compared with the baseline, the proposed method increases F1 scores by 1.0%, 2.3% and 1.6% on the three datasets, respectively. In addition, the experimental comparison with random data augmentation shows that the proposed genetic-algorithm-based data augmentation method yields a gain in the precision of relational triple extraction.
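
As an illustration of the contrastive objective described above, the following is a minimal sketch of an InfoNCE-style loss for one anchor sentence, one augmented positive, and a set of negatives. The abstract does not state the exact loss, similarity measure, or temperature used in the thesis, so the InfoNCE form, cosine similarity, the default temperature of 0.1, and the function names `cosine` and `info_nce_loss` are all assumptions made here for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for a single anchor (assumed form, not the thesis's).

    The loss is small when the anchor is close to its positive and far
    from every negative in the representation space.
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum_exp - pos_logit


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    negatives = [[0.0, 1.0], [-1.0, 0.0]]
    # A positive that preserves the anchor's direction gives a lower loss
    # than one that drifts toward a negative.
    print(info_nce_loss(anchor, [0.9, 0.1], negatives))
    print(info_nce_loss(anchor, [0.0, 1.0], negatives))
```

Under this assumed objective, a positive sample that drifts away from the original sentence raises the loss on its own. That is one way to read the motivation for augmentation that preserves semantic information.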