Distant supervision obtains a large relation-labeled training corpus by aligning an unlabeled corpus with a knowledge base. Distantly supervised relation extraction identifies relation facts in natural-language text using the training corpus obtained in this way and presents them to the user in structured form. Because the automatically labeled data is noisy, distantly supervised relation extraction remains challenging, and it can be divided into two sub-tasks: relation noise reduction and relation classification. The latest pattern-based approaches use attention regularization to force the model to focus on the relation patterns that express relation facts, and they achieve good results for both noise reduction and classification.

However, previous studies still have the following problems: (1) Previous pattern-based approaches only sample the sentence sequence between an entity pair as a fixed relation pattern, which cannot handle variable patterns and can seriously degrade model performance. (2) The latest relation classification studies use only sentence information, ignoring information about the specific entities and their positions.

To address these problems, this thesis studies distantly supervised relation extraction based on contrastive learning and entity information fusion. The main work is as follows.

(1) We propose CONNOR (CONtrastive learning based NOise Reduction), a contrastive learning-based relation noise reduction algorithm that deals with the problem of variable patterns. CONNOR introduces a novel contrastive loss function that lets the model learn more reasonable representations of the original sentence and the entity relation sentences in the same instance, together with a novel data augmentation strategy for the contrastive learning model. CONNOR constructs entity relation sentences from entity pairs and relation patterns to represent the semantics of specific relation types. Unlike previous pattern-based approaches, CONNOR learns a unified semantic representation of the relation patterns corresponding to the same relation type by maximizing the agreement between the original sentence representation and the corresponding entity relation sentence representation. With this unified semantic representation, CONNOR can not only filter noisy data but also correctly relabel instances using a relabeling strategy. Experimental results demonstrate the effectiveness of the proposed relation noise reduction algorithm, which shows a significant improvement in noise reduction over previous methods.

(2) Existing relation classification models often overlook entity-level information, resulting in suboptimal performance. We propose two methods for fusing entity information. In the first method, we insert entity delimiters before and after the entities in the model input text to indicate entity positions. In the second method, we fuse the entity vectors with the [CLS] vector to obtain the final classification vector. Experiments show that both entity information fusion methods improve the classification model’s performance.

(3) This thesis also builds a demonstration system for distantly supervised relation extraction. In the browser, users can train relation noise reduction models and relation classification models, save and delete models, invoke the back-end models to identify noisy instances and predict relation types, and view the results. During model training, users can view the training logs.
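The two entity information fusion methods described in (2) can be illustrated with a minimal Python sketch. It is not the implementation used in this thesis. The marker strings `[E1]`, `[/E1]`, `[E2]`, `[/E2]`, the function names, mean pooling of entity token vectors, and fusion by concatenation are all assumptions made for illustration; the abstract states only that delimiters are inserted around entities and that entity vectors are fused with the [CLS] vector.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def insert_entity_markers(tokens: Sequence[str], head: Span, tail: Span) -> List[str]:
    """Method 1 (sketch): wrap each entity in delimiter tokens.

    Assumes the two spans are non-empty and do not overlap.
    The marker strings are hypothetical placeholders.
    """
    opens = {head[0]: "[E1]", tail[0]: "[E2]"}
    closes = {head[1]: "[/E1]", tail[1]: "[/E2]"}
    out: List[str] = []
    for i in range(len(tokens) + 1):
        # Emit a closing marker before an opening one at the same position,
        # so adjacent entities are delimited correctly.
        if i in closes:
            out.append(closes[i])
        if i in opens:
            out.append(opens[i])
        if i < len(tokens):
            out.append(tokens[i])
    return out


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average the token vectors of one entity into a single entity vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fuse_with_cls(cls_vec: Sequence[float],
                  head_vec: Sequence[float],
                  tail_vec: Sequence[float]) -> List[float]:
    """Method 2 (sketch): fuse by concatenation, an assumed fusion operator."""
    return list(cls_vec) + list(head_vec) + list(tail_vec)


if __name__ == "__main__":
    tokens = ["Steve", "Jobs", "founded", "Apple"]
    print(" ".join(insert_entity_markers(tokens, head=(0, 2), tail=(3, 4))))

    # Toy 2-dimensional vectors standing in for encoder outputs.
    head_vec = mean_pool([[1.0, 3.0], [3.0, 5.0]])
    fused = fuse_with_cls([0.5, 0.5], head_vec, [1.0, 1.0])
    print(len(fused))
```

In a real model the fused vector would be passed to a classification layer, and the marker strings would need to be registered with the tokenizer so that they are not split into sub-word pieces.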