| Relation extraction is a fundamental task in natural language processing. Distant supervision can build datasets for relation extraction automatically, easing the burden and cost of manual dataset construction and laying the foundation for automatic relation extraction. However, its dataset construction rests on a strong assumption: every sentence that mentions an entity pair is taken to express the relation recorded for that pair in the knowledge base. This assumption causes wrong labels, introduces a large amount of noisy data and degrades model performance. To alleviate the noise problem in distant supervision, this paper takes two approaches. First, it introduces entity background descriptions from existing knowledge bases to give the model richer input sentence features. Second, it applies adversarial training to make the model more robust to adversarial samples. The paper therefore studies two problems: the large amount of noise in distantly supervised relation extraction training sets and the weak robustness of the models. The main work of this paper is as follows:
(1) To address the noise in distantly supervised relation extraction training sets, a distant supervision relation extraction model that incorporates background descriptions of the target entities, REDST-DESC, is designed on the basis of the baseline model REDSandT. The model is composed of an input layer, a sentence encoder layer, a sentence bag encoder layer and a relation classification layer. The input layer constructs structured input text and adds the background description of each target entity, drawn from the Wikipedia and Wikidata knowledge bases, to provide richer input features for the model. The sentence encoder layer uses the pre-trained BERT model to encode each instance from the input layer into a sentence representation that the neural network can process. The sentence bag encoder layer groups the sentence representations produced by the sentence encoder layer into sentence bags according to their target entity pair. It then uses a selective attention mechanism to reduce the noise inside each bag and construct the bag representation. Finally, the relation classification layer predicts the relation of each sentence bag with a Softmax classifier. Experimental results show that, compared with the baseline REDSandT, REDST-DESC increases AUC by 2 percentage points on the NYT-10-enhanced dataset and by 0.8 percentage points on the GDS-enhanced dataset.
(2) To address the weak robustness of distantly supervised relation extraction models, two models with adversarial training, REDST-DFGM and REDST-DPGD, are designed on the basis of REDST-DESC. Each consists of five parts: an input layer, a sentence encoder layer, a sentence bag encoder layer, a relation classification layer and an adversarial training layer. The first four layers are the same as in REDST-DESC. The adversarial training layer is based on the FGM and PGD adversarial training methods, respectively. During training, perturbations are added to the original input samples to construct adversarial samples, and training on them improves the robustness and generalization ability of the model. On the NYT-10-enhanced dataset, REDST-DFGM and REDST-DPGD reach AUC values of 0.433 and 0.449, respectively. These values are 0.4 and 2.0 percentage points higher than REDST-DESC, and 2.4 and 4.0 percentage points higher than the baseline REDSandT. Compared with DISTRE, the best-performing model among the comparison baselines, their AUC values are 1.7 and 3.3 percentage points higher, respectively. These results verify that the overall performance of the proposed REDST-DFGM and REDST-DPGD models is better than that of the other compared methods. |
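The input layer and the bag grouping described in (1) can be sketched in a few lines of Python. The abstract does not specify the input template, so the `[SEP]`-joined layout, the `Instance` type and the `descriptions` lookup below are hypothetical. In the actual model the descriptions would come from Wikipedia and Wikidata, and the text would go through a BERT tokenizer. Grouping would also be applied to the encoder's sentence representations rather than to raw strings.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Instance:
    """One distantly labelled sentence mentioning a target entity pair."""
    sentence: str
    head: str
    tail: str


def structured_input(inst: Instance, descriptions: Dict[str, str]) -> str:
    """Hypothetical template: entity pair, the sentence, then each entity's
    background description, joined with BERT's [SEP] token."""
    parts = [
        f"{inst.head} ; {inst.tail}",
        inst.sentence,
        descriptions.get(inst.head, ""),  # empty if no description was found
        descriptions.get(inst.tail, ""),
    ]
    return " [SEP] ".join(parts)


def build_bags(instances: List[Instance],
               descriptions: Dict[str, str]) -> Dict[Tuple[str, str], List[str]]:
    """Group structured inputs into bags keyed by the (head, tail) entity pair."""
    bags: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for inst in instances:
        bags[(inst.head, inst.tail)].append(structured_input(inst, descriptions))
    return dict(bags)


instances = [
    Instance("Barack Obama was born in Honolulu.", "Barack Obama", "Honolulu"),
    # Same entity pair, but this sentence does not state a place of birth.
    Instance("Barack Obama gave a speech in Honolulu.", "Barack Obama", "Honolulu"),
    Instance("Paris is the capital of France.", "Paris", "France"),
]
descriptions = {  # hypothetical knowledge-base descriptions
    "Barack Obama": "44th president of the United States",
    "Honolulu": "capital city of Hawaii",
}
bags = build_bags(instances, descriptions)
for pair, texts in bags.items():
    print(pair, len(texts))
```

The second Obama/Honolulu sentence shows why bags are noisy. Suppose the knowledge base labels the pair with a place-of-birth relation. This sentence would then inherit that label without expressing it, and this is the kind of instance the bag encoder's selective attention is meant to down-weight.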
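The sentence bag encoder and the Softmax classification layer from (1) can be illustrated with a minimal pure-Python sketch. For brevity, it scores each sentence vector against a relation query vector with a plain dot product. This is an assumption: the classic selective-attention formulation places a learned weight matrix between the two. The sketch also uses fixed toy vectors and classifier weights, whereas in REDST-DESC the sentence vectors would be BERT outputs and all weights would be learned.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def selective_attention(sentences: List[Vector], query: Vector) -> Vector:
    """Score each sentence against the relation query, normalise the scores
    with softmax, and return the attention-weighted sum of the sentences.
    Sentences that match the query poorly (likely noise) get small weights."""
    weights = softmax([dot(s, query) for s in sentences])
    dim = len(sentences[0])
    return [sum(w * s[k] for w, s in zip(weights, sentences)) for k in range(dim)]


def classify_bag(bag: Vector, weight_rows: List[Vector], bias: Vector) -> Vector:
    """Linear layer followed by softmax: one probability per relation label."""
    logits = [dot(row, bag) + b for row, b in zip(weight_rows, bias)]
    return softmax(logits)


# Toy bag: the first sentence matches the (hypothetical) relation query,
# the other two look like noise.
bag = [[2.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
query = [1.0, 0.0]
bag_repr = selective_attention(bag, query)
probs = classify_bag(bag_repr, weight_rows=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
print(bag_repr, probs)
```

Because the weights come from a softmax, a bag whose sentences all score equally reduces to the plain average of its sentence vectors. Attention only shifts the bag representation when some sentences fit the relation better than others.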
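The FGM and PGD perturbations used by the adversarial training layer in (2) can be sketched on a toy model. The sketch assumes a logistic loss on a single input embedding `x` with weights `w`, so that the gradient has a closed form. In REDST-DFGM and REDST-DPGD, the perturbation would instead be applied to BERT embeddings, with gradients obtained by backpropagation. The values of `epsilon`, `alpha` and `steps` are illustrative, not the paper's settings.

```python
import math
from typing import List, Tuple

Vector = List[float]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def loss_and_grad(x: Vector, w: Vector, y: int) -> Tuple[float, Vector]:
    """Toy model: logistic loss -log(sigmoid(y * w.x)) for a label y in {+1, -1},
    returned together with its gradient with respect to the input x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # Numerically stable log(1 + exp(-margin)).
    loss = max(0.0, -margin) + math.log1p(math.exp(-abs(margin)))
    prob = math.exp(-loss)  # equals sigmoid(margin)
    coeff = -y * (1.0 - prob)
    return loss, [coeff * wi for wi in w]


def shifted(x: Vector, r: Vector) -> Vector:
    return [xi + ri for xi, ri in zip(x, r)]


def fgm(x: Vector, w: Vector, y: int, epsilon: float) -> Vector:
    """FGM: a single step of length epsilon along the L2-normalised gradient."""
    _, g = loss_and_grad(x, w, y)
    norm = l2_norm(g)
    if norm == 0.0:
        return [0.0] * len(x)
    return [epsilon * gi / norm for gi in g]


def project(r: Vector, epsilon: float) -> Vector:
    """Project r back onto the L2 ball of radius epsilon."""
    norm = l2_norm(r)
    if norm <= epsilon:
        return r
    return [epsilon * ri / norm for ri in r]


def pgd(x: Vector, w: Vector, y: int, epsilon: float, alpha: float, steps: int) -> Vector:
    """PGD: several steps of length alpha, re-evaluating the gradient at the
    perturbed input and projecting onto the epsilon-ball after each step."""
    r = [0.0] * len(x)
    for _ in range(steps):
        _, g = loss_and_grad(shifted(x, r), w, y)
        norm = l2_norm(g)
        if norm == 0.0:
            break
        r = project([ri + alpha * gi / norm for ri, gi in zip(r, g)], epsilon)
    return r


def adversarial_objective(x: Vector, w: Vector, y: int, r: Vector) -> float:
    """One common choice of training objective: clean loss plus the loss
    on the adversarial sample x + r."""
    clean, _ = loss_and_grad(x, w, y)
    adv, _ = loss_and_grad(shifted(x, r), w, y)
    return clean + adv


x = [0.5, 0.5]   # hypothetical input embedding
w = [1.0, -2.0]  # hypothetical model weights
y = 1
r_fgm = fgm(x, w, y, epsilon=0.1)
r_pgd = pgd(x, w, y, epsilon=0.1, alpha=0.03, steps=5)
clean_loss, _ = loss_and_grad(x, w, y)
print(r_fgm, r_pgd, clean_loss, adversarial_objective(x, w, y, r_pgd))
```

With this linear toy model, the gradient direction never changes, so PGD ends at the same perturbation as FGM. On a nonlinear encoder, the gradient is recomputed after every step, which is what allows PGD to find a different, typically stronger, perturbation than the single FGM step.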
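The AUC gains reported in (2) are mutually consistent only when read as absolute differences, that is, as percentage points. The check below takes the two reported AUC values and the stated gains and back-computes the NYT-10-enhanced AUC that each comparison model must then have. These values are derived here, not reported in the abstract.

```python
# Reported AUC values on NYT-10-enhanced.
reported = {"REDST-DFGM": 0.433, "REDST-DPGD": 0.449}
# Stated gains over each comparison model, in percentage points.
gains = {
    "REDST-DESC": {"REDST-DFGM": 0.4, "REDST-DPGD": 2.0},
    "REDSandT": {"REDST-DFGM": 2.4, "REDST-DPGD": 4.0},
    "DISTRE": {"REDST-DFGM": 1.7, "REDST-DPGD": 3.3},
}

implied = {}
for baseline, by_model in gains.items():
    estimates = [round(reported[m] - g / 100, 3) for m, g in by_model.items()]
    # Both models must imply the same baseline AUC for the absolute reading to hold.
    assert estimates[0] == estimates[1], baseline
    implied[baseline] = estimates[0]

# The derived values also reproduce the 2-point gain of REDST-DESC over REDSandT from (1).
assert round(implied["REDST-DESC"] - implied["REDSandT"], 3) == 0.02
print(implied)  # {'REDST-DESC': 0.429, 'REDSandT': 0.409, 'DISTRE': 0.416}
```

Read as relative improvements instead, the same figures would imply two different AUC values for REDST-DESC, so the percentage-point reading is the only consistent one.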