| In the era of big data, large amounts of unstructured data are generated every day, and extracting valuable information from this data is a pressing problem. As an important subtask of information extraction, document-level relation extraction determines the relation between two given entities in a long document, yielding a knowledge triple. The extracted knowledge can be used to build knowledge graphs or to support intelligent question answering and information retrieval systems, so it has broad application value. However, document corpora contain many reasoning phenomena: some potential relations are difficult to judge directly and can only be extracted with the help of a reasoning mechanism, so the extraction accuracy for these triples remains insufficient. To address this problem, this thesis proposes two document-level relation extraction methods that incorporate reasoning information and can effectively improve the extraction of potential triples in documents. The main research contents and contributions are summarized as follows: (1) To address the problem that potential triples in a document corpus are difficult to predict and rely on information from other, simpler triples to support reasoning, this thesis proposes a document-level relation extraction model based on pre-classification information. The model uses two classification passes to mine the relations in simple triples and potential triples, respectively. First, the document is encoded with a pre-trained language model to obtain mention representations and entity representations, and the entity pairs are classified a first time using an improved adaptive threshold loss function, which extracts most simple triples. Next, the information from extracted triples that share an entity is aggregated according to their confidence to enhance the semantic representation of the entities, and the entity pairs are then classified
again. The second classification can effectively extract potential triples. Extensive experiments on DocRED, a mainstream document-level relation extraction dataset, show that the proposed model improves significantly over the baseline in terms of F1 score, which validates its effectiveness. (2) To address the lack of modeling of co-sentence reasoning and logical reasoning in document corpora, this thesis proposes a document-level relation extraction model based on path reasoning. The model constructs a document graph from the associations between entity mentions and sentences, and uses graph neural networks to mine the semantic dependencies between mentions and sentences. The document graph is then transformed into an entity-sentence graph, and first-order and second-order reasoning paths are constructed from the connections between nodes in that graph. The first-order reasoning path models co-sentence reasoning by extracting the semantic relationship between the head and tail entities and the sentence in which they occur. The second-order reasoning path models logical reasoning and extracts potential associations between the head and tail entities through intermediary entities. Compared with other methods that use implicit or explicit reasoning, the proposed model achieves better reasoning results, with intra-sentence and cross-sentence F1 scores significantly higher than those of other models. |
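
The path construction described in (2) can be illustrated with a small sketch. The Python code below is an illustrative assumption, not the thesis's implementation. It represents the entity-sentence graph as a mapping from each entity to the set of sentence indices that mention it. A first-order path is taken to be head, shared sentence, tail; a second-order path is taken to be head, sentence, intermediary entity, sentence, tail. The names `first_order_paths`, `second_order_paths`, and `toy_graph`, as well as the toy entities, are hypothetical. The learned representations and the graph neural network are omitted entirely.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical entity-sentence graph: entity name -> indices of the
# sentences that mention it. This encoding is an assumption for illustration.
EntitySentenceGraph = Dict[str, Set[int]]


def first_order_paths(
    graph: EntitySentenceGraph, head: str, tail: str
) -> List[Tuple[str, int, str]]:
    """Enumerate co-sentence paths: head -> shared sentence -> tail."""
    shared = graph[head] & graph[tail]
    return [(head, s, tail) for s in sorted(shared)]


def second_order_paths(
    graph: EntitySentenceGraph, head: str, tail: str
) -> List[Tuple[str, int, str, int, str]]:
    """Enumerate paths that pass through one intermediary entity:
    head -> sentence -> intermediary -> sentence -> tail."""
    paths = []
    for mid in sorted(graph):
        if mid in (head, tail):
            continue  # the intermediary must be a third entity
        for s1 in sorted(graph[head] & graph[mid]):
            for s2 in sorted(graph[mid] & graph[tail]):
                paths.append((head, s1, mid, s2, tail))
    return paths


# Toy document: sentence 0 mentions Alice and Acme,
# sentence 1 mentions Acme and Berlin.
toy_graph: EntitySentenceGraph = {
    "Alice": {0},
    "Acme": {0, 1},
    "Berlin": {1},
}

if __name__ == "__main__":
    print(first_order_paths(toy_graph, "Alice", "Acme"))
    print(first_order_paths(toy_graph, "Alice", "Berlin"))
    print(second_order_paths(toy_graph, "Alice", "Berlin"))
```

In this toy document, Alice and Berlin never share a sentence, so no first-order path exists between them. The only connection runs through the intermediary Acme, which is the kind of cross-sentence case the second-order path is meant to cover.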