Research And Application Of End To End Entity Relation Extraction Algorith

Posted on: 2022-08-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2518306740451944
Subject: Computer technology
Abstract/Summary:
Entity relationship extraction is an important branch of natural language processing. It is a text processing technology that extracts content of interest from unstructured or semi-structured text and organizes it into structured form, and it is one of the most important tasks for knowledge acquisition in the construction of knowledge graphs. Addressing the shortcomings of existing research, this thesis studies the task from the two aspects of "model" and "data". It targets three problems: current end-to-end entity relationship extraction cannot extract overlapping triples, and the labeled data used for entity relationship extraction suffers from both missing triple labels and label scarcity.

In terms of models, this thesis proposes an end-to-end entity relationship extraction model that can extract overlapping triples. The model uses multi-information annotation and a joint training mechanism to perform end-to-end entity relationship extraction while also handling overlapping triples. It achieved F1 values of 80.5% and 76.6% on the SKE and NYT data sets, respectively, exceeding all baseline entity relationship extraction systems.

To address missing triple labels in labeled data, this thesis studies the effects of "positive sample reduction" and "negative sample mislabeling". To quantify the impact of these two effects on model training, an adjustment loss function is introduced to calculate the ratio by which each effect lowers the model's F1 score. To alleviate the missing label problem, a training method based on negative sampling is presented, and its effectiveness is verified by experiments.

In view of the scarcity of labeled data, this thesis modifies the Tri-Training semi-supervised algorithm in three stages: unlabeled data preprocessing, model initialization, and model iteration. In the experiments, to verify the effectiveness of the semi-supervised algorithm on entity relationship extraction, several subsampling ratios are applied to the training data to simulate training tasks at various orders of magnitude. A comparison of the modified Tri-Training algorithm with supervised training and with the original Tri-Training algorithm shows that the modified algorithm can alleviate the scarcity of labeled data.

Finally, the research results of this thesis are applied to the construction of knowledge graphs. The entity relationship extraction model is trained with a semi-supervised method that combines a small amount of manually labeled data with a large amount of unlabeled data, and the trained model is used to extract triples from a large amount of unstructured data. To verify the effectiveness of the research results, a high-quality test set was constructed from unstructured data through multiple rounds of manual annotation and used to evaluate the extraction quality of the model. Experimental results show that the entity relationship extraction method in this thesis can solve the problems of "overlapping triples", "missing triple labels" and "scarcity of labeled data".
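As a rough illustration of how a triple-level F1 value like the ones reported above is commonly computed, the following Python sketch scores a set of predicted triples against a gold set. It is only a sketch under stated assumptions: the abstract does not give the thesis's matching criterion, so exact matching of (subject, relation, object) is assumed here, and the function name `triple_f1` and the sample triples are hypothetical. The gold set deliberately contains overlapping triples (the same entity pair under two relations) to show that set-based scoring handles them without special treatment.

```python
from typing import Iterable, Tuple

# A triple is (subject, relation, object); exact string match is an assumption.
Triple = Tuple[str, str, str]


def triple_f1(predicted: Iterable[Triple], gold: Iterable[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one set of predicted triples."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical data: the first two gold triples overlap on the entity pair
# (Paris, France) but carry different relations.
gold = [
    ("Paris", "capital_of", "France"),
    ("Paris", "located_in", "France"),
    ("France", "contains", "Paris"),
]
predicted = [
    ("Paris", "capital_of", "France"),
    ("Paris", "located_in", "France"),
    ("Paris", "capital_of", "Germany"),  # wrong object, counts as a false positive
]

precision, recall, f1 = triple_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this example two of the three predictions are correct and two of the three gold triples are recovered, so precision, recall and F1 all equal 2/3. A corpus-level score would normally pool the counts over all sentences before computing the ratios.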
Keywords/Search Tags:Entity and Relationship Extraction, End to End Extraction, Semi-supervised Learning, Neural Network, Knowledge Graph