
Research On Chinese Entity Relation Extraction Based On Schemas And Pre-trained Language Models

Posted on: 2022-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: R Li
Full Text: PDF
GTID: 2518306779996429
Subject: Automation Technology
Abstract/Summary:
Entity relation extraction is a subtask of natural language processing that plays an important role in downstream applications such as knowledge graphs, search engines, and intelligent question answering. Depending on whether named entity recognition and relation extraction are treated as entirely independent tasks, entity relation extraction methods fall into two categories: pipeline extraction and joint extraction. In recent years, joint entity relation extraction based on pre-trained language models has made good progress, but problems such as entity overlap, relation overlap, and entity redundancy remain, and the information carried by the entities themselves is not fully exploited in relation extraction. Chinese entity relation extraction faces additional difficulties. Because of how the language is written, the first step is word segmentation, and entity boundary errors often occur during this step. In addition, owing to the complexity of Chinese semantics, research on Chinese entity relation extraction started relatively late and is less extensive than research on English.

This thesis analyzes the existing problems of joint entity relation extraction and the practical difficulties of Chinese-language research, and proposes SC-ERE (Schema-based Chinese Entity Relation Extraction model), a joint Chinese entity relation extraction method based on relation schemas and a pre-trained language model. Experiments on the DuIE, SanWen, FinRE, and ACE2005 datasets were carried out to verify the effectiveness of the proposed method. The main work of this thesis is as follows:

(1) Pre-trained word vectors are combined with character vectors through a character-word mixed vector, and a position vector carrying position information is added, to improve the accuracy of Chinese word segmentation boundaries and, in turn, the overall performance of the model.

(2) An extraction framework of "extracting the head entity first, and then extracting the tail entity under a specific relation type" is adopted. It allows a head entity to have multiple tail entities under the same relation type, which solves the problems of entity overlap and relation overlap.

(3) A method is proposed that uses the relation schema to enhance entity relation extraction based on a pre-trained language model. Using the relation schema to filter the candidate relation types of the head entity, and to filter out entities that take part in no relation, strengthens the model's attention to specific relations and to the entities involved in them, which addresses the problem of entity redundancy.

(4) A tail entity extraction method that uses entity type information is proposed. The tail entity type encoding is combined with the sentence encoding, head entity encoding, and relation type encoding to guide and constrain the identification of tail entities and improve the accuracy of triple extraction.
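As an illustration only, the ideas in (2) and (3) can be sketched in Python as follows. This is not the SC-ERE implementation: the schema entries, the entity types, the function names build_schema_index and extract_triples, and the lookup-based toy_scorer that stands in for the neural tail entity extractor are all assumptions made for the example. The sketch also assumes the candidate entities and their types are already known, whereas a joint model would tag head and tail spans directly in the sentence.

```python
from collections import defaultdict

# Hypothetical relation schema: (head entity type, relation type, tail entity type).
SCHEMA = [
    ("Person", "lived_in", "Place"),
    ("Person", "works_for", "Organization"),
    ("Organization", "located_in", "Place"),
]


def build_schema_index(schema):
    """Map each head entity type to {relation type: set of allowed tail types}."""
    index = defaultdict(lambda: defaultdict(set))
    for head_type, relation, tail_type in schema:
        index[head_type][relation].add(tail_type)
    return index


def extract_triples(entities, schema_index, tail_scorer, threshold=0.5):
    """Head-first extraction with schema filtering.

    entities: list of (text, entity_type) pairs.
    tail_scorer: callable (head, relation, tail) -> score in [0, 1];
                 a stand-in for a learned tail entity extractor.
    """
    triples = []
    for head_text, head_type in entities:
        # Entity types that never act as a head in the schema are skipped,
        # so entities without relations do not produce candidates.
        candidate_relations = schema_index.get(head_type, {})
        for relation, allowed_tail_types in candidate_relations.items():
            for tail_text, tail_type in entities:
                if tail_text == head_text or tail_type not in allowed_tail_types:
                    continue
                # Several tails may pass for the same (head, relation) pair,
                # which is how overlapping triples are represented.
                if tail_scorer(head_text, relation, tail_text) >= threshold:
                    triples.append((head_text, relation, tail_text))
    return triples


# Toy usage: a lookup table replaces the neural scorer (assumption).
GOLD = {
    ("Li Hua", "lived_in", "Beijing"),
    ("Li Hua", "lived_in", "Shanghai"),
    ("Li Hua", "works_for", "Acme"),
    ("Acme", "located_in", "Shanghai"),
}


def toy_scorer(head, relation, tail):
    return 1.0 if (head, relation, tail) in GOLD else 0.0


entities = [
    ("Li Hua", "Person"),
    ("Beijing", "Place"),
    ("Shanghai", "Place"),
    ("Acme", "Organization"),
]
triples = extract_triples(entities, build_schema_index(SCHEMA), toy_scorer)
print(triples)
```

In this toy run the head "Li Hua" keeps two tails under the same relation type, and "Shanghai" appears as the tail of two different heads, while the schema prevents any Place entity from being tried as a head.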
Keywords/Search Tags: entity relation extraction, pre-trained language model, relation schema, entity type