
Research on Joint Extraction of Resource Entities and Their Relations

Posted on: 2021-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z M Wei
Full Text: PDF
GTID: 2518306560953589
Subject: Computer Science and Technology
Abstract/Summary:
Named entity recognition and relation extraction for resources in a professional domain are important methods for extracting information from free text that describes resources. Based on the extracted entities and relations, a resource library and a resource knowledge graph can be constructed, which in turn supports downstream natural language processing tasks. A survey of research in China and abroad shows that neural network models are now the usual approach to named entity recognition and entity relation extraction, and that choosing an efficient text representation can effectively improve training results. At present, most studies model named entity recognition and entity relation extraction as two independent tasks. Studies on joint extraction are few, but they have practical significance and room for improvement. Because entity recognition and relation extraction process the same data at different stages, a pipeline approach suffers from several problems that affect the final output: repeated data preprocessing, reuse of models and of the same hand-crafted features during model building, and erroneous entities from the entity recognition stage being passed on to the relation extraction module. In view of these problems, the contributions of this thesis are as follows:

(1) To reduce excessive reliance on hand-crafted features and domain expertise, this thesis builds a sequence labeling model based on BiLSTM + CRF: the BiLSTM performs deep encoding and extracts text features, and the CRF outputs the tag sequence to complete entity recognition. A BiLSTM-based relation extraction model is built alongside it. Then, in constructing the joint extraction model, BERT is introduced to represent the text more effectively, and a new sequence labeling model, IFT-Joint, is used for the joint extraction of entities and relations.

(2) A joint extraction method based on fused labeling of entity information and relation information is proposed. The method converts the joint extraction task into a sequence labeling problem. The newly defined labeling strategy fully mines the entity relations in the text, alleviates the problem of overlapping relations in joint extraction, suppresses the error propagation between the two stages of the pipeline method, and improves overall recognition performance.

(3) A structure for defining chemical resource knowledge is specified. A preliminary chemical resource entity data set is constructed and an attribute dictionary is formed. From these, the extracted entities and relations are turned into a resource knowledge set that is easy to store and manage.

The proposed method is tested on the entity data set in the chemical field. Under the same hardware and software environment, it improves the accuracy, recall, and F1 value of joint extraction of resource entities and their relations: the joint extraction F1 value reaches 76.55%, the F1 value of resource named entity recognition reaches 92.11%, and the accuracy of joint extraction increases by 2.91%. Compared with other models, the proposed model stabilizes when the training set reaches 40%. The experimental results show that the joint extraction model combines the two sub-modules, reducing data processing time and the propagation of erroneous data, and that the proposed model has good stability.
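
The idea of turning joint extraction into a sequence labeling problem, with labels that fuse entity and relation information, can be illustrated with a minimal Python sketch. The abstract does not specify the label format used by IFT-Joint, so the tag layout `position-relation-role`, the relation name `BP` (boiling point), and the function names below are assumptions made for illustration only, not the thesis's actual scheme. With one tag per token, this simplified layout also cannot represent a token that takes part in several relations, which is the overlapping-relation case the thesis's strategy is said to alleviate.

```python
from typing import List, Tuple

Span = Tuple[str, str, str]    # (entity text, relation, role)
Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def decode_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Group tokens into entity spans from fused tags such as 'B-BP-1'.

    Hypothetical tag layout: position (B/I), relation name, role (1 = head,
    2 = tail), joined by '-'. 'O' marks tokens outside any entity.
    Relation names are assumed not to contain '-'.
    """
    spans: List[Span] = []
    current = None  # [list of words, relation, role] for the open span
    for token, tag in zip(tokens, tags):
        if tag == "O":
            if current is not None:
                spans.append((" ".join(current[0]), current[1], current[2]))
                current = None
            continue
        position, relation, role = tag.split("-", 2)
        continues_span = (
            position == "I"
            and current is not None
            and (current[1], current[2]) == (relation, role)
        )
        if continues_span:
            current[0].append(token)
        else:
            # A 'B' tag, or an 'I' tag that does not match the open span,
            # closes the open span (if any) and starts a new one.
            if current is not None:
                spans.append((" ".join(current[0]), current[1], current[2]))
            current = [[token], relation, role]
    if current is not None:
        spans.append((" ".join(current[0]), current[1], current[2]))
    return spans


def pair_triples(spans: List[Span]) -> List[Triple]:
    """Pair every head (role '1') with every tail (role '2') of the same relation."""
    triples: List[Triple] = []
    for head, relation, role in spans:
        if role != "1":
            continue
        for tail, tail_relation, tail_role in spans:
            if tail_role == "2" and tail_relation == relation:
                triples.append((head, relation, tail))
    return triples


tokens = ["Ethanol", "has", "a", "boiling", "point", "of", "78", "°C"]
tags = ["B-BP-1", "O", "O", "O", "O", "O", "B-BP-2", "I-BP-2"]
print(pair_triples(decode_spans(tokens, tags)))
# prints [('Ethanol', 'BP', '78 °C')]
```

Because a single tag sequence carries both the entity boundaries and the relation, one tagger (for example a BERT encoder with a tagging layer) can produce triples directly, with no separate relation classifier consuming possibly wrong entities from an earlier stage.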
Keywords/Search Tags:Entity recognition, Relation extraction, Joint extraction, BERT, Sequence labeling