
Research And Implementation Of Entity And Relationship Extraction Algorithm Based On Joint Setting

Posted on: 2022-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: G Q Zhang
GTID: 2518306332467804
Subject: Computer Science and Technology
Abstract/Summary:
A knowledge graph is a structured semantic knowledge base and a cornerstone of machine cognitive intelligence. It describes concepts and their relationships in the form of a graph whose basic unit is the "entity-relation-entity" triple. These triples are usually hidden in massive amounts of unstructured text. Named entity recognition and relation extraction models can extract structured data that meets practical requirements from such text, which greatly reduces manual effort. How to extract knowledge triples from text accurately and comprehensively is therefore a problem of great research value.

Named entity recognition and relation extraction are two different information extraction tasks. The former extracts entity boundaries and categories from unstructured text, while the latter identifies the semantic relationships between entity pairs. The two tasks are strongly correlated, so researchers have tried to combine them and exploit this correlation to obtain better efficiency and performance. This thesis focuses on algorithms for extracting entities and relations in the joint setting, and addresses problems exposed by currently popular techniques in natural language processing. The main research contents are as follows:

First, we surveyed and reproduced currently popular span-level techniques that offer excellent performance and wide coverage, and discussed the shortcomings of existing work. Although such models can account for the impact of nested entities on the relation extraction task and avoid the disadvantages of traditional sequence labeling models, they lack syntactic features. As a result, they may predict a relation between two entities because the combination of their entity types suggests one, even though the entities are not actually related in the sentence. In addition, some researchers have found that in the multi-head attention mechanism, some attention heads tend to focus on similar content, so the capacity of some heads is not fully exploited. This thesis analyzes these observations and discusses solutions to them.

Second, based on the above observations, we propose an algorithm built on pre-trained BERT that combines syntax-informed attention with a local context attention mechanism. With named entity recognition performed at the span level, the dependency tree of the sentence is pruned according to the positions of the entities. A subset of the heads in the multi-head attention model attends to the weights of the pruned tree while the whole sentence is still modeled, which fuses syntactic and semantic features and makes full use of the attention heads. The influence of different pruning strategies on the model is also discussed. In addition, a local attention mechanism is applied to each entity pair and the context between the two entities to learn contextual features in depth. The model was tested on the public datasets CoNLL04 and SciERC. Compared with a strong current baseline model, the F1 score improved by 2.4% and 3.3% respectively, showing better extraction performance.

Third, building on the work above and the characteristics of real application scenarios, a prototype information extraction system for movie news was developed. The system automatically collects film and television news from the public web on a regular, incremental basis, extracts information from the collected news corpus, and displays the resulting knowledge as a graph. It offers a preliminary solution for extracting knowledge triples from unstructured text on the Internet.
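The span-level candidates, the entity-dependent pruning of the dependency tree, and the local context between an entity pair can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the abstract does not state which pruning strategy the thesis uses, so the sketch keeps the two entity spans plus the shortest dependency path between their last tokens, and the function names, the `heads` encoding (index of each token's head, -1 for the root), and the example sentence are all hypothetical. It also assumes the parse is a single connected tree.

```python
from collections import deque


def enumerate_spans(tokens, max_width):
    """Return every (start, end) token span, end exclusive, up to max_width tokens."""
    return [
        (start, end)
        for start in range(len(tokens))
        for end in range(start + 1, min(start + max_width, len(tokens)) + 1)
    ]


def shortest_dependency_path(heads, source, target):
    """Breadth-first search over the dependency tree, treated as undirected.

    heads[i] is the index of token i's head, or -1 for the root.
    Assumes a single connected tree, so target is always reachable.
    """
    neighbours = {i: set() for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            neighbours[child].add(head)
            neighbours[head].add(child)
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in sorted(neighbours[node]):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    # Walk back from target to source, then reverse.
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


def pruned_tokens(heads, head_span, tail_span):
    """Assumed pruning rule: keep both spans plus the path joining their last tokens."""
    keep = set(range(*head_span)) | set(range(*tail_span))
    keep.update(shortest_dependency_path(heads, head_span[1] - 1, tail_span[1] - 1))
    return sorted(keep)


def local_context(head_span, tail_span):
    """Token indices strictly between the two entity spans."""
    left, right = sorted([head_span, tail_span])
    return list(range(left[1], right[0]))


# Hypothetical example sentence with a hand-written dependency parse.
tokens = ["John", "Smith", "works", "for", "Acme", "Corp", "in", "Boston"]
heads = [1, 2, -1, 5, 5, 2, 7, 2]
person, organisation = (0, 2), (4, 6)

print(len(enumerate_spans(tokens, 2)))             # number of candidate spans
print(pruned_tokens(heads, person, organisation))  # token indices kept after pruning
print(local_context(person, organisation))         # token indices between the pair
```

In a syntax-informed attention layer, the indices returned by `pruned_tokens` could serve as a mask for a subset of heads, while `local_context` selects the tokens that a local attention mechanism would attend to; how the thesis actually wires these into BERT is not specified in the abstract.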
Keywords/Search Tags: named entity recognition, relation extraction, joint model, dependency tree, multi-head attention