Research And Application Of Document-Level Entity Relation Extraction Algorithm

Posted on: 2024-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
GTID: 2568307115477434
Subject: Electronic information
Abstract/Summary:
Document-level entity relation extraction aims to identify and extract valid entity relation pairs from texts composed of multiple sentences or paragraphs. It is an important task in natural language processing and is widely applicable to information extraction, knowledge graph construction, and intelligent question answering. With the continuous development of natural language processing algorithms, research on document-level entity relation extraction has made significant progress in recent years. Several problems nevertheless remain: the industry domains covered by existing datasets urgently need to be expanded, the accuracy of entity relation extraction algorithms is still low, and existing entity relation extraction systems lack stability and reliability. This thesis aims to advance research and development in document-level entity relation extraction, to extract structured entity relation pairs from unstructured text, and to provide data support for other application scenarios. Specifically, it studies datasets, algorithms, and systems for document-level entity relation extraction, focusing on how to extract entity relation pairs from multilingual document-level texts with long sentences and how to put a document-level entity relation extraction system into practical use. The main contributions are as follows.

First, this thesis builds a document-level entity relation extraction dataset for the tourism industry. Long texts covering scenic spot introductions, scenic spot reviews, travel guides, surrounding attractions, and relevant policies were crawled from well-known travel websites such as Ctrip, Meituan, and Fliggy. After technical screening and manual annotation, 1,132 articles containing 4,940 valid entity relation pairs were selected. The resulting dataset enlarges the pool of data available for document-level entity relation extraction. It can be used directly for knowledge graph construction, and it can also be used to fine-tune entity relation extraction models for better performance in the tourism domain.

Second, this thesis proposes the Relation Extraction Model for Mention and Entity Joint Reasoning (MAERE). MAERE uses deep learning components such as CNNs, LSTMs, and self-attention to make predictions separately at the mention level and at the entity level, and it performs joint reasoning at the mention level over two scopes, global and local. To further improve prediction quality, the pre-trained BERT model is used as the encoder for the document text. The model was trained, validated, and tested on DocRED, HacRED, and the tourism dataset. The results show that MAERE outperforms the models it was compared against, so it can serve as the prediction algorithm that supplies decisions to a document-level entity relation prediction system.

Finally, this thesis develops the MAERE entity relation extraction system on a microservices architecture with automatic scale-out and scale-in configuration, which addresses the slow speed and long processing times that existing systems suffer under high-concurrency natural language processing workloads. The extraction process is also made visible: the results of each stage are fed back to the user interface, so users can follow the relation extraction process in detail. The development and deployment of MAERE yields a complete entity relation prediction system that meets the practical needs of both research and production environments and offers a solution for system research and application in this field.
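The abstract does not specify how MAERE combines its mention-level and entity-level predictions. The Python sketch below shows one common way such a combination can be done in document-level relation extraction, offered only as an illustrative assumption and not as the thesis's actual method. Mention-pair logits are pooled into entity-pair logits with logsumexp (a smooth maximum) and then blended with entity-level logits using a hypothetical weight `alpha`, keeping relations whose blended logit exceeds a hypothetical `threshold`. The function names, the input layout, and the integer relation ids are all hypothetical, and the global/local reasoning scopes are not modeled.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (head id, tail id)


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))), used here as a smooth max."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def pool_mentions(
    mention_scores: Dict[Pair, List[float]],
    mention_to_entity: Dict[str, str],
    num_relations: int,
) -> Dict[Pair, List[float]]:
    """Group mention-pair logits by entity pair; pool each relation with logsumexp."""
    grouped: Dict[Pair, List[List[float]]] = {}
    for (head_m, tail_m), logits in mention_scores.items():
        key = (mention_to_entity[head_m], mention_to_entity[tail_m])
        grouped.setdefault(key, []).append(logits)
    return {
        key: [logsumexp([row[r] for row in rows]) for r in range(num_relations)]
        for key, rows in grouped.items()
    }


def joint_predict(
    mention_scores: Dict[Pair, List[float]],
    entity_scores: Dict[Pair, List[float]],
    mention_to_entity: Dict[str, str],
    num_relations: int,
    alpha: float = 0.5,      # hypothetical weight on the pooled mention-level view
    threshold: float = 0.0,  # hypothetical decision threshold on blended logits
) -> List[Tuple[str, str, int]]:
    """Blend both views and return (head, tail, relation id) triples above threshold."""
    pooled = pool_mentions(mention_scores, mention_to_entity, num_relations)
    triples = []
    # In this sketch, only entity pairs scored by both views are considered.
    for key in sorted(set(pooled) & set(entity_scores)):
        for r in range(num_relations):
            score = alpha * pooled[key][r] + (1 - alpha) * entity_scores[key][r]
            if score > threshold:
                triples.append((key[0], key[1], r))
    return triples


if __name__ == "__main__":
    # Toy document: mentions m1 and m2 both refer to entity E1; m3 refers to E2.
    m2e = {"m1": "E1", "m2": "E1", "m3": "E2"}
    mention_logits = {("m1", "m3"): [2.0, -3.0], ("m2", "m3"): [1.0, -1.0]}
    entity_logits = {("E1", "E2"): [0.5, -0.5]}
    print(joint_predict(mention_logits, entity_logits, m2e, num_relations=2))
    # prints [('E1', 'E2', 0)]
```

Logsumexp is used rather than a hard maximum so that every mention of an entity contributes to the entity-pair score while the strongest mention pair still dominates; a real model would learn these logits from the encoder rather than receive them as fixed numbers.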
Keywords/Search Tags: Entity relationship extraction, Pre-training model, Natural language processing, Deep learning