The chemical industry is an important foundational industry in China and a driver of rapid economic development. However, because hazardous chemicals have unstable physicochemical properties, their production, transportation, and storage carry significant risks. Accidents involving hazardous chemicals pose a major threat to people's lives and health and present significant challenges to social harmony and stability. To support safe production, the emergency management department has proposed "intelligent accident information feedback" and the "establishment of a knowledge graph for relevant emergency resources" for hazardous chemical safety production. As a result, named entity recognition (NER) in the field of hazardous chemical accidents has gradually become a research hotspot. This thesis focuses on hazardous chemical accident data and applies natural language processing techniques to explore the construction of a causal graph for hazardous chemical accidents, providing robust data support and scientific processing methods for the safe production, transportation, storage, and emergency handling of hazardous chemicals. The main work of this thesis includes the following aspects:

(1) Integration of rule templates and Global Pointer for hazardous chemical accident entity recognition. Based on the characteristics of hazardous chemical accident entities, the accident data is divided into two categories. The first category has obvious structural features and relatively fixed formats (such as dates and times). The second category lacks obvious structural features, takes diverse forms of expression, and contains overlapping and nested entities (such as organizations and accident causes). This thesis proposes a method that integrates rule templates and Global Pointer for hazardous chemical accident entity recognition. For the first category of entities, it abstracts their structural features and designs rule-matching templates for recognition. For the second category of entities, it uses a model based on Global Pointer for recognition. Experimental results on real-world datasets show that the proposed model significantly outperforms current popular entity recognition methods in the field of hazardous chemical accidents.

(2) Integration of sample uncertainty and diversity for active-learning-based recognition of hazardous chemical accident entities. To address the lack of annotated corpora in the field of hazardous chemicals, this thesis proposes an active learning method for entity recognition that integrates sample uncertainty and diversity. First, Latent Dirichlet Allocation is used for topic clustering to select representative initial samples. Then, each unlabeled sample is scored for uncertainty and diversity using its information content, its diversity, and the initial topic-clustering information, and the samples with the highest combined scores are annotated and added to model training. Finally, the active learning process is terminated according to the model's F1 score. Experimental results show that the proposed method effectively reduces the annotation workload and improves the overall recognition F1 score.

(3) Design and implementation of a hazardous chemical accident information entity recognition system. Based on the research findings above, this thesis designs and develops a hazardous chemical accident information entity recognition system. The system visually displays the key steps of entity recognition for hazardous chemical accident information and integrates a large number of NER algorithms for comparative validation and analysis of the research findings. In addition, a knowledge graph for hazardous chemical accidents is constructed from the recognized entities, and a Neo4j knowledge model is designed for storing and querying the graph.
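As an illustration of the rule-template idea used for the first category of entities (those with fixed formats, such as dates and times), the following is a minimal sketch rather than the thesis's actual implementation. The pattern set `RULE_TEMPLATES`, the function `match_rule_entities`, the labels `DATE` and `TIME`, and the sample sentence are all hypothetical; the abstract does not specify which templates are used, so the regular expressions below simply assume Chinese-style and ISO-style date and time formats.

```python
import re

# Hypothetical rule templates (assumption): the thesis does not list its
# actual patterns. Each label maps to one compiled regular expression.
RULE_TEMPLATES = {
    # e.g. "2019年3月21日" or "2019-03-21"
    "DATE": re.compile(r"\d{4}年\d{1,2}月\d{1,2}日|\d{4}-\d{1,2}-\d{1,2}"),
    # e.g. "14时48分" or "14:48"
    "TIME": re.compile(r"\d{1,2}时\d{1,2}分|\d{1,2}:\d{2}"),
}


def match_rule_entities(text):
    """Return (start, end, label, surface) tuples for every template match,
    sorted by position in the text."""
    entities = []
    for label, pattern in RULE_TEMPLATES.items():
        for m in pattern.finditer(text):
            entities.append((m.start(), m.end(), label, m.group()))
    return sorted(entities)


# Made-up example sentence, not taken from the thesis dataset.
sample = "2019年3月21日14时48分, an explosion occurred at a chemical plant."
for entity in match_rule_entities(sample):
    print(entity)
```

In a pipeline of the kind described above, spans found this way would presumably be merged with the spans predicted by the Global Pointer model for the second category of entities; how the thesis resolves conflicts between the two sources is not stated in the abstract.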