Research On Key Technologies Of Named Entity Recognition For Rail Transit Code

Posted on:2022-01-12

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Fang

Full Text:PDF

GTID:2492306512976499

Subject:Computer technology

Abstract/Summary:

Named entity recognition (NER) is a basic building block for constructing domain knowledge graphs, but most research targets the open domain. Progress in specific domains has been relatively slow, and existing methods do not effectively address the problems typical of vertical domains: scarce labeled data, no complete domain dictionary, and no entity classification system. This thesis constructs an NER data set from the Code for Design of Metro Engineering (GB53157-2013). The specific research contents are as follows.

(1) Domain-adaptive pre-training under low-resource conditions is realized with the RoBERTa_800K-CRF model. First, BiLSTM-CRF is used as the baseline model, and models with different parameter settings are compared on data sets at two levels of classification granularity to find the best model. Then, an 800K text corpus related to the construction field is collected and used to further train RoBERTa-wwm, which adapts the model to the construction domain. The final F1 score for NER reaches 61.71%, which is 15.34% higher than the baseline model.

(2) The thesis proposes MT-CAT-RailBERT, a multi-task learning algorithm based on topic classification that uses catalog data to augment the topic classification task and thereby improve training efficiency. Using the directory-level information in the specification, the chapter name and section name of each specification text serve as its topic category, which yields a classification data set. A pre-trained language model is then trained on this data set with a variety of classification algorithms, replacing domain-adaptive pre-training. Experiments show that the F1 score for NER improves to 62.52% and that training time falls to 1/40 of that of the domain-adaptive pre-training method.

(3) A unified annotation platform for specification text is developed on the Spring Boot framework, with graph visualization implemented through a node-level calculation method. The platform includes specification query, specification annotation and audit, term dictionary management, and other core functional modules. It supports the complete construction of a domain knowledge graph and the automatic expansion of the NER data set.

In summary, key technologies such as pre-trained language models, domain-adaptive training, topic classification, and multi-task learning are used to complete the NER task for a specific domain. In addition, to address the annotation inconsistency caused by complex specification semantics, the thesis develops a unified annotation platform for specification text that supports the complete construction and visualization of a domain knowledge graph.
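The data set construction described in (2) can be illustrated with a minimal sketch. It assumes each clause of the specification is already available together with the chapter and section it belongs to. The `Clause` type, the `build_topic_dataset` function, the `chapter/section` label format, and the sample clauses are all hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """One clause of the specification plus its catalog position (hypothetical type)."""
    chapter: str  # chapter name from the table of contents
    section: str  # section name under that chapter
    text: str     # clause text


def build_topic_dataset(clauses):
    """Pair each clause text with a topic label derived from its chapter and section."""
    # Assumption: a topic is identified by the "chapter/section" pair.
    label_names = sorted({f"{c.chapter}/{c.section}" for c in clauses})
    label_ids = {name: i for i, name in enumerate(label_names)}
    examples = [(c.text, label_ids[f"{c.chapter}/{c.section}"]) for c in clauses]
    return examples, label_ids


# Placeholder clauses for illustration only; not quoted from any real code.
sample_clauses = [
    Clause("Track", "General requirements", "placeholder clause text A"),
    Clause("Track", "Track structure", "placeholder clause text B"),
    Clause("Ventilation", "General requirements", "placeholder clause text C"),
]

examples, label_ids = build_topic_dataset(sample_clauses)
print(len(examples), "examples,", len(label_ids), "topic labels")
```

The resulting (text, label) pairs could then serve as the auxiliary classification task that is trained alongside NER in a multi-task setup. How the thesis actually encodes the chapter and section names is not specified in the abstract.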

Keywords/Search Tags:

Named entity recognition, Design code, Deep learning, PTMs, Multi-task learning

Related items

1	Research On Named Entity Recognition Method Of Rail Transit Code
2	Process Named Entity Recognize Method Based On Deep Learning
3	Research On New Energy Vechicles Named Entity Recognition Based On Multi-feature
4	Research On Named Entity Recognition In Clock Domain Based On Deep Learning
5	Research On Named Entity Recognition Method In Civil Aviation Business
6	Named Entity Recognition Of The Code For Geology Investigation Of Railway Engineering
7	Research On Name Entity Recognition Model Of Power Equipment Defects Text Based On Deep Learning
8	Research On Text Mining Of On-Board Signal Equipment Maintenance Log Based On Deep Learning
9	Weak Supervised Remote Sensing Image Application Case Named Entity Recognition For Academic Texts
10	Research On Key Technologies Of Bridge Inspection Text Information Extraction Based On Deep Neural Networks