
Research On Key Technologies Of Named Entity Recognition For Rail Transit Code

Posted on: 2022-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Fang
GTID: 2492306512976499
Subject: Computer technology
Abstract/Summary:
Named Entity Recognition (NER) is a basic building block for constructing domain knowledge graphs, but most NER research targets the open domain. Progress in specific domains has been relatively slow, and existing methods cannot effectively handle the problems of vertical domains: low resources, no complete domain dictionary, and no entity classification system. This thesis constructs an NER data set based on the Code for Design of Metro Engineering, GB53157-2013. The specific research contents are as follows.

(1) Domain-adaptive pre-training under low-resource conditions is realized with RoBERTa800K-CRF. First, BiLSTM-CRF is used as the baseline model, and its results are compared with those of the best model under different parameter settings on data sets of two classification granularities. Then, an 800K text corpus related to the construction field is collected and used to continue training from RoBERTa-wwm, realizing adaptive pre-training for the construction field. The final NER F1 score reaches 61.71%, which is 15.34% higher than the baseline model.

(2) MT-CAT-RailBERT, a multi-task learning algorithm based on topic classification, is proposed. It uses catalog data to augment the topic classification task and thereby improve training efficiency. Following the directory hierarchy of the specification, the chapter name and section name of each specification text are taken as its category information for topic classification, which yields a classification data set. The pre-trained language model is then trained with a variety of classification algorithms, replacing domain-adaptive pre-training. The experimental results show that the NER F1 score improves to 62.52% and that training time is shortened to 1/40 of that of the domain-adaptive pre-training method.

(3) A unified annotation platform for standard text is developed on the Spring Boot framework, with graph visualization implemented through a node-level calculation method. The platform includes core functional modules such as standard query, standard annotation and audit, and term dictionary management. It supports the complete construction of a domain knowledge graph and the automatic expansion of the NER data set.

In summary, key technologies including pre-trained language models, domain-adaptive training, topic classification, and multi-task learning are used to complete the NER task for a specific domain. In addition, to address the annotation inconsistency caused by complex specification semantics, the unified annotation platform for specification text supports both the complete construction of the domain knowledge graph and its visualization.
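The construction of the topic classification data set in item (2) can be illustrated with a minimal sketch. It assumes each specification clause is already available together with its chapter and section names. The `Clause` type, the `build_topic_dataset` function, the `level` parameter, and the sample strings are hypothetical and are not taken from the thesis or from the specification itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """One specification clause with its catalog position (hypothetical type)."""
    chapter: str
    section: str
    text: str


def build_topic_dataset(clauses, level="section"):
    """Turn catalog headings into topic labels for a classification task.

    level="chapter" uses only the chapter name as the label;
    level="section" uses the chapter and section names together.
    Returns (examples, label_to_id), where examples is a list of
    (text, label_id) pairs and label ids follow first-seen order.
    """
    if level not in ("chapter", "section"):
        raise ValueError("level must be 'chapter' or 'section'")

    label_to_id = {}
    examples = []
    for clause in clauses:
        if level == "chapter":
            label = clause.chapter
        else:
            label = f"{clause.chapter} / {clause.section}"
        # Assign the next free integer id the first time a label appears.
        label_id = label_to_id.setdefault(label, len(label_to_id))
        examples.append((clause.text, label_id))
    return examples, label_to_id


if __name__ == "__main__":
    # Placeholder clauses for illustration only, not real specification text.
    sample = [
        Clause("Chapter A", "Section 1", "placeholder clause text one"),
        Clause("Chapter A", "Section 2", "placeholder clause text two"),
        Clause("Chapter B", "Section 1", "placeholder clause text three"),
    ]
    examples, label_to_id = build_topic_dataset(sample, level="chapter")
    for text, label_id in examples:
        print(label_id, text)
```

The resulting `(text, label_id)` pairs could then feed a classification head trained alongside the NER head in a multi-task setup. How the thesis actually combines the two tasks is not specified in this abstract.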
Keywords/Search Tags:Named entity recognition, Design code, Deep learning, PTMs, Multi-task learning