
Research On Method And Application Of Named And Terminology Entity Recognition

Posted on: 2022-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: H T Fan
GTID: 2518306353984579
Subject: Computer Science and Technology
Abstract/Summary:
Named entity recognition (NER) is the task of identifying words with special meaning in text. Its accuracy has an important impact on downstream tasks such as information retrieval, syntactic analysis, semantic analysis, and relation discovery. Research on NER methods has mainly focused on open-domain text. This thesis studies methods for recognizing named entities and terminology entities in the policy and regulation texts of the social insurance and housing fund domain. As society develops and the system becomes more complete, the volume of policy and regulation data in this domain, which closely affects people's lives, grows day by day. Sorting out the relevant entities by hand takes time and effort, and traditional methods for recognizing domain entities rely heavily on manually constructed features, which hinders the development of informatization in this domain. Therefore, based on the structural characteristics of these policy and regulation texts, this thesis proposes an entity recognition model based on deep neural networks. In addition, a pre-trained language model uses unlabeled data to enrich the semantic features of the characters at the model's input layer, further improving recognition performance. The main contributions of this thesis are as follows:

(1) Based on the structural characteristics of social insurance and housing fund texts, a basis for classifying entity types in the domain is proposed, together with a domain entity discovery algorithm based on combinations of part-of-speech rules.

(2) Texts in the social insurance and housing fund domain generally consist of long sentences. When a bidirectional long short-term memory network (BiLSTM) alone encodes the context of each character, the resulting semantic vector cannot hold the entire context, so part of the semantic information is lost. This thesis proposes a domain entity recognition model that adds an attention mechanism to the basic BiLSTM+CRF model. Through the attention mechanism, the context semantic vectors of all characters in the sentence are dynamically combined according to their importance for labeling the current character, yielding a more comprehensive semantic vector. The experimental results show that, compared with the BiLSTM+CRF model, adding the attention mechanism significantly improves precision, recall, and F1 score.

(3) Using deep learning for domain entity recognition reduces the reliance on manual feature construction, but it still requires manually annotated data, and often only a relatively small part of the data can be annotated. To make the most of the small amount of existing annotated data, this thesis follows the pre-trained language model approach and uses unlabeled social insurance and housing fund policy and regulation text to pre-train a character-level bidirectional language model. The character vectors with contextual semantics obtained from the language model are used as input-layer features of the entity recognition model to further improve its recognition performance.
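The attention step described in contribution (2) can be illustrated with a minimal sketch. The abstract does not state which scoring function the thesis uses, so scaled dot-product scoring is an assumption here, as are the function names `softmax` and `attend` and the toy vectors standing in for BiLSTM outputs. The sketch shows only how each character's context vector is formed as an importance-weighted combination of all character vectors in the sentence; the BiLSTM encoder and the CRF layer are not implemented.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states):
    """Return (contexts, weights) for a sentence.

    hidden_states: one vector per character (stand-ins for BiLSTM outputs).
    contexts[i]:   weighted sum of all hidden states, weighted by their
                   importance for character i.
    Scoring is scaled dot-product, which is an assumption of this sketch.
    """
    dim = len(hidden_states[0])
    scale = math.sqrt(dim)
    contexts, all_weights = [], []
    for query in hidden_states:
        # Score every character in the sentence against the current one.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in hidden_states
        ]
        weights = softmax(scores)
        # Combine all hidden states according to those weights.
        context = [
            sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)
        ]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights


# Toy "BiLSTM outputs" for a three-character sentence (hypothetical values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contexts, weights = attend(states)
```

In a full model, each context vector would typically be combined with the corresponding BiLSTM output before being passed to the CRF layer for label decoding; how the thesis combines them is not specified in the abstract.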
Keywords/Search Tags:Named entity recognition, Social insurance, Attention, Pre-trained language model