| Named entity recognition is a fundamental task in natural language processing, and its accuracy determines the performance of downstream tasks. Current research on named entity recognition algorithms is mostly limited to the news domain, where the main entity types are person names, location names, and the like. Although good results have been achieved there, practical applications more often call for domain-specific named entities. This paper explores named entity recognition in the mathematics domain, that is, recognizing mathematical proper nouns in mathematical scientific papers, which is the groundwork for mining the knowledge we need from a massive body of literature. The current mainstream neural network model for named entity recognition is Bi-LSTM-CRF. We find that Bi-LSTM-CRF achieves an F1-score of 84.74% on our task, whereas the F1-score of named entity recognition in the news domain is 91.35%. We therefore modify the network structure of the Bi-LSTM-CRF model so that it recognizes mathematical named entities better. First, because traditional word embeddings cannot reflect the polysemy of words, we introduce the pre-trained language model SCI-BERT into the model to produce word embeddings that capture word sense in the context of mathematical scientific papers. Second, to recognize named entity boundaries better, we use a mathematical domain dictionary to construct lexical boundary features, so that the word embeddings contain position information. Finally, we replace the single-layer neural network with a stacked neural network, so that the deeper network can better fit the features of mathematical named entities. Because annotated corpora for our task are scarce, we build a corpus before modifying the model: we use a mathematical domain dictionary that we built ourselves to annotate a mathematical corpus with the forward maximum matching algorithm, and this corpus is the data set used in this paper. We propose the neural network model SCI-BERT-Bi-LSTM-CRF, and the experimental results show that it outperforms the Bi-LSTM-CRF model, raising the F1-score on our task from 84.74% to 90.02%. In addition, we introduce applications of our model to the writing and classification of mathematical scientific papers. |
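
The dictionary-based annotation step mentioned in the abstract can be illustrated with a minimal sketch of forward maximum matching. It is not the authors' implementation: the function name `forward_max_match`, the BIO tagging scheme, the single `MATH` label, and whitespace tokenization are all assumptions made for illustration, since the abstract does not specify them.

```python
def forward_max_match(tokens, dictionary, max_len=None):
    """Tag tokens with BIO labels by greedy longest-first dictionary matching.

    Hypothetical helper: the BIO scheme and the MATH label are assumptions,
    not details taken from the paper.
    """
    # Store each dictionary term as a tuple of lowercased words.
    entries = {tuple(term.lower().split()) for term in dictionary}
    if max_len is None:
        max_len = max((len(e) for e in entries), default=1)

    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest window starting at i first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in entries:
                matched = n
                break
        if matched:
            tags[i] = "B-MATH"
            for j in range(i + 1, i + matched):
                tags[j] = "I-MATH"
            i += matched  # Skip past the matched term.
        else:
            i += 1  # No term starts here; move one token forward.
    return tags


if __name__ == "__main__":
    # Toy dictionary and sentence, invented for this example.
    terms = {"Banach space", "vector space", "normed vector space"}
    sentence = "the Banach space is a complete normed vector space".split()
    for token, tag in zip(sentence, forward_max_match(sentence, terms)):
        print(f"{token}\t{tag}")
```

Because the longest window is tried first, "normed vector space" is tagged as one entity rather than as "vector space" preceded by an untagged word, which is the behavior that distinguishes maximum matching from simple term lookup.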