Research On Named Entity Recognition Algorithm In Mathematics Field

Posted on:2022-01-26

Degree:Master

Type:Thesis

Country:China

Candidate:X T Zhang

Full Text:PDF

GTID:2518306329989619

Subject:Computational Mathematics

Abstract/Summary:

Named entity recognition (NER) is a basic task in natural language processing, and its accuracy determines the performance of downstream tasks. Current research on NER algorithms is mostly limited to the news domain, where the main entities recognized are person names, location names, and the like. Although good results have been achieved there, practical applications more often call for domain-specific named entities. This thesis explores NER in the field of mathematics, that is, recognizing mathematical proper nouns in mathematical scientific papers, which is the foundation for mining the knowledge we need from the massive literature.

The mainstream neural network model for NER is currently Bi-LSTM-CRF. Applied to our task, Bi-LSTM-CRF reaches an F1-score of 84.74%, compared with 91.35% for NER in the news domain. This thesis therefore modifies the network structure of the Bi-LSTM-CRF model to recognize mathematical named entities better. First, because traditional word embeddings cannot reflect the polysemy of words, the pre-trained language model SCI-BERT is introduced into the model to produce word embeddings suited to mathematical scientific papers. Second, to recognize named entity boundaries better, we use a mathematical domain dictionary to construct lexical boundary features, so that the word embeddings carry position information. Finally, we replace the single-layer neural network with a stacked neural network, so that the deeper network can fit the features of mathematical named entities better.

Because annotated corpora for this task are scarce, we build a corpus before modifying the model. We annotate a mathematical corpus with a self-built mathematical domain dictionary, using the forward maximum matching algorithm, and this corpus is the data set used in the thesis. We propose the neural network model SCI-BERT-Bi-LSTM-CRF, and the experimental results show that it outperforms Bi-LSTM-CRF, raising the F1-score on our task from 84.74% to 90.02%. We also introduce applications of our model to the writing and classification of mathematical scientific papers.
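
The dictionary-based maximum matching annotation described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's actual implementation: it assumes the text is already split into word tokens, that dictionary entries are whitespace-separated terms, and that entities are written out in the common BIO scheme. The function name `forward_max_match_bio`, the label `MATH`, and the toy dictionary are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def forward_max_match_bio(
    tokens: List[str],
    dictionary: Iterable[str],
    label: str = "MATH",  # hypothetical entity label, not from the thesis
) -> List[str]:
    """Tag tokens with BIO labels using forward maximum matching."""
    # Store entries as lowercase token tuples for case-insensitive lookup.
    entries: Set[Tuple[str, ...]] = {
        tuple(entry.lower().split()) for entry in dictionary if entry.strip()
    }
    max_len = max((len(e) for e in entries), default=0)
    lowered = [t.lower() for t in tokens]

    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Scan left to right; at each position try the longest span first.
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + span]) in entries:
                matched = span
                break
        if matched:
            tags[i] = f"B-{label}"
            for j in range(i + 1, i + matched):
                tags[j] = f"I-{label}"
            i += matched  # skip past the matched entity
        else:
            i += 1
    return tags


# Toy dictionary and sentence, invented for illustration only.
toy_dictionary = ["Banach space", "Hilbert space", "space", "Cauchy sequence"]
sentence = "Every Cauchy sequence in a Banach space converges".split()
tags = forward_max_match_bio(sentence, toy_dictionary)
for token, tag in zip(sentence, tags):
    print(f"{token}\t{tag}")
```

Preferring the longest match means "Banach space" is tagged as one entity rather than matching the shorter dictionary entry "space" on its own. The resulting token/tag pairs are the kind of labeled data a sequence tagger such as Bi-LSTM-CRF is trained on.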

Keywords/Search Tags:

named entity recognition, mathematical named entity, CRF, BERT, SCI-BERT

Related items

1	Research On Chinese Named Entity Recognition Based On BERT
2	Research On Bert-based Named Entity Recognition
3	Named Entity Recognition Algorithm Based On BERT And Semantic Relevance
4	Research On Chinese Entity Recognition Based On BERT
5	Research On Named Entity Recognition Algorithm And Its Implement In Specific Fields
6	Research On Chinese Named Entity Recognition Based On Deep Neural Network
7	Chinese Named Entity Recognition Based On Deep Learning
8	Named Entity Recognition Of Middle School Mathematics Knowledge Based On Deep Learning
9	Research On Chinese Named Entity Recognition Model Based On Deep Learning
10	Research On Chinese Named Entity Recognition Based On BERT-BLSTM-CRF Model