
Research On Animal Science Domain Named Entity Recognition Based On BERT Pre-training Model

Posted on: 2023-12-15
Degree: Master
Type: Thesis
Country: China
Candidate: F H Yang
Full Text: PDF
GTID: 2543306803462724
Subject: Computer Science and Technology
Abstract/Summary:
With the promotion of "new agricultural science" construction and the development of agricultural information technology, the animal science profession has developed rapidly, and many animal science workers now ask questions and acquire knowledge through the Internet. Named entity recognition is a core fundamental technology in natural language processing: it identifies entities in various kinds of unstructured question-and-answer data, extracts useful information, and thereby supports applications such as question-answering systems and knowledge graphs for workers in the animal science field. Although named entity recognition has been applied in many Chinese-language domains, traditional word embedding techniques produce a single static vector per word and therefore cannot capture the fact that many Chinese characters carry multiple meanings. In addition, progress on named entity recognition in animal science has been slow because the field is highly specialized and lacks the annotated data that entity recognition requires. In this paper, we create a corpus for the animal science domain and construct a new entity recognition model to apply to it. The main research contents are as follows.

(1) The base text of the corpus consists of Chinese literature related to the animal science domain collected from the Internet. After pre-processing and cleaning the base text, the corpus is annotated with the "BIO" (B-begin, I-inside, O-outside) scheme using a corpus annotation tool. In this way we build a corpus for the animal science domain.

(2) Building on the BERT pre-training model, the commonly used LSTM-CRF named entity recognition model is improved by introducing a bidirectional long short-term memory network, yielding a BERT-BiLSTM-CRF model. The model first uses BERT to obtain word vector representations that carry contextual semantic information, effectively addressing the problem of words with multiple meanings. The word vectors are then fed into the bidirectional LSTM layer for context encoding to improve recognition accuracy. Finally, the conditional random field layer produces the optimal tag sequence.

(3) The model is evaluated on the created animal science corpus and compared with the RNN-CRF, LSTM-CRF, BiLSTM-CRF, and BERT-CRF models. The results show that the proposed model outperforms the other models in precision, recall, and F1-score of entity recognition, demonstrating the effectiveness of the model.
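To make the BIO annotation scheme concrete, the following minimal sketch shows how character-level BIO tags (as is common for Chinese NER) are decoded back into entity spans. The sentence, the tags, and the ANIMAL/DISEASE entity types are hypothetical illustrations, not drawn from the thesis corpus.

```python
# Toy illustration of the "BIO" (B-begin, I-inside, O-outside) scheme.
# The example sentence and entity types below are hypothetical.

def extract_entities(tokens, tags):
    """Collect (entity_text, entity_type) spans from a BIO-tagged sequence."""
    entities, current, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):            # a new entity begins here
            if current:
                entities.append(("".join(current), etype))
            current, etype = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)             # continue the current entity
        else:                               # "O": outside any entity
            if current:
                entities.append(("".join(current), etype))
            current, etype = [], None
    if current:
        entities.append(("".join(current), etype))
    return entities

# Character-level tagging of a toy sentence ("Dairy cows are prone to mastitis").
tokens = list("奶牛易患乳腺炎")
tags = ["B-ANIMAL", "I-ANIMAL", "O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE"]
print(extract_entities(tokens, tags))  # [('奶牛', 'ANIMAL'), ('乳腺炎', 'DISEASE')]
```

In an annotated corpus each character is stored with exactly one such tag, so entity boundaries and types can be recovered unambiguously from the tag sequence alone.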
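The role of the CRF layer in item (2) is to score whole tag sequences rather than individual tokens, so that the decoded path respects tag-transition constraints (e.g. an I- tag cannot follow O). A minimal Viterbi-decoding sketch illustrates this; the emission and transition scores below are toy stand-ins for values a trained BERT-BiLSTM-CRF model would produce, not the thesis's actual parameters.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag path under a linear-chain CRF.

    emissions:   (seq_len, n_tags) per-token scores (from the BiLSTM layer)
    transitions: (n_tags, n_tags)  learned tag-to-tag scores
    """
    seq_len, n_tags = emissions.shape
    score = emissions[0].copy()                  # best score ending in each tag
    back = np.zeros((seq_len, n_tags), dtype=int)
    for t in range(1, seq_len):
        # candidate[i, j] = score ending in tag i, then transitioning to tag j
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]                 # follow back-pointers
    for t in range(seq_len - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Three tags: 0 = O, 1 = B, 2 = I; the transition O -> I is heavily penalised.
trans = np.array([[0.0, 0.0, -10.0],
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])
emis = np.array([[1.0, 0.5, 0.0],    # token 1: per-tag emission scores
                 [0.0, 0.4, 0.5]])   # token 2: greedy argmax would pick I
print(viterbi_decode(emis, trans))   # [0, 1] -- O then B, avoiding illegal O->I
```

Per-token greedy decoding would output the illegal sequence O, I here; the CRF's transition scores steer the global optimum to a valid path, which is why the CRF layer yields the "optimal recognition effect" described above.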
Keywords/Search Tags:Named Entity Recognition, Animal Science Domain, Bi-directional LSTM, BERT, Conditional Random Field