Ancient Chinese is the key to reading ancient texts and engaging in dialogue with ancient civilization, and it is of great value. How to use computers to automatically annotate knowledge points in Ancient Chinese, and so support the learning of Ancient Chinese, is the focus of this research. Based on an analysis of the characteristics of Ancient Chinese corpora, this paper studies Named Entity Recognition technology and addresses several problems in the Ancient Chinese Named Entity Recognition task. The main contents are as follows:

1. To address the small scale and poor quality of existing open-source Ancient Chinese datasets, this paper first checks and improves the open-source dataset C-CLUE. Second, it constructs a large-scale Ancient Chinese dataset, History-NER, whose original corpus comes from the ancient literature reading website Zhuzhige. After processing, the corpus is divided into training, validation, and test sets according to a set proportion, and five representative entity types are labeled for the subsequent Named Entity Recognition experiments.

2. Traditional Named Entity Recognition methods cannot fully learn the complex sentence structure of Ancient Chinese and tend to lose information when capturing long-sequence features. To address this, a method combining the Siku BERT model with a Multi-Head Attention mechanism is proposed. First, the pretrained Siku BERT model encodes the input text. Second, the resulting vectors, which carry rich textual information, are fed into a bidirectional long short-term memory (BiLSTM) network for feature extraction. The Multi-Head Attention mechanism then assigns different weights to the extracted features to reduce information loss. Finally, the predicted tag sequence is obtained by conditional random field (CRF) decoding. Experiments show that the F1 score of the proposed method improves significantly over common models, which verifies its effectiveness.

3. Character-level Named Entity Recognition models do not make full use of lexical information, and their complex structure may cause overfitting when identifying small-scale entities. To address this, a Named Entity Recognition method that incorporates word information is proposed. First, an Ancient Chinese dictionary is constructed with Word2vec. Second, the vectors output by the Siku BERT model are combined with the potential words matched from the dictionary and fed into the Flat-Lattice Transformer model for encoding. Finally, the optimal result is obtained through conditional random field decoding. Experiments show that this method outperforms the other baseline models, which verifies that it effectively improves recognition performance.

In summary, the Named Entity Recognition methods proposed in this paper for Ancient Chinese achieve good results on both datasets, help to fill a gap in natural language processing research on Ancient Chinese, and are expected to have some impact on the inheritance and development of Ancient Chinese classics.
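
The dictionary-matching step in the third contribution can be illustrated with a minimal sketch. It assumes a small hand-written lexicon in place of the Word2vec-derived dictionary, and the function name `match_lexicon`, the `max_len` parameter, and the sample sentence are hypothetical choices for illustration, not the paper's implementation. Each character and each matched word is recorded with a head and tail position, which is the flat-lattice form of input that a Flat-Lattice Transformer consumes.

```python
from typing import List, Set, Tuple


def match_lexicon(
    chars: str, lexicon: Set[str], max_len: int = 4
) -> List[Tuple[str, int, int]]:
    """Build flat-lattice spans as (token, head, tail) triples.

    Every character is a span with head == tail; every lexicon word
    found as a substring is appended with its start and end indices.
    """
    # Character-level tokens come first, one per position.
    spans = [(ch, i, i) for i, ch in enumerate(chars)]
    # Then every multi-character substring (up to max_len) found in the lexicon.
    for start in range(len(chars)):
        for end in range(start + 2, min(start + max_len, len(chars)) + 1):
            word = chars[start:end]
            if word in lexicon:
                spans.append((word, start, end - 1))
    return spans


# Toy lexicon (assumption): stands in for a dictionary built with Word2vec.
toy_lexicon = {"太史", "太史公", "史公"}
lattice = match_lexicon("太史公曰", toy_lexicon)
for token, head, tail in lattice:
    print(token, head, tail)
```

In a full model, each span would be mapped to an embedding and its head and tail indices would feed the relative position encoding; the sketch covers only the matching step.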