| Ancient Chinese named entity recognition (NER) is a key area of ancient Chinese natural language processing research that aims to extract entities such as people, places, institutions, officials, and other entity types from historical texts. As a fundamental task, it plays an important role in downstream tasks such as ancient Chinese information extraction and knowledge graph construction. In recent years, ancient Chinese NER has developed rapidly, but the following problems remain: (1) Many existing methods rely on single-level features and fail to consider both character structure and character-word features, so they capture lexical and structural information inadequately; (2) The pronunciation of Chinese characters carries semantic information that current methods do not fully exploit; (3) The limited availability and high cost of annotated datasets constrain the performance of deep learning algorithms in this field. To address these challenges, this paper investigates feature fusion mechanisms and data enhancement methods for ancient Chinese NER, with the aim of improving model performance and reducing the data annotation workload. Our contributions are as follows: (1) To address the problem that most existing ancient Chinese NER methods use only single-level features, this paper proposes a structure-lattice feature fusion network (SLFFN) that fuses character structure features with character-word features, so that the model obtains lexical information and structural characteristics simultaneously. The experimental results show that SLFFN outperforms the baseline model on the open dataset C-CLUE. (2) To address the problem that existing ancient Chinese NER methods ignore character pronunciation features, this paper incorporates character pronunciation features into SLFFN and proposes a multi-level feature fusion network (MFFN). By fusing character-word features with character structure features and character pronunciation features, the model obtains lexical information, structural characteristics, and pronunciation information simultaneously. To further validate the effectiveness of the model, two ancient Chinese entity recognition datasets are constructed in this paper, and experiments are conducted on these two datasets and the C-CLUE dataset. The experimental results show that both SLFFN and MFFN outperform the baseline models on all three datasets, and that MFFN achieves better entity recognition results than SLFFN. (3) To address the high cost of annotating ancient Chinese entity recognition datasets, this paper proposes two data enhancement methods that generate new data for model training: the random entity replacement method and the random sentence generation method. The random entity replacement method generates data containing new entities by replacing entities in the data with other entities of the same type. The random sentence generation method generates new sentences by swapping the preceding and following clauses of adjacent sentences. The experimental results show that the two proposed data enhancement methods improve the model's entity recognition performance and reduce the data annotation workload. |
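
The two data enhancement (augmentation) methods named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes character-level BIO tags, assumes that the "preceding and following" clauses of a sentence are split at its first full-width comma, and uses a small made-up toy corpus. All function names (`extract_spans`, `build_entity_pool`, `replace_entities`, `swap_clauses`) are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[List[str], List[str]]  # (characters, character-level BIO tags)


def extract_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, type) entity spans from BIO tags; end is exclusive."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing sentinel closes the last entity
        inside = tag.startswith("I-") and tag[2:] == etype
        if start is not None and not inside:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def build_entity_pool(samples: List[Sample]) -> Dict[str, List[str]]:
    """Collect every annotated entity string, grouped by entity type."""
    pool: Dict[str, List[str]] = {}
    for chars, tags in samples:
        for s, e, t in extract_spans(tags):
            pool.setdefault(t, []).append("".join(chars[s:e]))
    return pool


def replace_entities(sample: Sample, pool: Dict[str, List[str]],
                     rng: random.Random) -> Sample:
    """Random entity replacement: swap each entity for one of the same type."""
    chars, tags = sample
    new_chars: List[str] = []
    new_tags: List[str] = []
    cursor = 0
    for s, e, t in extract_spans(tags):
        # Copy the untouched text between the previous entity and this one.
        new_chars.extend(chars[cursor:s])
        new_tags.extend(tags[cursor:s])
        # Fall back to the original entity if the pool has none of this type.
        candidates = pool.get(t) or ["".join(chars[s:e])]
        entity = rng.choice(candidates)
        new_chars.extend(entity)  # a str extends the list one character at a time
        new_tags.extend(["B-" + t] + ["I-" + t] * (len(entity) - 1))
        cursor = e
    new_chars.extend(chars[cursor:])
    new_tags.extend(tags[cursor:])
    return new_chars, new_tags


def swap_clauses(a: Sample, b: Sample, sep: str = "，") -> List[Sample]:
    """Random sentence generation (assumed reading): cross the clauses of two
    adjacent sentences at each sentence's first separator."""
    (ca, ta), (cb, tb) = a, b
    if sep not in ca or sep not in cb:
        return []  # nothing to swap when a sentence has a single clause
    i, j = ca.index(sep) + 1, cb.index(sep) + 1
    return [(ca[:i] + cb[j:], ta[:i] + tb[j:]),
            (cb[:j] + ca[i:], tb[:j] + ta[i:])]


# Toy corpus invented for illustration only; it is not taken from C-CLUE.
entity_tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"] + ["O"] * 5
toy: List[Sample] = [
    (list("沛公至咸阳，诸将皆喜"), list(entity_tags)),
    (list("项羽入关中，秦民大惧"), list(entity_tags)),
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
pool = build_entity_pool(toy)
replaced = [replace_entities(s, pool, rng) for s in toy]
crossed = swap_clauses(toy[0], toy[1])
```

Both functions rebuild the tag sequence together with the character sequence, so the generated samples stay aligned and can be added to the training set without further annotation. In practice, a replacement entity may differ in length from the original, which is why the tags are regenerated from the entity string rather than copied.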