| With the continuous accumulation of Internet data, the large amount of domain information contained in materials science texts has become a focus for researchers. As a key field of artificial intelligence research, natural language processing (NLP) helps machines analyze human natural language and extract its important characteristics. Named entity recognition (NER), a foundation of NLP, extracts the information of interest from a corpus in a specific field. Because materials corpora are highly specialized, NER in the materials field has attracted widespread attention in materials text mining. The lack of annotated corpora has become one of the biggest bottlenecks in applying NER to the materials field, so research on material entity recognition (MER) methods is of great significance. This article takes the English corpus of alloy materials as its research object and uses active learning and pre-trained language models to select informative samples from a large pool of unlabeled English text and label them, thereby expanding the annotated corpus. This method effectively reduces the cost of manual annotation, improves the generalization ability of the MER model on the alloy materials corpus, and generalizes to some extent to NER tasks that lack annotated data. In addition, to account for the characteristics of entities in alloy materials corpora, a hybrid NER method is used to further improve the accuracy of alloy material entity recognition. The research work in this article includes the following: (1) For NER tasks that lack labeled data, a new model is used to assist sample selection, and the cost of manual labeling is reduced by improving the active learning sample selection strategy. This method is applied to the entity recognition task for alloy materials. A comparison of the effects of different active learning methods and text data augmentation methods on model generalization shows that active learning improves generalization faster. (2) Based on joint modeling of the ALBERT (A Lite BERT) pre-trained language model with a conditional random field (CRF), an NER method combining a pre-trained model with active learning is proposed. Experimental results show that this method further reduces the NER model's dependence on labeled data and the cost of manual labeling. In addition, the alloy material entity recognition results obtained by different NER methods are merged, which further improves the accuracy of alloy material entity recognition. |
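
The sample selection step of active learning mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the improved selection strategy, so the code below uses plain least-confidence sampling as a stand-in baseline. The function names (`sentence_uncertainty`, `select_for_annotation`) and the toy probabilities are hypothetical and chosen only for illustration. In practice the per-token label probabilities would come from the tagger (for example, an ALBERT-CRF model).

```python
from typing import List, Sequence


def sentence_uncertainty(token_probs: Sequence[Sequence[float]]) -> float:
    """Least-confidence score for one sentence.

    token_probs[i] is the predicted label distribution for token i.
    The score is 1 minus the mean of each token's highest label probability,
    so a higher score means the model is less sure about the sentence.
    """
    if not token_probs:
        return 0.0
    top = [max(p) for p in token_probs]
    return 1.0 - sum(top) / len(top)


def select_for_annotation(
    pool_probs: Sequence[Sequence[Sequence[float]]], k: int
) -> List[int]:
    """Return the indices of the k most uncertain sentences in the unlabeled pool."""
    scores = [sentence_uncertainty(p) for p in pool_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy pool of three unlabeled sentences (hypothetical model outputs,
# three candidate labels per token).
pool = [
    [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]],    # model is confident
    [[0.4, 0.3, 0.3], [0.5, 0.25, 0.25]],    # model is uncertain
    [[0.7, 0.2, 0.1]],                       # in between
]

# Pick the two sentences that would be sent to a human annotator first.
selected = select_for_annotation(pool, k=2)
print(selected)
```

In an active learning loop, the selected sentences would be labeled manually, added to the training set, and the model retrained before the next selection round. The strategy described in the abstract additionally uses a model to assist selection, which this sketch does not attempt to reproduce.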