| Named entity recognition (NER) technology is now relatively mature, but its generalization in applications is limited by dataset availability. Annotated datasets exist mainly for industry domains such as legal documents and electronic medical records, while research serving the big-data analysis needs of science and technology associations is still at an exploratory stage. To automatically identify the types of activity entities of the Association for Science and Technology, a transfer-learning-based small-sample NER method using the BERT-BiLSTM-CRF and ALBERT-BiGRU-Attention-CRF models was proposed. First, a small-sample dataset was constructed by using a web crawler to collect activity data from the official websites of science and technology associations at all levels across the country. Second, different models were trained for different levels of available computing resources. When computing resources are sufficient, the BERT-BiLSTM-CRF model was adopted. It combines Bidirectional Encoder Representations from Transformers (BERT), Bidirectional Long Short-Term Memory (BiLSTM), and a Conditional Random Field (CRF): the large-parameter BERT model generated character vectors, the BiLSTM learned full-text features, and the CRF imposed constraints on the output sequence. When computing resources are insufficient, the ALBERT-BiGRU-Attention-CRF model was adopted instead. It combines ALBERT, a Bidirectional Gated Recurrent Unit (BiGRU), an attention mechanism, and a CRF: the lightweight ALBERT replaced BERT to generate word embedding vectors; the BiGRU, with its strong generalization ability, captured contextual features; multi-head self-attention extended attention to different positions; and the CRF made the output more standardized. Finally, the annotated label sequence was obtained. Experimental results showed that the F1 score of BERT-BiLSTM-CRF was 6.79% higher than that of BiLSTM-CRF, and the F1 score of ALBERT-BiGRU-Attention-CRF was 1.30% higher than that of BiGRU-CRF. The feasibility of the proposed method was further validated by applying the models to typical project data from 2015 to 2019 and analysing the recognition results for each entity type. As a result, in evaluating the implementation of the key points of the Association for Science and Technology's policy reform, the proposed method can automatically identify activity entity types, reduce manual effort, improve processing efficiency, and provide methodological support for the informatization of the Association's policy research. The paper has 34 figures, 16 tables, and 55 references. |
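The abstract does not give implementation details. The following is a minimal sketch, assuming a BIO tagging scheme and a hypothetical label set (`ORG` and `ACT` are placeholders, not the paper's actual activity entity types). It shows the decoding step that the CRF layer performs in both models. A Viterbi search combines per-character emission scores with label-transition scores. In the paper, the emission scores would come from the BiLSTM or BiGRU-Attention layers. Transitions that violate BIO, meaning an `I-` tag not preceded by the same entity type, are ruled out. Here that rule is enforced as a hard mask, which is one common choice. A plain linear-chain CRF instead learns such transition penalties softly, and the transition scores below are fixed inputs rather than trained parameters. The sketch also computes entity-level F1 (exact span and type match). That is an assumption, since the abstract does not say how its F1 values were computed.

```python
import math
from typing import List, Sequence, Set, Tuple

NEG_INF = -math.inf

# Hypothetical label set for illustration only.
LABELS = ["O", "B-ORG", "I-ORG", "B-ACT", "I-ACT"]


def allowed(prev: str, curr: str) -> bool:
    """BIO rule: an I-X tag may only follow B-X or I-X."""
    if curr.startswith("I-"):
        return prev in ("B-" + curr[2:], "I-" + curr[2:])
    return True


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]],
            labels: Sequence[str] = LABELS) -> List[str]:
    """Return the highest-scoring label path that satisfies BIO constraints.

    emissions[t][j]: score of label j at character t (from the encoder).
    transitions[i][j]: score for moving from label i to label j.
    """
    n = len(labels)
    # A sequence may not start with an I- tag.
    score = [NEG_INF if labels[j].startswith("I-") else emissions[0][j]
             for j in range(n)]
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n):
            best_i, best = 0, NEG_INF
            for i in range(n):
                if not allowed(labels[i], labels[j]):
                    continue  # masked transition
                s = score[i] + transitions[i][j]
                if s > best:
                    best_i, best = i, s
            new_score.append(best + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best-scoring final label.
    path = [max(range(n), key=lambda j: score[j])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return [labels[j] for j in reversed(path)]


def spans(tags: Sequence[str]) -> Set[Tuple[str, int, int]]:
    """Collect (type, start, end_exclusive) entity spans from a BIO sequence."""
    out, start, kind = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last entity
        if tag.startswith("I-") and kind == tag[2:]:
            continue  # entity continues
        if kind is not None:
            out.add((kind, start, i))
        start, kind = (i, tag[2:]) if tag != "O" else (None, None)
    return out


def entity_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Entity-level F1: a predicted entity counts only if type and span match."""
    g, p = spans(gold), spans(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    em = [[0.1, 2.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 1.5, 0.0, 0.0],
          [1.0, 0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.8, 1.2]]
    tr = [[0.0] * len(LABELS) for _ in LABELS]
    pred = viterbi(em, tr)
    print(pred)  # ['B-ORG', 'I-ORG', 'O', 'B-ACT']
    print(entity_f1(["B-ORG", "I-ORG", "B-ACT", "I-ACT"], pred))  # 0.5
```

In the example, choosing the best label at each position independently would tag the last character `I-ACT` directly after `O`, which is an invalid sequence. The constrained search returns `B-ACT` instead, and this is the kind of output constraint the CRF layer contributes.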