| As a fundamental task in natural language processing, named entity recognition (NER) is an active research topic in fields such as finance, medicine and biology, and plays an important role in information extraction, machine translation, semantic analysis and related work. This thesis addresses the need for entity recognition on query text in intelligent search engines for the security industry, and investigates how to identify security-domain entities in the search content entered by users. The main work of this thesis is as follows: 1. To address the lack of annotated corpus resources in the security field, a complete and practical scheme for generating training corpora is designed. Abstract sentence templates of user expressions are extracted, datasets of the entities to be recognized are collected, and the training corpus is generated by slot filling, yielding an information retrieval corpus for the security field. 2. To address the wide variety of entities in the security field and the difficulty of finding generic features for complex user expressions, a high-performance named entity recognition model, TBBC (TrAdaBoost-BERT-BiLSTM-CRF), is designed. Built on BERT word vectors, the model uses BiLSTM and CRF for feature extraction, and incorporates the TrAdaBoost update strategy to reduce the discrepancy between the self-built corpus and the real user input dataset during training, achieving high recognition accuracy when real data samples are scarce. 3. Hyperparameter control experiments, multi-model comparison experiments and ablation experiments are carried out. The hyperparameter experiments identify suitable hyperparameters for the model; the multi-model experiments compare the performance of different deep learning models on named entity recognition in the security field; and the ablation experiments verify the effectiveness of the pre-training and fine-tuning paradigm, of the self-built corpus, and of the transfer learning update strategy. The experiments show that, compared with the ALBERT-BiLSTM-CRF model, which currently performs well on named entity recognition tasks, the TBBC model improves accuracy, recall and F1 score by 9.8%, 9.7% and 9.8%, respectively. 4. The TBBC model is applied to a practical engineering project: a named entity recognition service for security-field text is designed and implemented, which quickly and accurately identifies security-field entities in the query text entered by users and normalizes the recognized time and address entities. |
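
The slot-filling corpus generation described in item 1 can be illustrated with a minimal sketch. It is not the thesis's implementation: the templates, the entity dictionaries, the labels `LOC`, `TIME` and `DEVICE`, and the helper names `fill_template` and `generate_corpus` are all hypothetical. It also assumes whitespace tokenization and BIO tagging, whereas the actual corpus may be tagged at a different granularity.

```python
import random

# Hypothetical sentence templates with named slots (illustrative only).
TEMPLATES = [
    "show footage from {LOC} at {TIME}",
    "find {DEVICE} near {LOC}",
]

# Hypothetical entity dictionaries: surface forms per entity type.
ENTITIES = {
    "LOC": ["north gate", "parking lot B"],
    "TIME": ["yesterday evening", "8 am"],
    "DEVICE": ["camera 12", "access control panel"],
}


def fill_template(template, entities, rng):
    """Fill every slot in a template and return (tokens, BIO tags)."""
    tokens, tags = [], []
    for piece in template.split():
        if piece.startswith("{") and piece.endswith("}"):
            label = piece[1:-1]
            # Pick one surface form for this slot and tag it B-/I-.
            words = rng.choice(entities[label]).split()
            tokens.extend(words)
            tags.extend(["B-" + label] + ["I-" + label] * (len(words) - 1))
        else:
            # Template words outside slots are not entities.
            tokens.append(piece)
            tags.append("O")
    return tokens, tags


def generate_corpus(n, seed=0):
    """Generate n tagged sentences by sampling templates and slot values."""
    rng = random.Random(seed)
    return [fill_template(rng.choice(TEMPLATES), ENTITIES, rng) for _ in range(n)]


if __name__ == "__main__":
    for tokens, tags in generate_corpus(3):
        print(list(zip(tokens, tags)))
```

Because the labels are derived from the slot names, every generated sentence is annotated without manual effort. The trade-off is that the corpus only covers the expressions the templates anticipate, which is the gap between synthetic and real user input that the abstract's transfer learning update strategy is meant to reduce.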