| Cybersecurity attacks are proliferating, and organizations such as governments and enterprises are frequently threatened by security incidents. Unstructured threat intelligence text contains entity information about attack events, and the extracted entities can be used to strengthen response measures and shorten defense response time. In recent years, deep learning models have achieved good results on named entity recognition in the threat intelligence domain. However, these models rely on large amounts of annotated data, which is costly to produce in this domain, and with only a small amount of annotated data they struggle to achieve satisfactory results. To address these problems, this thesis improves few-shot named entity recognition in the threat intelligence domain from two perspectives, model architecture and multi-view learning, and proposes a series of solutions. (1) Because annotated data in the threat intelligence domain is scarce, a single few-shot learning model has difficulty fully learning the data features. This thesis therefore proposes a new Few-shot Threat Intelligence Named Entity Recognition Model (FTM). FTM co-trains a prototypical network, a pre-trained model, and a self-training model using the Tri-training algorithm, and exploits the complementary nature of the three model views to capture more threat intelligence domain knowledge at the encoding level. In experiments on a threat intelligence dataset, the model reaches a test F1 score of 44.56% in the 5-way 10-shot training scenario, at least 8.69% higher than any of its single internal models, and FTM outperforms the other single and joint models in all three few-shot scenarios. (2) Redundant encoding information in the multi-view learning process of FTM distorts the weight of features in the loss and leads to training errors. To address this, the thesis simplifies the structure of the Gated Recurrent Unit (GRU). Based on the improved GRU structure, FTM is changed to a three-view fusion approach, yielding the Few-shot Threat Intelligence Named Entity Recognition Model Based on Improved GRU Fusion (FTM-GRU). The GRU gating units and view correlation calculations determine which threat intelligence features are remembered and which are forgotten, highlighting important semantic features. In experiments on two threat intelligence datasets, FTM-GRU improves the F1 score in the 5-way 10-shot training scenario by 4.56% and 4.29%, respectively, relative to FTM, and it also outperforms FTM in the other few-shot scenarios. (3) Given the large volume of unstructured threat intelligence data, there is a lack of tools that can extract threat intelligence named entities easily and quickly. To fill this gap, a threat intelligence named entity recognition system is constructed that integrates the few-shot threat intelligence named entity recognition model proposed in this thesis, supports batch operations on unstructured text, and effectively improves the efficiency of threat intelligence named entity recognition. |
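
The Tri-training co-training mentioned in contribution (1) can be illustrated with a minimal sketch of its agreement step, in which an unlabeled sentence becomes a pseudo-labeled training example for one model when the other two models predict the same tag sequence for it. This is a simplified illustration under stated assumptions, not the thesis's implementation: the function name `tri_training_pseudo_labels`, the tag names such as `B-MAL` and `B-TOOL`, and the exact-match agreement rule are all hypothetical, and the error-rate conditions of the full Tri-training algorithm are omitted.

```python
from typing import Dict, List, Sequence, Tuple

Tags = Tuple[str, ...]


def tri_training_pseudo_labels(
    predictions: Sequence[Sequence[Sequence[str]]],
) -> Dict[int, List[Tuple[int, Tags]]]:
    """Select pseudo-labels for each of three models (hypothetical helper).

    predictions[m][k] is the tag sequence that model m predicts for
    unlabeled sentence k. For each target model, a sentence is selected
    when the OTHER two models predict exactly the same tag sequence.
    Returns {model index: [(sentence index, agreed tags), ...]}.
    """
    if len(predictions) != 3:
        raise ValueError("tri-training uses exactly three models")
    n_sentences = len(predictions[0])
    if any(len(p) != n_sentences for p in predictions):
        raise ValueError("all models must label the same sentences")

    selected: Dict[int, List[Tuple[int, Tags]]] = {0: [], 1: [], 2: []}
    for target in range(3):
        # The two "teacher" models are the ones that are not the target.
        a, b = [m for m in range(3) if m != target]
        for k in range(n_sentences):
            tags_a = tuple(predictions[a][k])
            tags_b = tuple(predictions[b][k])
            if tags_a == tags_b:
                selected[target].append((k, tags_a))
    return selected


# Assumed toy predictions from three views on two unlabeled sentences.
example_predictions = [
    [("B-MAL", "O"), ("O", "O")],       # view 0, e.g. a prototypical network
    [("B-MAL", "O"), ("B-TOOL", "O")],  # view 1, e.g. a pre-trained model
    [("O", "O"), ("B-TOOL", "O")],      # view 2, e.g. a self-training model
]
pseudo = tri_training_pseudo_labels(example_predictions)
```

In this toy input, views 1 and 2 agree on the second sentence, so it becomes a pseudo-label for view 0, while views 0 and 1 agree on the first sentence, so it becomes a pseudo-label for view 2. Requiring exact sequence agreement is a conservative design choice for the sketch; a token-level or confidence-weighted rule would be an equally plausible alternative.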