
Research And Implementation Of Automatic Threat Intelligence Extraction Model Based On Natural Language Processing

Posted on: 2021-05-22  Degree: Master  Type: Thesis
Country: China  Candidate: S Xun
GTID: 2428330632462746  Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of network technology has been accompanied by the emergence of cyberspace security issues. Traditional cyber attack defense solutions cannot cope with new, complicated and sophisticated attack methods that cause severe network paralysis and economic losses. Cyber threat intelligence technology, which takes attack threat information data as its core, has provided a security defense for an increasingly strained cyberspace. Organized and planned attacks, such as Advanced Persistent Threat (APT) and malware attacks, require security defenders to analyze and detect ongoing or imminent attacks from widely distributed intelligence data in cyberspace, transform that data into machine-readable threat intelligence, and deploy it to network defense infrastructure such as intrusion detection systems to achieve rapid linkage response and attack defense. However, it is unrealistic to obtain threat intelligence directly through human analysis of a large amount of open unstructured threat information. Efficiently and accurately identifying and extracting threat intelligence from widely distributed open source unstructured threat information is therefore very important for cyberspace security threat detection and attack defense. This thesis studies the automatic extraction of threat intelligence information that is widely distributed in cyberspace. The main work is as follows:

(1) To handle the large amount of open source unstructured threat text data in cyberspace, this thesis builds a feature extraction model based on Bidirectional Encoder Representations from Transformers (BERT), a natural language processing technique, to abstract semantic text information into a machine-readable feature matrix. The model combines contextual semantic information so that the feature vectors reflect the characteristics of threat intelligence data more accurately, which in turn improves the training and performance of the subsequent models.

(2) To automatically distinguish threat intelligence information from non-threat intelligence information in open source unstructured threat text data, this thesis proposes a classification model based on the Convolutional Neural Network (CNN) algorithm. The classification model is trained on the threat intelligence feature vector matrix produced by the BERT-based feature extraction model, so that it learns latent features of intelligence data and associations between entities, and can thereby automatically identify and extract threat intelligence sentences in unstructured text. Comparative experiments show that the proposed model is effective on the collected data set and outperforms the other methods in accuracy, precision, recall and F1 score. The experimental results also show that the proposed CNN-based classification model significantly improves the efficiency and accuracy of threat intelligence extraction.

(3) To meet the market and business function requirements of enterprise users and ordinary users for automated detection of threat intelligence, this thesis designs and implements an automated threat intelligence extraction system. The system targets structured threat intelligence data and unstructured threat intelligence text data that are widely distributed in cyberspace. It uses distributed web crawler technology for real-time collection, uses the automatic extraction model based on BERT and CNN to extract threat intelligence sentences from unstructured threat text, and transforms the extracted threat intelligence into a machine-readable format. The model significantly improves the efficiency and accuracy of the automated extraction system. The system has undergone various functional and performance tests, and the results verify that its design and implementation basically meet expectations, providing practical data support for threat intelligence application and deployment.

In summary, this thesis conducts in-depth research and analysis with threat intelligence as the object of automatic extraction, and explores the importance and necessity of applying natural language processing technology to threat intelligence analysis. It builds a feature extraction model based on BERT and proposes a classification model based on CNN. Together, the two models form an automatic threat intelligence extraction model based on natural language processing, which outperforms other mainstream models in both performance and accuracy. Based on this model, the thesis designs and implements a prototype system for automatic extraction of threat intelligence. Functional and performance tests show that the system is practical and stable, and that it provides practical data support for threat intelligence application and deployment.
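
The abstract compares models by accuracy, precision, recall and F1 score. As a minimal sketch of how those four metrics are computed for a binary sentence classifier, the Python below assumes label 1 means "threat intelligence sentence" and label 0 means "non-threat sentence". The function name `classification_metrics` and the toy labels are illustrative assumptions only; they are not taken from the thesis and do not reproduce its results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper for illustration; not the thesis's evaluation code.
    """
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (assumed): 1 = threat intelligence sentence, 0 = other sentence.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision penalizes non-threat sentences that are wrongly flagged, while recall penalizes threat sentences that are missed. F1 is their harmonic mean, which is why it is commonly reported alongside accuracy when the two classes are imbalanced.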
Keywords/Search Tags: threat intelligence, intelligence extraction, natural language processing, convolutional neural network