
A Threat Intelligence Extraction System Based On Unstructured Text

Posted on: 2022-01-23  Degree: Master  Type: Thesis
Country: China  Candidate: Q X Wang  Full Text: PDF
GTID: 2518306740495124  Subject: Computer technology
Abstract/Summary:
Security texts usually contain important information, and automatically extracting and structuring threat intelligence from them is of great significance to security situation awareness. Among the various threat intelligence expression standards, STIX is the most widely used because of its rich set of types and convenient sharing. The research goal of this paper is therefore to implement a system that automatically converts unstructured security reports into structured threat intelligence in STIX format. The main work of this paper is as follows:

(1) The extraction target is formulated, and a text preprocessing method for security reports in PDF format is proposed, including format conversion, text filtering, sentence splitting, and tokenization.

(2) Extraction methods for the different kinds of STIX threat intelligence objects and their attributes are proposed according to the characteristics of the text and the datasets. Regular expressions are used to extract indicators from text. For security entities and attack patterns, deep learning models are trained on datasets generated by combining manual annotation with dictionary matching. This paper also compares the results of different models and uses a pseudo-label method for entities with small datasets. The attributes of threat intelligence objects are extracted through a combination of methods such as keyword matching, classification, and similarity comparison.

(3) A method for extracting the relationships between threat intelligence objects is proposed. Relationship extraction is recast as a classification problem: deciding whether a relationship exists between two given threat intelligence objects. Distant supervision is used to expand the annotated dataset, which improves the results of the classification model.

(4) A prototype threat intelligence extraction system based on the STIX standard is designed and built. It takes unstructured text as input and extracts STIX objects and relationships. A website is built to visualize the extracted information and facilitate later analysis.

This paper implements an automatic threat intelligence extraction process using various methods from the field of Natural Language Processing. Experimental results show that the system can effectively extract relevant information from security texts and generate structured threat intelligence, which improves the efficiency of security analysis and supports security research.
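As an illustration of the regular-expression step in (2), the following is a minimal Python sketch of pulling indicators out of report text and wrapping each one as a STIX 2.1-style indicator object. It is not the thesis's implementation: the patterns, the function name extract_indicators, and the set of indicator types covered are assumptions chosen for the example, and the objects are built as plain dictionaries rather than with any STIX library.

```python
import re
import uuid
from datetime import datetime, timezone

# Illustrative patterns only (assumed for this sketch); the expressions
# actually used in the thesis are not given in the abstract.
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
PATTERNS = {
    "ipv4": re.compile(r"\b(?:" + OCTET + r"\.){3}" + OCTET + r"\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+"),
}

# STIX patterning templates for each indicator kind.
STIX_TEMPLATES = {
    "ipv4": "[ipv4-addr:value = '{v}']",
    "md5": "[file:hashes.MD5 = '{v}']",
    "sha256": "[file:hashes.'SHA-256' = '{v}']",
    "url": "[url:value = '{v}']",
}


def extract_indicators(text):
    """Return a list of STIX 2.1-style indicator dicts found in text."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    seen = set()
    indicators = []
    for kind, regex in PATTERNS.items():
        for match in regex.finditer(text):
            # Drop sentence punctuation that the URL pattern may swallow.
            value = match.group(0).rstrip(".,;)")
            pattern = STIX_TEMPLATES[kind].format(v=value)
            if pattern in seen:
                continue  # keep one indicator per distinct pattern
            seen.add(pattern)
            indicators.append({
                "type": "indicator",
                "spec_version": "2.1",
                "id": "indicator--" + str(uuid.uuid4()),
                "created": now,
                "modified": now,
                "pattern": pattern,
                "pattern_type": "stix",
                "valid_from": now,
            })
    return indicators
```

A real pipeline would also need to handle defanged values such as hxxp:// or 1.2.3[.]4, which security reports commonly use and which this sketch ignores.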
Keywords/Search Tags: text extraction, STIX, threat intelligence, Natural Language Processing