The value of cyber threat intelligence in defending against cyber threats is increasingly evident, and enterprises and organizations widely agree that threat intelligence should drive cybersecurity defense. Collecting, analyzing, and sharing threat intelligence is an effective defensive measure that trades space for time. By proactively detecting existing or potential threats and responding to them faster, defenders can reduce the asymmetry between cyber attack and defense to some extent and achieve cyber situational awareness. However, threat intelligence is complex in type and varied in its application scenarios. It is also updated rapidly as new information is generated, which makes it difficult for security personnel to analyze and exploit it in real time. The threat intelligence released by security vendors is usually in text form and contains a large amount of unstructured data, while redundant information and specialized domain vocabulary further reduce its usability.

The wide application of knowledge graphs in various fields provides new ideas for intelligent network defense. As one of the most effective knowledge integration methods, a knowledge graph can support threat awareness and the detection of new cyber threats by visualizing security knowledge efficiently, correlating and fusing multi-source heterogeneous data, and tracing attacks back to their source. This paper extends knowledge graphs to the field of cybersecurity and focuses on the key technologies involved in constructing and applying a knowledge graph of cyber threat intelligence. To address the problem that existing threat intelligence is usually unstructured data from a wide range of sources, we design entity extraction and relation extraction models, improve loss functions, incorporate various features, establish
knowledge graphs, and propose a knowledge query method based on the Neo4j graph database. The main work of this paper is as follows.

1. A method for extracting cyber threat intelligence entities that incorporates Focal Loss is proposed. Cyber threat intelligence contains rich threat knowledge, mostly in the form of natural language text, and extracting its key elements is an important basis for constructing a knowledge graph. However, threat intelligence text usually includes highly domain-specific terms such as cyber attack types, attack techniques, and cybercriminal organizations, and the number of samples varies across labels, so existing entity extraction methods cannot achieve satisfactory results. For this reason, word and character features are added to the model to handle the specialized vocabulary of threat intelligence. Meanwhile, to alleviate the limited performance on classes with only a few samples, an entity extraction model incorporating Focal Loss is proposed. It introduces a balancing factor and a modulating coefficient to balance the ratio of positive and negative samples and to increase the loss weight of hard samples, thereby improving the performance of threat intelligence entity extraction.

2. A feature-enhanced relation extraction approach for document-level cyber threat intelligence is proposed. Relation extraction plays an important role in mining the relationships between key threat elements and in constructing a threat intelligence knowledge graph; however, existing relation extraction models face challenges in the threat intelligence domain. To compensate for the lack of open-source datasets, threat intelligence is collected from blogs, forums, and other websites and labeled manually. We build a threat intelligence ontology to standardize the entities and relationships in the knowledge graph. To address the complex structure of threat intelligence documents, a feature-enhanced document-level relation
extraction model is designed to make full use of the information in the documents. Meanwhile, a teacher-student model is introduced to realize knowledge distillation. An oversampling method is used to alleviate the sample imbalance problem in threat intelligence, which substantially improves model performance compared with mainstream models.

3. A method for extracting cyber threat intelligence information that combines multiple models is proposed. Entities in threat intelligence are scattered throughout an article and have intricate relationships with one another. Manual analysis is time-consuming and labor-intensive, which makes it difficult to keep the extracted knowledge up to date in real time. To this end, this paper proposes a novel threat intelligence information extraction method that combines multiple models to organize scattered, multi-source heterogeneous security data. It consists of four key steps: entity extraction, coreference resolution, relation extraction, and knowledge graph construction. In the entity extraction task, different words contribute to the discrimination of entities to different degrees, so a self-attention mechanism is introduced to obtain the vector representations that matter most for entities. In the coreference resolution task, contextual information is combined with mention embeddings, and a convolutional neural network is introduced to extract local features and fuse them with global features to enhance the representation. In the relation extraction task, various features such as lexical and width features are incorporated to enhance the embedding representation. Structured triples are then extracted and populated into the knowledge graph to show the entities and their interrelationships.

4. A knowledge retrieval system based on the knowledge graph of cyber threat intelligence is developed. Threat intelligence contains rich knowledge scattered across various locations in the text, which poses challenges for information retrieval. To acquire this knowledge efficiently, a knowledge retrieval system based on the
knowledge graph of cyber threat intelligence is developed, which can analyze and process four types of natural language questions: attribute queries, node queries, reverse queries, and attribute comparisons. Corresponding templates are designed for different intents to accept natural language questions from users. The questions are converted into Cypher query statements and submitted to the Neo4j graph database; the retrieved results are then used to generate human-readable natural language answers, thus simplifying the search process and making fragmented threat intelligence knowledge easier to acquire.
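
As a rough illustration of the template-based conversion from questions to Cypher described above, the following minimal Python sketch matches a question against one pattern per intent and returns a parameterized Cypher statement. It is not the system developed in this paper: the question patterns, the node labels (`ThreatActor`, `Malware`), the relationship type (`USES`), the property names, and the helper `question_to_cypher` are all hypothetical and chosen only for illustration.

```python
import re
from typing import NamedTuple, Optional


class CypherQuery(NamedTuple):
    intent: str   # which of the four question types matched
    text: str     # parameterized Cypher statement
    params: dict  # values extracted from the question


# Hypothetical templates: (intent, question pattern, Cypher text).
# Labels, relationship types and property names are assumptions.
TEMPLATES = [
    ("attribute_query",
     re.compile(r"^what is the (?P<attr>\w+) of (?P<name>.+?)\??$", re.I),
     "MATCH (n {name: $name}) RETURN n[$attr] AS answer"),
    ("node_query",
     re.compile(r"^which malware does (?P<name>.+?) use\??$", re.I),
     "MATCH (:ThreatActor {name: $name})-[:USES]->(m:Malware) "
     "RETURN m.name AS answer"),
    ("reverse_query",
     re.compile(r"^who uses (?P<name>.+?)\??$", re.I),
     "MATCH (a:ThreatActor)-[:USES]->(:Malware {name: $name}) "
     "RETURN a.name AS answer"),
    ("attribute_comparison",
     re.compile(
         r"^which has the higher (?P<attr>\w+), (?P<a>.+?) or (?P<b>.+?)\??$",
         re.I),
     "MATCH (x {name: $a}), (y {name: $b}) "
     "RETURN CASE WHEN x[$attr] >= y[$attr] THEN x.name ELSE y.name END "
     "AS answer"),
]


def question_to_cypher(question: str) -> Optional[CypherQuery]:
    """Return the first template that matches the question, or None."""
    q = question.strip()
    for intent, pattern, cypher in TEMPLATES:
        match = pattern.match(q)
        if match:
            # Extracted values are passed as query parameters rather than
            # being spliced into the Cypher text.
            return CypherQuery(intent, cypher, match.groupdict())
    return None


if __name__ == "__main__":
    # Placeholder entity names, not real threat data.
    print(question_to_cypher("Who uses ExampleRAT?"))
    print(question_to_cypher("What is the severity of VulnA?"))
```

In such a design the returned statement and parameters would be handed to the graph database for execution, for example via `session.run(query.text, query.params)` in the Neo4j Python driver, and the result rows would fill an answer template for the matched intent. Passing the extracted values as parameters keeps user input out of the query text itself.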