
Research On Key Technologies Of Cybersecurity Knowledge Graph Construction

Posted on: 2022-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Niu
Full Text: PDF
GTID: 2518306524984499
Subject: Master of Engineering
Abstract/Summary:
With the continued growth of the Internet, the number of Internet users is already huge and still increasing steadily, so cyber-security technology is becoming more and more important. As a technology that can mine useful information and semantic relations from massive, heterogeneous data, the knowledge graph has become a research hotspot in recent years. This thesis studies several key technologies involved in constructing a Chinese knowledge graph for the cyber-security field. The key problems in the knowledge extraction stage are cyber-security named entity recognition and cyber-security entity relation extraction, both of which are critical steps of the information extraction task. The main contributions of this thesis are as follows.

(1) Chapter 3 proposes a named entity recognition method for Chinese text corpora in the cyber-security field. The model exploits the pictographic nature of Chinese characters by using a Convolutional Neural Network to extract features from character radicals. In addition, a BERT pre-trained model trained on Chinese corpora is introduced to obtain the input feature representation of the neural network. The final feature representation vector is fed into a Transformer, adapted for the entity recognition task, for further feature extraction. Finally, a CRF layer predicts the entity label of each character, which completes entity recognition. In the experiments, the model is trained and evaluated on multiple public Chinese named entity recognition datasets. The results show that the model is effective, especially on the Weibo NER dataset. Crawler technology is then used to collect text related to the cyber-security field, and a cyber-security entity recognition dataset is constructed from this corpus for performance testing. Compared with many other models proposed in the past three years, the model also performs well.

(2) For the entity relation extraction task, Chapter 4 builds on the Chinese BERT pre-trained model. The training corpus is preprocessed so that special symbols mark the entities in each sentence. After a sentence passes through the BERT model, the hidden state vectors of the marked entities are obtained. A self-attention mechanism then computes an attention score between each character of an entity and the whole sentence, and a weighted average yields the final vector representation of the entity. A neural network composed of a fully connected layer and an activation function then predicts the relation. The model has a simple structure, but experimental results on public datasets and on a self-built in-domain relation extraction dataset show that it performs well, which demonstrates the effectiveness of the model proposed in that chapter.

(3) After completing the named entity recognition and entity relation extraction tasks in the cyber-security field, Chapter 5 designs and implements an automated corpus acquisition and knowledge graph construction system. The thesis also builds an in-domain information retrieval system that includes a simple knowledge graph query function and a rule-based intelligent question answering function.
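The entity marking and attention-weighted pooling steps described for the relation extraction model in contribution (2) can be illustrated with a minimal sketch. This is not the thesis's implementation: the marker strings `<e1>`, `</e1>`, `<e2>`, `</e2>`, the function names, and the scaled dot-product scoring are all assumptions made for illustration, and plain Python lists stand in for the BERT hidden states.

```python
import math


def mark_entities(tokens, head_span, tail_span):
    """Wrap two non-overlapping entity spans in marker tokens.

    Spans are (start, end) with an exclusive end index. The marker
    strings are hypothetical; the abstract only says "special symbols".
    """
    opens = {head_span[0]: "<e1>", tail_span[0]: "<e2>"}
    closes = {head_span[1]: "</e1>", tail_span[1]: "</e2>"}
    marked = []
    for i, token in enumerate(tokens):
        if i in closes:          # close a span that ended just before i
            marked.append(closes[i])
        if i in opens:           # open a span that starts at i
            marked.append(opens[i])
        marked.append(token)
    if len(tokens) in closes:    # span that runs to the end of the sentence
        marked.append(closes[len(tokens)])
    return marked


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(entity_vectors, sentence_vector):
    """Pool the character vectors of one entity into a single vector.

    Each character vector is scored against the sentence vector with a
    scaled dot product (an assumed scoring function), the scores are
    normalised with softmax, and the weighted average is returned.
    """
    dim = len(sentence_vector)
    scores = [
        sum(h * s for h, s in zip(vec, sentence_vector)) / math.sqrt(dim)
        for vec in entity_vectors
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * vec[k] for w, vec in zip(weights, entity_vectors))
        for k in range(dim)
    ]
    return pooled, weights


# Toy usage: a seven-character sentence with two entity spans.
tokens = list("木马攻击服务器")
marked = mark_entities(tokens, head_span=(0, 2), tail_span=(4, 7))

# Toy two-dimensional "hidden states" for a two-character entity.
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
```

In a full model, the two pooled entity vectors would be passed to the fully connected layer and activation function that predict the relation; that classifier is omitted here because the abstract does not specify its dimensions or activation.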
Keywords/Search Tags: Knowledge Graph, Cybersecurity, Named Entity Recognition, Relation Extraction, BERT