In recent years, a growing number of malicious actors have used malicious domains to build botnets and carry out other attacks. Detecting malicious domains accurately and efficiently has therefore become an active research topic in network security. Current mainstream detection systems either apply machine learning to the character-level features of domain names, or build domain relationship graphs from DNS traffic and combine them with graph embedding methods. However, these methods do not take the malicious behavior of domains into account, so they are easily evaded by techniques such as Domain-Flux and IP-Flux. To capture the rich behavioral information of malicious domains, this paper introduces sandbox traffic into malicious domain detection. We collect the sandbox traffic of malicious files, extract domain-related behavioral information from it, model that information as a heterogeneous graph, and study malicious domain detection methods that combine the heterogeneous graph with graph neural network algorithms. The specific contributions of this paper are as follows:

1. We obtain sandbox traffic data for malicious file samples and extract domain behavior information from it, modeling this information as a heterogeneous graph so that the rich behavioral information of malicious domains can be learned. We analyze the characteristics of file sandbox traffic and design a knowledge representation model for domain-related behavioral information, including a scheme for extracting node entities, relationships, and attributes. We ultimately extract four node types (domain, file, URL, and IP) together with the edges between them, and construct a heterogeneous graph of domain-related behavioral information.

2. We propose a heterogeneous graph-based malicious domain detection method that fully learns the node and relationship information of every type in the constructed heterogeneous graph. A neighbor sampling algorithm and information aggregation are used to learn heterogeneous graph embeddings of the target nodes. To make training on large graphs feasible with limited resources, neighbor sampling is combined with mini-batch training. As a result, malicious domains can be detected on the constructed domain behavior graph with limited computing resources and within limited time. Experiments demonstrate that the heterogeneous graph built from file sandbox traffic is a feasible basis for detecting malicious domains.

3. Building on the detection method above, we add an attention mechanism to the graph neural network model to learn how nodes and relationships influence whether a domain is malicious. Separate attention mechanisms are designed for nodes and for edges: node attention learns the influence of neighboring node features on the target node, while edge attention learns the influence of edge features on the target node. Because the original edges of the graph are used during aggregation, all edge information in the graph is fully utilized, and manual intervention can be applied during computation to steer what the edges learn, thereby improving detection efficiency.

We implement the proposed malicious domain detection method and experimentally verify its feasibility on the constructed heterogeneous graph of domain-related behavioral information. In the experiments, the domain detection model is evaluated on accuracy, recall, AUC, and F1-score, and the effectiveness of the scheme is demonstrated by a side-by-side comparison with existing models.
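The graph-construction step of the first contribution can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical sandbox report format (`sha256`, `dns`, `http` fields) and hypothetical relation names (`queries`, `resolves_to`, `requests`, `belongs_to`, `hosted_on`). The paper's actual extraction schema, node attributes and edge types may differ. The sketch only shows how the four node types (domain, file, URL, IP) and typed edges between them could be collected from per-file traffic.

```python
import ipaddress
from collections import defaultdict
from urllib.parse import urlparse


def _is_ip(host):
    """Return True if host is an IPv4/IPv6 literal rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def build_hetero_graph(reports):
    """Turn sandbox reports into a typed (heterogeneous) graph.

    Each report is ASSUMED (hypothetical format) to look like
    {"sha256": str, "dns": {domain: [ip, ...]}, "http": [url, ...]}.
    Returns (nodes, edges): nodes maps node type -> set of ids, and edges maps
    (src_type, relation, dst_type) -> set of (src_id, dst_id) pairs.
    """
    nodes = defaultdict(set)
    edges = defaultdict(set)
    for rep in reports:
        f = rep["sha256"]
        nodes["file"].add(f)
        # DNS activity: the file queried a domain, which resolved to IPs.
        for domain, ips in rep.get("dns", {}).items():
            nodes["domain"].add(domain)
            edges[("file", "queries", "domain")].add((f, domain))
            for ip in ips:
                nodes["ip"].add(ip)
                edges[("domain", "resolves_to", "ip")].add((domain, ip))
        # HTTP activity: the file requested a URL; link the URL to its host.
        for url in rep.get("http", []):
            nodes["url"].add(url)
            edges[("file", "requests", "url")].add((f, url))
            host = urlparse(url).hostname
            if host is None:
                continue
            if _is_ip(host):
                # URL points straight at an IP literal (no domain involved).
                nodes["ip"].add(host)
                edges[("url", "hosted_on", "ip")].add((url, host))
            else:
                nodes["domain"].add(host)
                edges[("url", "belongs_to", "domain")].add((url, host))
    return dict(nodes), dict(edges)
```

Keying edges by `(src_type, relation, dst_type)` mirrors how heterogeneous GNN libraries usually index edge types, so the sets could later be converted into per-relation edge index arrays.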
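The second contribution combines neighbor sampling with mini-batch training so that large graphs fit in limited memory. The sketch below shows one common way to do this: GraphSAGE-style fixed-fanout sampling applied separately to each relation type, written in plain Python. This reflects an assumption about the general technique, not the paper's exact sampler. The adjacency layout (`{node: {relation: [neighbors]}}` with `(type, id)` node keys), the function names and the fanout values are all hypothetical.

```python
import random


def sample_blocks(adj, seeds, fanouts, rng):
    """Multi-hop neighbor sampling, done per relation type.

    adj:     {node: {relation: [neighbor, ...]}}, nodes are (type, id) tuples.
    seeds:   target nodes of the current mini-batch (e.g. domain nodes).
    fanouts: max neighbors kept per relation at each hop, e.g. [10, 5].
    Returns one list of sampled (dst, relation, src) edges per hop,
    nearest hop first.
    """
    blocks = []
    frontier = list(seeds)
    for fanout in fanouts:
        hop_edges = []
        next_frontier = set()
        for dst in frontier:
            for rel, nbrs in adj.get(dst, {}).items():
                # Keep all neighbors if few, otherwise sample without replacement.
                picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
                for src in picked:
                    hop_edges.append((dst, rel, src))
                    next_frontier.add(src)
        blocks.append(hop_edges)
        frontier = sorted(next_frontier)  # sorted for reproducibility
    return blocks


def minibatches(items, batch_size, rng):
    """Yield shuffled mini-batches that together cover every item once."""
    items = list(items)
    rng.shuffle(items)
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```

In a training loop, each batch of target domain nodes produced by `minibatches` would be passed to `sample_blocks`, and the model would aggregate the sampled hops from the outermost inward. Memory then scales with batch size and fanout rather than with the size of the whole graph.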
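For the third contribution, the following sketch shows one plausible way to give neighboring nodes and edges separate attention weights when aggregating into a single target node. It uses GAT-style LeakyReLU scoring followed by softmax normalization. The weight vectors `a_node` and `a_edge`, the concatenation of the two attended summaries, and the `edge_bias` argument (a hypothetical hook standing in for the manual intervention on edge learning) are illustrative assumptions, not the paper's formulation. The weights are fixed inputs here; in a real model they would be learned.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def attend(h_target, neighbors, a_node, a_edge, edge_bias=None):
    """Separate node and edge attention for one target node (illustrative).

    neighbors: list of (h_neighbor, x_edge) pairs, both plain float lists.
    a_node:    scoring vector of length 2 * len(h), applied to [h_target || h_j].
    a_edge:    scoring vector of length len(x_edge), applied to the edge feature.
    edge_bias: optional per-edge additive logits (hypothetical intervention hook).
    Returns (output, node_weights, edge_weights).
    """
    hs = [h for h, _ in neighbors]
    xs = [x for _, x in neighbors]
    # `h_target + h` is list concatenation, i.e. [h_target || h_j].
    node_logits = [leaky_relu(dot(a_node, h_target + h)) for h in hs]
    edge_logits = [leaky_relu(dot(a_edge, x)) for x in xs]
    if edge_bias is not None:
        edge_logits = [l + b for l, b in zip(edge_logits, edge_bias)]
    alpha_node = softmax(node_logits)
    alpha_edge = softmax(edge_logits)
    # Concatenate the node-attended and edge-attended summaries.
    out = weighted_sum(alpha_node, hs) + weighted_sum(alpha_edge, xs)
    return out, alpha_node, alpha_edge
```

Passing a large negative `edge_bias` for an edge effectively removes it from the edge-attention summary, which is one simple way to steer what the edge branch attends to.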
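The evaluation metrics named for the experiments (accuracy, recall, F1-score and AUC) can be computed as in this dependency-free sketch. It treats label 1 as malicious and assumes a decision threshold of 0.5, which is a hypothetical choice. AUC is computed as the fraction of (malicious, benign) pairs that are ranked correctly, with ties counted as half. This is equivalent to the area under the ROC curve.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall, F1 and ROC AUC for labels in {0, 1}."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC: probability that a random malicious sample outscores a random
    # benign one (ties count half). O(P*N), fine for a sketch only.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1, "auc": auc}
```

For large test sets, a library implementation such as scikit-learn's `roc_auc_score` is preferable to the quadratic pairwise loop.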