Research And Implementation Of Malware Family Classification Based On Graph Similarity

Posted on: 2023-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: R Z Xia
GTID: 2568306914979119
Subject: Cyberspace security
Abstract/Summary:
As the information society develops rapidly, computer technology and the Internet have become ever more closely tied to every aspect of daily life. Malware, one of the main sources of cyberspace security threats, persists through sample variation and family derivation. It targets the information systems of individuals, enterprises and state agencies, and is becoming both highly targeted and widespread. Malware similarity analysis therefore plays an irreplaceable role in an increasingly severe cyberspace security situation. Existing malware analysis techniques rely on dynamic or static feature extraction and, depending on the level of the object studied, can be divided into the raw bytecode level, the assembly code level, and the function call level. Most current research takes sequence features from the raw bytecode or from the assembly code as the starting point for malware detection or classification, but these methods ignore the program structure features hidden in the malware. This paper therefore builds on the homology and similarity of malware families and on control flow graph similarity at the assembly level, introduces the structural characteristics of malware, and designs and implements a malware family classification model based on a graph neural network. The main work of this paper includes:
(1) Representing malware samples by combining control flow graph structure features with the sequence features of the assembly statements inside basic blocks. Control flow graphs are extracted from the disassembly code and their local structure is optimized; then, based on the differences in assembly statement frequency between families, the node features of each control flow graph are represented as TF-IDF vectors of uniform length.
(2) Using the marginalized graph kernel to verify, on the graph feature data, the similarity within and between malware families. Similarities are computed for three feature sets: the control flow graph structure alone, the control flow graph with node sequence features only, and the control flow graph with complete node features, in order to verify the effectiveness of each feature in the classification task.
(3) Designing and implementing a malware family classification model based on malware family similarity. Graph data that combine control flow graph structure features with basic-block node features derived from assembly statement sequences are classified by a graph neural network with a hierarchical pooling structure. The model uses GraphSAGE convolutional layers and TopK pooling layers, balancing similarity computation on large graphs against low computational cost. Its classification performance is 0.2% higher than that of grayscale features derived from the raw bytecode and 1%-2% higher than that of traditional single sequence features.
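Step (1) starts from control flow graphs recovered from disassembly. The abstract does not say which disassembler or data format the thesis uses, so the following is only a minimal sketch under stated assumptions: the listing is a list of hypothetical `(address, mnemonic, operand)` tuples in ascending address order, only a small set of x86 jump mnemonics is recognised, and jump operands are literal hexadecimal addresses. Basic blocks are split at the classic "leaders" (first instruction, jump targets, instructions after a jump or return), and edges are added for taken branches and fall-through. The names `build_cfg` and `jump_target` are illustrative, and the thesis's local structure optimization is not modelled.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical instruction format: (address, mnemonic, operand string).
Instr = Tuple[int, str, str]

# Assumed x86 jump mnemonics; a real tool would need the full set.
COND_JUMPS = {"je", "jne", "jz", "jnz", "jg", "jge", "jl", "jle", "ja", "jae", "jb", "jbe"}
UNCOND_JUMPS = {"jmp"}
TERMINATORS = {"ret"}


def jump_target(ins: Instr) -> Optional[int]:
    """Return the literal target address of a direct jump, else None."""
    _, mnem, operand = ins
    if mnem in COND_JUMPS or mnem in UNCOND_JUMPS:
        try:
            return int(operand, 16)  # accepts "0x1a" or "1a"
        except ValueError:
            return None  # indirect jump such as "jmp eax"
    return None


def build_cfg(instrs: List[Instr]) -> Tuple[Dict[int, List[Instr]], List[Tuple[int, int]]]:
    """Split a linear listing (sorted by address) into basic blocks joined by directed edges."""
    addrs = [a for a, _, _ in instrs]
    known = set(addrs)
    ends_block = COND_JUMPS | UNCOND_JUMPS | TERMINATORS

    # Leaders: the first instruction, every in-range jump target,
    # and every instruction that follows a jump or return.
    leaders: Set[int] = {addrs[0]}
    for i, ins in enumerate(instrs):
        target = jump_target(ins)
        if target is not None and target in known:
            leaders.add(target)
        if ins[1] in ends_block and i + 1 < len(instrs):
            leaders.add(addrs[i + 1])

    # Each block runs from one leader up to (not including) the next leader.
    blocks: Dict[int, List[Instr]] = {}
    current = addrs[0]
    for ins in instrs:
        if ins[0] in leaders:
            current = ins[0]
            blocks[current] = []
        blocks[current].append(ins)

    # Edges: taken-branch edges plus fall-through edges to the next block.
    starts = sorted(blocks)
    edges = set()
    for idx, start in enumerate(starts):
        last = blocks[start][-1]
        target = jump_target(last)
        if target is not None and target in blocks:
            edges.add((start, target))
        if last[1] not in UNCOND_JUMPS | TERMINATORS and idx + 1 < len(starts):
            edges.add((start, starts[idx + 1]))
    return blocks, sorted(edges)


EXAMPLE = [
    (0x00, "push", "ebp"),
    (0x01, "cmp", "eax, 0"),
    (0x04, "je", "0x0a"),
    (0x06, "mov", "eax, 1"),
    (0x08, "jmp", "0x0c"),
    (0x0a, "xor", "eax, eax"),
    (0x0c, "ret", ""),
]
blocks, edges = build_cfg(EXAMPLE)
print(sorted(blocks), edges)  # → [0, 6, 10, 12] [(0, 6), (0, 10), (6, 12), (10, 12)]
```

The example yields the expected diamond shape of an if/else: the `je` block branches to both arms, and both arms reach the `ret` block.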
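The uniform-length TF-IDF node features of step (1) can be sketched by treating each basic block as a "document" and each assembly mnemonic as a "term". This tokenization is an assumption (the thesis may use whole statements), and the abstract does not state how the vocabulary is chosen from inter-family frequency differences, so the vocabulary is simply passed in. The weighting below, tf = count / block length with the smoothed idf = ln((1 + N) / (1 + df)) + 1 (the same smoothed IDF formula scikit-learn uses by default), is likewise an assumed variant; `tfidf_node_features` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Sequence


def tfidf_node_features(blocks: Sequence[Sequence[str]], vocab: Sequence[str]) -> List[List[float]]:
    """Map each basic block (a list of mnemonics) to a TF-IDF vector of length len(vocab).

    Assumed weighting: tf = count / block length, and a smoothed
    idf = ln((1 + N) / (1 + df)) + 1, with N the number of blocks and
    df the number of blocks containing the mnemonic. Mnemonics outside
    vocab still count toward block length but get no dimension.
    """
    n_blocks = len(blocks)
    doc_freq: Counter = Counter()
    for block in blocks:
        doc_freq.update(set(block))
    idf = {t: math.log((1 + n_blocks) / (1 + doc_freq[t])) + 1.0 for t in vocab}

    features = []
    for block in blocks:
        counts = Counter(block)
        length = len(block) or 1  # avoid division by zero for empty blocks
        features.append([counts[t] / length * idf[t] for t in vocab])
    return features


# Hypothetical vocabulary, e.g. mnemonics whose frequency differs most between families.
VOCAB = ["mov", "call", "ret", "xor"]
BLOCKS = [["push", "mov", "call"], ["mov", "mov", "ret"], ["xor", "ret"]]
feats = tfidf_node_features(BLOCKS, VOCAB)
for row in feats:
    print([round(v, 3) for v in row])
```

Because every block maps onto the same fixed vocabulary, blocks of any instruction count produce vectors of identical length, which is what a graph neural network needs as node input.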
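Step (2) relies on the marginalized graph kernel (Kashima, Tsuda and Inokuchi, 2003), which compares two graphs by the expected similarity of random walks drawn from each. The sketch below is an illustration under simplifying assumptions that the abstract does not confirm: a Kronecker-delta node kernel on discrete node labels (the thesis compares richer node features), no edge kernel, uniform start probabilities, and a constant stopping probability `q` in (0, 1] (sink nodes always stop). It solves the kernel's linear recursion on the product graph by fixed-point iteration; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A graph is (node_labels, directed_edges); labels could be, for example,
# a discretised basic-block feature. This encoding is an assumption.
Graph = Tuple[Sequence[str], Sequence[Tuple[int, int]]]


def _walk_params(graph: Graph, q: float):
    """Stop probability per node and transition lists (successor, probability)."""
    labels, edges = graph
    succ: List[List[int]] = [[] for _ in labels]
    for u, v in edges:
        succ[u].append(v)
    # Sink nodes (no successors) stop with probability 1.
    stop = [q if s else 1.0 for s in succ]
    trans = [[(v, (1.0 - stop[u]) / len(succ[u])) for v in succ[u]] for u in range(len(labels))]
    return stop, trans


def marginalized_kernel(g1: Graph, g2: Graph, q: float = 0.1,
                        tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Marginalized kernel with a delta node kernel, no edge kernel, uniform start (q in (0, 1])."""
    (lab1, _), (lab2, _) = g1, g2
    stop1, trans1 = _walk_params(g1, q)
    stop2, trans2 = _walk_params(g2, q)
    n1, n2 = len(lab1), len(lab2)

    def kv(i: int, j: int) -> float:
        return 1.0 if lab1[i] == lab2[j] else 0.0

    # Solve R(i,j) = q1(i) q2(j) + sum_{k,l} pt1(k|i) pt2(l|j) kv(k,l) R(k,l)
    # by fixed-point iteration; with q > 0 every row of the product-graph
    # transition matrix sums to less than 1, so the iteration converges.
    R = [[stop1[i] * stop2[j] for j in range(n2)] for i in range(n1)]
    for _ in range(max_iter):
        new = [[stop1[i] * stop2[j]
                + sum(pk * pl * kv(k, l) * R[k][l] for k, pk in trans1[i] for l, pl in trans2[j])
                for j in range(n2)] for i in range(n1)]
        delta = max(abs(new[i][j] - R[i][j]) for i in range(n1) for j in range(n2))
        R = new
        if delta < tol:
            break
    # K = sum_{i,j} ps1(i) ps2(j) kv(i,j) R(i,j), with uniform ps = 1/n.
    return sum(kv(i, j) * R[i][j] for i in range(n1) for j in range(n2)) / (n1 * n2)


def normalized_kernel(g1: Graph, g2: Graph, q: float = 0.1) -> float:
    """Cosine-normalised similarity: K(g1,g2) / sqrt(K(g1,g1) K(g2,g2))."""
    k11 = marginalized_kernel(g1, g1, q)
    k22 = marginalized_kernel(g2, g2, q)
    return marginalized_kernel(g1, g2, q) / math.sqrt(k11 * k22)


A = (["push", "cmp", "ret"], [(0, 1), (1, 2)])
B = (["push", "cmp", "ret"], [(0, 1), (1, 2), (1, 0)])
C = (["xor", "jmp"], [(0, 1)])
print(normalized_kernel(A, B), normalized_kernel(A, C))
```

Normalising the kernel puts intra-family and inter-family comparisons on the same [0, 1] scale; graphs that share no node labels, such as `A` and `C`, score exactly 0.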
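The hierarchical-pooling classifier in step (3) can be illustrated with a plain-Python forward pass. This is a minimal sketch, not the thesis's implementation: the mean-aggregation GraphSAGE update over incoming neighbours, the TopK scoring rule y = Xp / ||p|| with tanh gating of the kept nodes (the rule PyTorch Geometric's `TopKPooling` applies by default), the pooling ratio of 0.5, the mean-and-max readout summed across layers, and all weights are assumptions, and the weights are untrained placeholders. A real model would learn these parameters, for example with PyTorch Geometric's `SAGEConv` and `TopKPooling` layers.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Edge = Tuple[int, int]


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(wk * xk for wk, xk in zip(row, x)) for row in w]


def sage_conv(x: Matrix, edges: Sequence[Edge], w_self: Matrix, w_neigh: Matrix) -> Matrix:
    """GraphSAGE-style layer: ReLU(W_self x_v + W_neigh * mean of in-neighbour features)."""
    dim = len(x[0])
    incoming: List[List[int]] = [[] for _ in x]
    for src, dst in edges:
        incoming[dst].append(src)
    out = []
    for v, nbrs in enumerate(incoming):
        if nbrs:
            mean = [sum(x[u][k] for u in nbrs) / len(nbrs) for k in range(dim)]
        else:
            mean = [0.0] * dim
        h = [a + b for a, b in zip(matvec(w_self, x[v]), matvec(w_neigh, mean))]
        out.append([max(0.0, z) for z in h])
    return out


def topk_pool(x: Matrix, edges: Sequence[Edge], p: Sequence[float], ratio: float = 0.5):
    """Keep the ceil(ratio * N) highest-scoring nodes, gate them by tanh(score), drop other edges."""
    norm = math.sqrt(sum(pk * pk for pk in p))
    scores = [sum(xk * pk for xk, pk in zip(row, p)) / norm for row in x]
    k = max(1, math.ceil(ratio * len(x)))
    keep = sorted(range(len(x)), key=lambda i: scores[i], reverse=True)[:k]
    remap = {old: new for new, old in enumerate(keep)}
    x_new = [[v * math.tanh(scores[i]) for v in x[i]] for i in keep]
    e_new = [(remap[s], remap[t]) for s, t in edges if s in remap and t in remap]
    return x_new, e_new


def readout(x: Matrix) -> List[float]:
    """Graph-level summary: global mean pooling concatenated with global max pooling."""
    dim = len(x[0])
    return [sum(r[k] for r in x) / len(x) for k in range(dim)] + [max(r[k] for r in x) for k in range(dim)]


def classify(x: Matrix, edges: Sequence[Edge], layers, w_out: Matrix) -> List[float]:
    """Per layer: SAGE -> TopK -> readout; readouts are summed, then a linear layer and softmax."""
    summary = None
    for w_self, w_neigh, p in layers:
        x = sage_conv(x, edges, w_self, w_neigh)
        x, edges = topk_pool(x, edges, p)
        r = readout(x)
        summary = r if summary is None else [a + b for a, b in zip(summary, r)]
    logits = matvec(w_out, summary)
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Placeholder inputs: 4 nodes with 2-dimensional features (e.g. TF-IDF vectors).
I2 = [[1.0, 0.0], [0.0, 1.0]]
X = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.5, 0.5]]
E = [(0, 1), (1, 2), (2, 0), (3, 2)]
LAYERS = [(I2, I2, [1.0, 0.0]), (I2, I2, [0.0, 1.0])]
W_OUT = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1], [0.0, 0.0, 0.0, 0.0]]  # 3 hypothetical families
print(classify(X, E, LAYERS, W_OUT))
```

Each TopK layer keeps only about half of the remaining nodes, so later convolutions run on progressively smaller graphs; this is one way such a design keeps computation low on large control flow graphs.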
Keywords/Search Tags: assembly instruction, control flow graph (CFG), malware family classification, similarity analysis, graph similarity