Malware Detection Based On Graph Representation Learning

Posted on: 2021-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: H Peng
Full Text: PDF
GTID: 2428330620464273
Subject: Engineering
Abstract/Summary:
With the popularity of computers and the rapid development of the Internet, the number and variety of malware are increasing, which brings unprecedented challenges to malware detection. Existing detection methods can be divided into those based on statistical features, text semantics, and behavior graphs. Although these methods can detect malware, each has its own shortcomings. The statistical feature-based method relies heavily on expert experience to extract signatures, and it has difficulty identifying malware that has been obfuscated or packed. The text-semantic-based method places heavy demands on hardware and cannot process very long sequences, so malware can easily evade the sequence detection window. The behavior graph-based method relies on complex graph matching algorithms and deep learning models, which are not interpretable and cannot handle malware that uses advanced evasion techniques. Reducing the dependence on manual feature extraction while improving detection performance is therefore an urgent problem.

To address these shortcomings, this thesis studies the relevant background on malware, including its definitions, development trends, detection and evasion technologies, and typical behaviors during execution. On this basis, and drawing on current research results in deep learning, it proposes a malware detection method based on graph representation learning. The method transforms the function call sequence into a function call graph and combines multiple techniques to extract the coding features, functional features, and behavior characteristics of each node, converting the function call graph into a feature function call graph. A graph representation learning algorithm then learns embeddings of the feature function call graph through multi-hop neighborhood aggregation, and the resulting node embeddings are fed into a fully connected neural network for classification training. To address the poor interpretability of malware detection, this thesis introduces, for the first time, convolution visualization techniques from the field of image recognition to improve the interpretability of the graph representation learning-based detection model. The technique projects back to the graph embedding layer to find the function node embeddings that have the greatest influence on the target category, thereby locating the key functions and related calling behaviors of the malware.

This thesis compares mainstream malware detection methods with the proposed graph representation learning-based method and conducts experimental analysis on public data sets. Evaluation metrics such as accuracy, recall, and F1-score show that the method can greatly improve detection performance without relying heavily on manual feature extraction. In addition, benign samples, malicious samples, and WannaCry ransomware from a real environment were randomly selected and tested for interpretation. The analysis indicates that the interpretability method proposed in this thesis can greatly improve the interpretability of the malware detection model.
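
The graph construction and neighborhood aggregation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the edge rule (an edge from each call to the call that follows it), the unweighted mean aggregator, the sample call sequence, and the two-dimensional node features are all assumptions made for illustration, and the learned weights and the fully connected classifier are omitted.

```python
from collections import defaultdict


def build_call_graph(call_sequence):
    """Turn an ordered function call sequence into a directed graph.

    Assumed edge rule: an edge u -> v exists whenever v appears
    immediately after u in the sequence.
    """
    graph = defaultdict(set)
    for name in call_sequence:
        graph[name]  # register every function as a node, even without out-edges
    for caller, callee in zip(call_sequence, call_sequence[1:]):
        graph[caller].add(callee)
    return {node: sorted(neighbours) for node, neighbours in graph.items()}


def aggregate(graph, features, hops=2):
    """Average each node's feature vector with those of its out-neighbours.

    Repeating the step `hops` times lets information from nodes up to
    `hops` edges away reach each node (an unweighted stand-in for a
    learned aggregation layer).
    """
    current = {node: list(vec) for node, vec in features.items()}
    for _ in range(hops):
        updated = {}
        for node, neighbours in graph.items():
            group = [current[node]] + [current[n] for n in neighbours]
            dim = len(current[node])
            updated[node] = [
                sum(vec[i] for vec in group) / len(group) for i in range(dim)
            ]
        current = updated
    return current


# Hypothetical call sequence and hand-picked node features (illustration only).
sequence = ["CreateFileW", "WriteFile", "CloseHandle", "CreateFileW", "CryptEncrypt"]
node_features = {
    "CreateFileW": [1.0, 0.0],
    "WriteFile": [0.0, 1.0],
    "CloseHandle": [0.0, 0.0],
    "CryptEncrypt": [1.0, 1.0],
}

call_graph = build_call_graph(sequence)
embeddings = aggregate(call_graph, node_features, hops=1)
```

In a full pipeline, the resulting per-node embeddings would be pooled or passed directly to a classifier. A trained model would replace the plain mean with learned transformations.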
Keywords/Search Tags: malware detection, graph representation learning, machine learning, convolution visualization, interpretability