
Research On Malicious Code Family Classification Method Based On Deep Learning

Posted on: 2023-02-10  Degree: Master  Type: Thesis
Country: China  Candidate: M Z Zhang  Full Text: PDF
GTID: 2568307298955369  Subject: Cyberspace security
Abstract/Summary:
The rapid growth and unchecked spread of PE (Portable Executable) malware poses a serious threat to the majority of Windows users. Traditional detection methods struggle to identify malware family types accurately enough for targeted prevention. To address this problem, this thesis uses deep learning methods such as pure convolutional neural networks, bidirectional long short-term memory networks, and attention mechanisms to improve the classification performance of malware families. The specific work of this thesis is as follows:

(1) To address the problem that a single grayscale image feature cannot effectively represent the original information of malware, this thesis proposes a classification model that combines multi-channel visualization of PE software with a convolutional neural network. Binary bytecodes, bytecode word vectors, and opcode word vectors are combined into RGB three-channel images, and an improved pure convolutional neural network (ConvNeXt) is used to classify the malicious images by family. The classification accuracy on RGB malicious images reaches 99.2%, which is better than other feature extraction methods. The ConvNeXt network also yields a 2-4% improvement in precision and recall.

(2) To address the problem that dynamic features are excessively large and hidden information is ignored, this thesis proposes a CBLA (TextCNN-BiLSTM-Attention) classification model based on the API sequence. API2Vec embedding and implicit semantic chains are used to characterize the internal relationships between functions, the API sequence relationships are captured by a bidirectional long short-term memory network (BiLSTM), and an attention mechanism (Attention) is used to focus on suspicious malicious sequences. The CBLA model achieves an accuracy of 94.5% on the 7-class classification problem of the Alibaba Cloud dataset, an improvement of 5%.

(3) To address the problem that anti-detection methods such as packers interfere with static feature extraction and that anti-virtual machine technology hides dynamic behavior, this thesis proposes a malicious code classification framework based on the fusion of dynamic and static features. The API sequence is visualized and fused with static features into RGB three-channel images, and the ConvNeXt model is used for classification. The experimental results show that feature fusion can effectively compensate for the shortcomings of static detection when dealing with code obfuscation, and can also reduce the impact of anti-virtual machine technology on dynamic detection.

This thesis explores the use of deep learning to classify malware families in different detection scenarios, and improves classification performance compared with related work.
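The multi-channel visualization in item (1) can be illustrated with a minimal sketch. It is not the thesis's actual pipeline: the helper names (`to_channel`, `scale_to_bytes`, `stack_rgb`), the zero padding, the min-max scaling of embedding values to 0..255, and the sample inputs are all assumptions made for illustration. The sketch only shows how three per-file feature sequences could be laid out as the R, G, and B planes of one square image before being passed to an image classifier.

```python
from typing import List, Sequence, Tuple


def to_channel(values: Sequence[int], side: int) -> List[List[int]]:
    """Truncate or zero-pad `values` to side*side entries and reshape to a square grid."""
    flat = [v & 0xFF for v in values[: side * side]]
    flat += [0] * (side * side - len(flat))
    return [flat[r * side:(r + 1) * side] for r in range(side)]


def scale_to_bytes(vector: Sequence[float]) -> List[int]:
    """Min-max scale real-valued features (e.g. embedding values) to integers in 0..255."""
    if not vector:
        return []
    lo, hi = min(vector), max(vector)
    if hi == lo:
        return [0 for _ in vector]
    return [round(255 * (v - lo) / (hi - lo)) for v in vector]


def stack_rgb(r: List[List[int]], g: List[List[int]],
              b: List[List[int]]) -> List[List[Tuple[int, int, int]]]:
    """Combine three equally sized grids into an H x W grid of (R, G, B) pixels."""
    return [[(r[y][x], g[y][x], b[y][x]) for x in range(len(r[0]))]
            for y in range(len(r))]


# Hypothetical inputs: a few raw bytes and two made-up embedding vectors.
raw_bytes = bytes([0x4D, 0x5A, 0x90, 0x00])
byte_embedding = [0.1, -0.3, 0.7, 0.2]
opcode_embedding = [0.5, 0.5, 0.9, -0.1]

side = 2
image = stack_rgb(
    to_channel(list(raw_bytes), side),                      # R: raw bytecode
    to_channel(scale_to_bytes(byte_embedding), side),       # G: bytecode word vector
    to_channel(scale_to_bytes(opcode_embedding), side),     # B: opcode word vector
)
```

In practice the image side length would be much larger and the embeddings would come from a trained word-vector model; both choices are left open here because the abstract does not specify them.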
Keywords/Search Tags:Deep learning, Malicious code classification, PE software visualization, API sequence, Feature fusion