Research On Malicious Code Detection Technology Based On Deep Learning

Posted on: 2021-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: K Chen
Full Text: PDF
GTID: 2428330614972100
Subject: Computer technology
Abstract/Summary:
With the increasing popularity of the Internet in national life, many aspects of people's lives have become intertwined with the Internet. Dynamic analysis of malicious code is constrained by the execution environment, cannot capture behavior along all execution paths, and incurs high overhead with low efficiency. The machine-learning-based detection methods that have been popular in recent years cannot extract features automatically and effectively and instead rely on manual feature extraction. Such shallow features cannot describe malicious code accurately, which leads to problems such as low detection accuracy. To address these problems, this thesis applies deep learning ideas and techniques to detect malicious code, starting from static analysis. The main work of this thesis is as follows:

1. Propose a GloVe-based vectorized representation model for the instruction layer and semantic layer of malicious code. Starting from static analysis, this thesis first checks malicious code samples for packing and unpacks them, so that the analysis is not affected by code packing. Traditional feature extraction methods usually use the N-Gram algorithm, which does not take into account the correlation between the contextual behavior information of malicious code. To solve this problem, this thesis designs and implements a feature vectorization method based on the GloVe algorithm. The samples are first disassembled in batch to obtain assembly files, from which two features that represent malicious code behavior well are extracted. At the instruction level, an algorithm based on regular expression matching extracts the opcode sequence from the assembly code. At the semantic level, a depth-first traversal algorithm extracts the key API sequence from the assembly file. A GloVe model is then constructed to obtain the vectorized feature representation, in which the spatial correlation between word vectors expresses the correlation of the sequences. Comparison experiments are performed to verify the effect of the word vector method.

2. Design and implement a neural network model based on the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN), named MCC-RCNN (Malicious Code Detection-Recurrent Convolutional Neural Network). Machine learning classification models are usually simple, and their feature extraction methods usually stay at the surface, resulting in low malicious code detection accuracy. In recent years, the use of deep learning to detect malicious code has become a research hotspot. However, when an RNN such as LSTM (Long Short-Term Memory) is applied alone, the model cannot extract information from sequences that are too long; when a CNN is applied alone, the features obtained after training lack contextual relevance. To address these problems, this thesis combines CNN and RNN in a single detection model, MCC-RCNN, which fuses LSTM and Gated CNN. The malicious code feature sequence is first fed into the LSTM, whose preservation mechanism, forgetting mechanism and long-term memory are used to capture behavior information over long operation sequences. The result is then fed into the Gated CNN to extract local features of different dimensions. The data set is the public malicious code data set released by Microsoft on the Kaggle platform. Comparison with a convolutional neural network, a recurrent neural network and machine learning classification models demonstrates the effectiveness of the proposed MCC-RCNN detection model.

3. Propose a feature description method that fuses features at the static behavior level of malicious code. The results of malicious code detection and classification depend largely on the feature description method. Static features at different levels describe malicious code from different dimensions. To make full use of the advantages of static analysis and to reduce the impact of code obfuscation on static analysis, this thesis fuses the instruction layer and semantic layer features of malicious code, improving the descriptive ability of the features so that malicious code is described more accurately. The two feature vectors are concatenated horizontally at the fully connected layer and then passed to the MCC-RCNN model for detection. Comparison experiments, together with a comparison against leading international papers, verify the effect of the fusion feature detection method.
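
The instruction-level step in item 1 (regular-expression extraction of the opcode sequence, followed by a GloVe model) can be illustrated with a minimal sketch. This is not the thesis's implementation. The IDA-style line layout, the regular expression, the restriction to the .text section, the window size and the 1/distance weighting are all assumptions made for illustration. GloVe is trained on windowed co-occurrence counts, so the sketch stops at building those counts and does not train any vectors.

```python
import re
from collections import Counter

# Assumed line layout (hypothetical, IDA-style listing):
#   .text:00401000 55              push    ebp
# i.e. section:address, optional upper-case hex bytes, then a lower-case mnemonic.
OPCODE_RE = re.compile(
    r"^\.text:[0-9A-F]{8}\s+(?:[0-9A-F]{2}\s+)*([a-z][a-z0-9]+)\b"
)

def extract_opcodes(asm_text):
    """Return the opcode mnemonics found in .text lines, in file order."""
    opcodes = []
    for line in asm_text.splitlines():
        match = OPCODE_RE.match(line)
        if match:
            opcodes.append(match.group(1))
    return opcodes

def cooccurrence(sequence, window=2):
    """Symmetric windowed co-occurrence counts, weighted by 1/distance."""
    counts = Counter()
    for i, word in enumerate(sequence):
        for dist in range(1, window + 1):
            if i + dist >= len(sequence):
                break
            context = sequence[i + dist]
            counts[(word, context)] += 1.0 / dist
            counts[(context, word)] += 1.0 / dist
    return counts

# Hypothetical sample listing; label and data lines are skipped by the regex.
SAMPLE = """\
.text:00401000 sub_401000 proc near
.text:00401000 55                 push    ebp
.text:00401001 8B EC              mov     ebp, esp
.text:00401003 51                 push    ecx
.text:00401004 E8 10 00 00 00     call    sub_401019
.data:00403000 00 00 00 00        dd 0
"""

if __name__ == "__main__":
    ops = extract_opcodes(SAMPLE)
    print(ops)
    print(cooccurrence(ops))
```

Unlike N-Gram counting, which records only fixed adjacent runs, the co-occurrence table relates each opcode to every neighbor inside the window, which is the context information a GloVe model is then fitted to. The same counting could be applied to an API call sequence once it has been extracted.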
Keywords/Search Tags: Deep learning, Malicious code, Instruction layer, Semantic layer, Detection