Research and Implementation of Malware Variant Detection Technology Based on Self-Attention Mechanism

Posted on: 2022-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: W S Li
GTID: 2518306332467424
Subject: Cyberspace security
Abstract/Summary:
With the spread of polymorphic techniques and the emergence of automatic obfuscation tools, the number and complexity of malicious code variants have increased sharply, posing a serious threat to network security. Attackers can alter the original characteristics of malicious code through instruction reordering, instruction replacement, obfuscation, and encryption in order to evade anti-virus software, which makes malware analysis and research more difficult.

Existing detection methods can be divided into traditional methods and machine learning methods. Most traditional malicious code variant detection relies on rule matching or on the results of sandbox execution, and cannot cope with the growing use of anti-debugging techniques. Machine learning based detection methods mostly rely on manually extracted features, which requires researchers to have substantial prior knowledge and makes it easy to overlook key information in the variants.

Although attackers use variants to circumvent detection engines and to produce malware on a large scale, they change only a small amount of code. Therefore, if researchers can capture the characteristics of the source code or the commonality between variants, they can detect most malicious code variants and defend against a large volume of malicious code. In recent years, researchers have begun to apply deep learning to malicious code variant detection. The important questions are how to make full use of massive raw malicious code data, how to mine the internal correlation between samples belonging to the same family, and how to achieve effective and fast end-to-end variant detection.

This thesis proposes a malicious code variant detection scheme. The main research contributions are as follows:

(1) Adversarial techniques such as packing and obfuscation interfere heavily with variant detection methods based on binary images. To address this problem, this thesis proposes a color image representation of malicious code based on the relative virtual address (RVA). Compared with existing methods, it adds assembly features and developer information features to the binary information. The relative virtual address serves as the index, which solves the problem that features from different dimensions could not be correlated with one another. Experimental results show that images converted from malicious code by the proposed visualization method are clearer and have richer texture features, which lays a foundation for improving detection accuracy.

(2) Existing malicious code variant detection models cannot cope with the increasingly diverse evasion techniques used by malicious code. This thesis therefore proposes a convolutional neural network detection model that introduces the self-attention mechanism. By expanding the local receptive field of the convolution kernel to a global receptive field, the model can explore the contextual logic and call relationships within the code. The detection system can thus compare the similarity between code samples more comprehensively, which compensates for the limited perception area of existing models. Experimental results show that this method has a clear advantage when variants contain a considerable amount of irrelevant code or when code segments are partially changed.

(3) Existing methods cannot handle the large differences in file size that arise as malicious code mutates, so spatial pyramid pooling is introduced into the detection model. The spatial pyramid pooling layer converts inputs of different sizes into an output of the same dimension. This avoids the information loss caused by sampling, interpolation, and similar resizing operations during malicious code image processing, and improves both detection accuracy and detection efficiency.

(4) A malicious code variant detection system was designed and implemented to verify the effectiveness of the proposed method. The thesis describes the overall design of the system and each functional module in detail.

Finally, several comparative experiments verify the effectiveness of the proposed visualization method. Combining the color images generated by this method with the deep learning model yields a model accuracy of 98.33%. Compared with other methods, recall and precision are also significantly improved.
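To make the spatial pyramid pooling idea in item (3) concrete, the following is a minimal sketch in plain Python, not the thesis's implementation. It max-pools a single-channel 2-D feature map into fixed grids and concatenates the results. The function names `max_pool_grid` and `spatial_pyramid_pool` and the pyramid levels `(1, 2, 4)` are illustrative assumptions; the abstract does not state the levels actually used.

```python
from typing import List, Sequence

FeatureMap = List[List[float]]


def max_pool_grid(fmap: FeatureMap, bins: int) -> List[float]:
    """Max-pool a 2-D feature map into a bins x bins grid (row-major)."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        # Floor for the start and ceiling for the end, so bins cover every row.
        r0 = (i * h) // bins
        r1 = max(r0 + 1, -((-(i + 1) * h) // bins))
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = max(c0 + 1, -((-(j + 1) * w) // bins))
            pooled.append(
                max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1))
            )
    return pooled


def spatial_pyramid_pool(
    fmap: FeatureMap, levels: Sequence[int] = (1, 2, 4)  # assumed levels
) -> List[float]:
    """Concatenate grid poolings; output length is sum(b * b for b in levels)."""
    out: List[float] = []
    for bins in levels:
        out.extend(max_pool_grid(fmap, bins))
    return out


# Two feature maps of different sizes yield vectors of the same length.
small = [[float(r * 10 + c) for c in range(7)] for r in range(5)]
large = [[float(r + c) for c in range(40)] for r in range(33)]
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

Because the number of bins is fixed while the bin size adapts to the input, no resizing or interpolation of the malware image is needed before the fully connected layers. In a real network the same pooling would be applied to every channel of the last convolutional layer.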
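The global receptive field described in item (2) can be illustrated with scaled dot-product self-attention, in which every output position is a weighted average over all input positions rather than over a local convolution window. The sketch below is a simplification under stated assumptions: it uses the input features directly as queries, keys, and values (identity projections), whereas a trained model would normally learn separate projection matrices. The function names are hypothetical, and the abstract does not specify how the thesis integrates attention into its CNN.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity Q/K/V (assumption)."""
    dim = len(features[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in features:
        # Similarity of this position to every position in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale for key in features
        ]
        weights = softmax(scores)
        # Each output mixes information from all positions (global context).
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, features)) for i in range(dim)]
        )
    return outputs


# Three positions of a flattened feature map, two channels each.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
attended = self_attention(feats)
print(attended)
```

Positions 0 and 2 hold identical features and therefore receive identical outputs regardless of how far apart they are, which is the property that lets attention relate distant code regions that a small convolution kernel would not see together.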
Keywords/Search Tags: deep learning, malicious code visualization, self-attention mechanism, spatial pyramid pooling