Research On Malware Visualization For Detection And Classification Based On Deep Learning

Posted on: 2021-05-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z J Ren
Full Text: PDF
GTID: 1368330614466092
Subject: Control Science and Engineering
Abstract/Summary:
To analyze and determine the properties of malware, researchers have explored a variety of detection techniques, e.g. static code disassembly and dynamic code execution, but both have limitations. Static analysis disassembles the code and checks the control flow of the program to find malicious patterns; dynamic analysis runs the malware in a virtual environment and characterizes it by its behavior. However, the static method provides comprehensive information only if the malware does not use obfuscation, and the dynamic method observes malicious behavior only if the virtual environment satisfies the trigger conditions. More intelligent data analysis methods are therefore urgently needed to overcome the limitations of existing techniques, improve the efficiency of analysts, and help them quickly extract features from large volumes of suspicious data for malware analysis, detection and classification.

Malware visualization is an interdisciplinary field that supports scientific inference through visual interfaces, reveals new attributes of the visualized objects, and greatly supplements the cognition of analysts. In recent years, many meaningful research results have emerged in this field. One of the more effective approaches presents malware files as graphics and images and uses latent visual patterns to express their implicit characteristics and data differences. This approach addresses the two main tasks of malware analysis, detection and classification: suspicious files are analyzed to determine whether they contain malicious content, and files found to be malicious are assigned to a malware family according to their behavior and attributes. In practical use, however, several important problems still restrict the approach. This doctoral dissertation therefore focuses on the following issues: existing malware visualization methods cannot locate character information for interactive analysis, cannot avoid Zip Bomb attacks or prevent malware from changing the global image features to escape detection, and cannot intuitively reflect information compression or encryption in malware files. The dissertation also explores the augmentability of malware visualization methods.

To address these issues, this doctoral dissertation uses theoretical knowledge of artificial intelligence, deep learning and convolutional neural networks to develop a visualization system for malware analysis, detection and classification. The system represents malware features by means of transfer learning and helps analysts comprehend the regularities of malicious behavior through visual analysis. It improves the recognition accuracy of malicious files and highlights the immediacy and operability of malware characteristics. The main content of this doctoral dissertation is summarized as follows:

1. In terms of byte analysis, two new malware visualization methods based on N-gram features of byte sequences are proposed. Method 1 (SFCM) uses space filling curves to visualize the byte sequences of malicious binary files, i.e. their one-gram features, and marks printable characters with different colors. This solves the problem that the existing grayscale method cannot locate character information for interactive analysis, and it avoids the risk of Zip Bomb DoS attacks caused by large malware files. Method 2 (MDP) visualizes the bi-gram features of byte sequences and their statistical information as the coordinates and brightness of pixels, which solves the problem that malware can relocate code sections or add redundant data to change the global image features. Both methods use convolutional neural networks and an SVM (support vector machine) classifier to learn and classify the resulting images. We applied the two methods to Microsoft's malware samples (BIG 2015, Kaggle) and obtained classification accuracies of 98.36% and 99.08%, respectively. Likewise, on benign Windows PE (portable executable) files together with the above malware set, the two methods achieved detection rates of 99.21% and 98.74%, respectively. Both methods also improved on the classification accuracy and detection accuracy of the existing grayscale method.

2. In terms of the information reflected in bytes, a novel malware visualization analysis method based on local entropy is proposed. The method divides a .byte file into blocks and calculates the entropy of each block, normalizes the length of the resulting local entropy sequence, marks the entropy values with different colors, enhances the visual effect by extending the range of the entropy values, and finally uses space filling curves to produce the visualization. This solves the problem that traditional malware analysis methods cannot directly reflect compressed or encrypted malicious files. We used this method to visualize the above sample set and obtained an optimum classification accuracy of 99.10% with deep fusion networks. Similarly, the detection rate of this method reached 99.48%. Moreover, this method improved on the existing entropy histogram method, raising the classification accuracy from 65.32% to 98.93% and the detection accuracy from 84.53% to 99.43%.

3. In terms of the semantic expression of bytes, an innovative malware visualization method based on opcode frequency is proposed. To extract the opcodes from machine instruction sequences, this method requires static analysis to disassemble the malware files. The method marks the most common and the rare opcodes with different colors, then rearranges the opcodes by the order of their corresponding color values in RGB space to complete the mapping of opcode frequency. This solves the problem that the existing image matrix method has a poor visual effect and low classification performance. We verified this method on the ASM file set provided by Microsoft (a disassembled version of the same malware samples) and obtained a classification accuracy of 98.50%.

In this doctoral dissertation, we evaluated the effectiveness of the presented methods through experiments, with the following results:

1. With respect to visual representation, all the proposed methods make the images of the same malware family similar and clearly distinguish those of different malware families.

2. With respect to malware classification, all the proposed methods, inspired by transfer learning, take full advantage of convolutional neural networks in the field of image classification and achieve better results than the existing malware visualization methods.

3. With respect to analysis efficiency, all the proposed methods establish visual communication between researchers and malware, and therefore lower the requirements on the professional level and analysis experience of analysts. All the images generated by the presented methods are normalized, and the methods can perform classification and detection automatically, so work efficiency is dramatically improved.

It is important to note that the first three methods (SFCM, MDP and the local entropy method) are highly adaptable to obfuscated malicious samples and require neither malware disassembly nor code execution. At the end of this doctoral dissertation, conclusions are drawn and suggested directions for future work are discussed.
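
As an illustration of the idea behind Method 1 (SFCM), the following minimal sketch places the bytes of a file along a Hilbert curve and gives printable characters a distinct color. It is not the dissertation's implementation: the choice of the Hilbert curve, the function names hilbert_d2xy, byte_color and bytes_to_image, the color scheme, and the truncation to a fixed canvas are all assumptions made for this example.

```python
def hilbert_d2xy(side, d):
    """Map index d along a Hilbert curve to (x, y) on a side x side grid.

    side must be a power of two. This is the standard iterative conversion.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def byte_color(b):
    """Hypothetical color scheme: printable ASCII in blue tones, the rest in gray."""
    if 0x20 <= b <= 0x7E:
        return (0, 128 + b, 255)
    return (b, b, b)


def bytes_to_image(data, side=16):
    """Return a side x side grid of RGB tuples, indexed as image[y][x].

    Assumption for this sketch: input longer than side * side bytes is
    truncated, and unused pixels stay black.
    """
    image = [[(0, 0, 0)] * side for _ in range(side)]
    for d, b in enumerate(data[: side * side]):
        x, y = hilbert_d2xy(side, d)
        image[y][x] = byte_color(b)
    return image
```

Because consecutive indices on the curve are always neighboring pixels, bytes that are close together in the file stay close together in the image, so a run of printable characters shows up as a connected colored region.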
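
The bi-gram idea behind Method 2 (MDP) can be sketched in a similar way: each pair of adjacent bytes selects a pixel in a 256 x 256 image, and the number of occurrences of that pair sets the brightness. The function name bigram_image and the logarithmic brightness scaling are assumptions for this example; the abstract does not specify which statistic or scaling the dissertation uses.

```python
import math
from collections import Counter


def bigram_image(data):
    """Return a 256 x 256 grid of brightness values in 0..255.

    image[first][second] is bright when the byte pair (first, second)
    occurs often in data. Log scaling is an assumption of this sketch.
    """
    counts = Counter(zip(data, data[1:]))
    image = [[0] * 256 for _ in range(256)]
    if not counts:
        # Fewer than two bytes: no bigrams, image stays black.
        return image
    top = max(counts.values())
    for (first, second), c in counts.items():
        image[first][second] = round(255 * math.log1p(c) / math.log1p(top))
    return image
```

In this representation a pixel's position depends only on byte values, not on file offsets, so moving a block of bytes to another place in the file changes only the few bigrams that straddle the block boundaries.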
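
The first steps of the local entropy method (blocking, per-block entropy, length normalization and range extension) can be sketched as follows. This is a sketch under stated assumptions: the block size of 256 bytes, the nearest-index resampling, the min-max stretch to 0..255, and the names block_entropy, local_entropy, resample and stretch are all hypothetical and not taken from the dissertation.

```python
import math
from collections import Counter


def block_entropy(block):
    """Shannon entropy of a byte block in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    n = len(block)
    total = 0.0
    for c in Counter(block).values():
        p = c / n
        total -= p * math.log2(p)
    return total


def local_entropy(data, block_size=256):
    """Entropy of each consecutive block; block_size is an assumed parameter."""
    return [
        block_entropy(data[i : i + block_size])
        for i in range(0, len(data), block_size)
    ]


def resample(seq, length):
    """Normalize a sequence to a fixed length by nearest-index sampling."""
    if not seq:
        return [0.0] * length
    return [seq[i * len(seq) // length] for i in range(length)]


def stretch(seq):
    """Extend the value range to 0..255 with a min-max stretch (an assumption)."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0] * len(seq)
    return [round(255 * (v - lo) / (hi - lo)) for v in seq]
```

Compressed or encrypted regions have entropy close to 8 bits per byte, while padding and repeated data are close to 0, which is why such regions stand out once the values are mapped to colors.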
Keywords/Search Tags: Malware, Visualization Analysis, Convolutional Neural Networks, Deep Learning, Space Filling Curves