| With the development of computer technology and the popularization of the Internet, the uncontrollable factors in network security have increased significantly, and malware has become one of the major threats to Internet security. With the growing volume of malware and the development of anti-reconnaissance technology, malware threatens not only the safety of personal computers but also the security of corporate, industrial, and national information, causing enormous economic losses. Most malware carries anti-reconnaissance mechanisms, such as polymorphic transformations, which produce a wide range of malware variants. The traditional approach to malware detection extracts static opcode sequences as features and uses data mining methods to detect and classify unknown malware. However, when the training set is small, the accuracy of this approach may be severely limited. As the volume of malware increases, the number of available training samples is often smaller than the number of samples to be tested. Therefore, improving detection accuracy with a small training set is the research focus of this thesis. Based on this analysis, this thesis studies a detection method for malware variants based on static opcode sequences. The method transforms executable files into opcode sequences and then into opcode sequence images, and it uses an optimized CaffeNet model to classify these images. The main contributions are as follows. In this thesis, the problem of malware detection is converted into an image detection problem. First, opcode sequences are extracted by unpacking and disassembling. Second, the opcode sequences are converted into images, where the image matrix is built over all the opcodes and each pixel value is the product of the probability of the opcode sequence and its information gain. Last, the contrast between malware and benign software is enhanced using histogram normalization, the grayscale method, and mathematical morphology. This thesis uses three classification methods: principal component analysis combined with k-nearest neighbors, principal component analysis combined with a support vector machine, and the optimized CaffeNet model. Training time is reduced by decreasing the number of layers in the CaffeNet model while maintaining good detection accuracy. In the experiments, the proposed method is compared with traditional methods, using accuracy and processing time as metrics. The comparison and analysis show that the proposed method effectively improves accuracy when the training set is small. The optimized CaffeNet model achieves 12.7% higher accuracy than the traditional data mining detection method based on opcode sequence features. |
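
The opcode-to-image step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the sequence length, how the probability is estimated, or how pixels are laid out, so the following are assumptions made only for illustration: opcode bigrams are used, the row and column of a pixel are the first and second opcode of the bigram, the probability is the bigram's relative frequency within one sample, and the information gain is computed over the training set for the binary feature "bigram occurs in the sample". All function and variable names (`bigrams`, `information_gain`, `opcode_image`, the toy samples) are hypothetical and do not come from the thesis.

```python
import math
from collections import Counter


def bigrams(opcodes):
    """Return the consecutive opcode pairs of one sequence (assumed n = 2)."""
    return list(zip(opcodes, opcodes[1:]))


def entropy(pos, neg):
    """Binary entropy in bits of a class split with the given counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(samples, labels, gram):
    """Information gain of the feature 'gram occurs in the sample'.

    labels: 1 for malware, 0 for benign (assumed encoding).
    """
    n = len(samples)
    pos = sum(labels)
    with_pos = with_neg = 0
    for seq, label in zip(samples, labels):
        if gram in set(bigrams(seq)):
            if label:
                with_pos += 1
            else:
                with_neg += 1
    n_with = with_pos + with_neg
    without_pos = pos - with_pos
    without_neg = (n - pos) - with_neg
    conditional = (n_with / n) * entropy(with_pos, with_neg) \
        + ((n - n_with) / n) * entropy(without_pos, without_neg)
    return entropy(pos, n - pos) - conditional


def opcode_image(seq, vocab, gain):
    """Build a |vocab| x |vocab| matrix: pixel = P(bigram) * IG(bigram)."""
    index = {op: i for i, op in enumerate(vocab)}
    counts = Counter(bigrams(seq))
    total = sum(counts.values())
    image = [[0.0] * len(vocab) for _ in vocab]
    for (a, b), c in counts.items():
        if a in index and b in index:  # skip opcodes outside the vocabulary
            image[index[a]][index[b]] = (c / total) * gain.get((a, b), 0.0)
    return image


# Toy training set (invented for illustration, not data from the thesis).
samples = [
    ["push", "mov", "call"],
    ["push", "mov", "ret"],
    ["mov", "add", "ret"],
    ["mov", "add", "call"],
]
labels = [1, 1, 0, 0]

vocab = sorted({op for seq in samples for op in seq})
all_grams = {g for seq in samples for g in bigrams(seq)}
gain = {g: information_gain(samples, labels, g) for g in all_grams}
image = opcode_image(samples[0], vocab, gain)
```

In this sketch the resulting matrix would still need to be scaled to a gray-level range and passed through the contrast-enhancement steps before being fed to a classifier; those stages are not shown because the abstract does not give their parameters.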