| Windows, with its large user base, has long been the main target of malware attacks. Traditional detection methods require substantial computing resources for feature analysis, and they struggle to keep pace with malware variants that evolve rapidly through packing and obfuscation techniques. Converting software directly into images and classifying them with deep learning has the advantage of not requiring complex feature extraction. However, images built directly from static features can hardly reflect the behavioral characteristics of malware. In addition, some malware categories have very few samples in practice, and this class imbalance leads to low detection accuracy for the small-sample categories. To address these problems, this paper studies visualization and data augmentation methods based on the dynamic characteristics of malware, and applies deep learning to malware detection. The main research contents are as follows. 1. Malware feature analysis: a sequence of dynamic API calls reflects a behavior sequence specific to a piece of software. To preserve the order of Windows API calls, this paper proposes, for the first time, a method that combines dynamic analysis with malware visualization based on the Gramian Angular Field (GAF). The Windows API call sequences obtained through dynamic analysis are converted into GAF feature images, and a convolutional neural network (CNN) is designed to detect malware from the perspective of image texture. Experimental results demonstrate the effectiveness of the proposed method. 2. To address imbalanced datasets and the low recognition rate of models for categories with few samples, this paper proposes a malware data augmentation method based on malware visualization and a Generative Adversarial Network (GAN). First, a new malware RGB image is generated from the dynamic API call order combined with the number of API calls. Second, a data augmentation model based on WGAN-GP is designed for the malware categories with few samples. Finally, experiments are conducted from two perspectives: the quality of the generated data and the accuracy of the model. On the one hand, the image quality metrics PSNR and SSIM are used to evaluate the similarity between the generated data and the original data, which verifies the effectiveness of the proposed data augmentation model. On the other hand, datasets of different sizes are constructed to measure how augmenting with generated data affects detection accuracy. Experimental results show that the proposed data augmentation method generates data containing rich feature information, effectively alleviates data imbalance, and improves model performance. |
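The abstract turns a Windows API call sequence into a GAF image but does not spell out the mapping. The sketch below shows one common form, the Gramian Angular Summation Field (GASF), under stated assumptions. Each API name is mapped to its index in a fixed vocabulary; this hypothetical ordinal encoding gives API names an arbitrary numeric order. The resulting series is min-max rescaled to [-1, 1] and encoded as angles phi_i = arccos(x_i), and the image entry is cos(phi_i + phi_j). The helper names `encode_api_sequence`, `rescale`, `gasf`, and `to_pixels` are illustrative only, not the paper's code.

```python
import math

def encode_api_sequence(calls, vocab):
    # Hypothetical encoding: each API name becomes its integer index in `vocab`.
    return [vocab[name] for name in calls]

def rescale(series):
    # Min-max rescale to [-1, 1] so that every value is a valid cosine.
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)  # constant sequence: nothing to order
    return [(2 * v - hi - lo) / (hi - lo) for v in series]

def gasf(series):
    # Polar encoding phi_i = arccos(x_i); GASF[i][j] = cos(phi_i + phi_j).
    # The clamp guards against rounding slightly outside [-1, 1].
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in rescale(series)]
    return [[math.cos(p_i + p_j) for p_j in phi] for p_i in phi]

def to_pixels(field):
    # Map field values from [-1, 1] to 8-bit grayscale intensities in [0, 255].
    return [[round((v + 1) * 127.5) for v in row] for row in field]

# Example API names and vocabulary order are assumptions for illustration.
vocab = {"NtCreateFile": 0, "NtWriteFile": 1, "NtClose": 2}
calls = ["NtCreateFile", "NtWriteFile", "NtWriteFile", "NtClose"]
image = to_pixels(gasf(encode_api_sequence(calls, vocab)))
```

A length-n trace yields an n x n image, so real call traces would usually be truncated or downsampled before this step. The symmetric matrix keeps temporal order because entry (i, j) depends on the positions of both calls.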
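For reference, the standard WGAN-GP objective (Gulrajani et al., "Improved Training of Wasserstein GANs") is shown below. A WGAN-GP-based augmentation model builds on this objective. The abstract does not state the architecture or the penalty weight used in this work, so $\lambda$ is left symbolic; the original WGAN-GP paper uses $\lambda = 10$. Here $D$ is the critic, $\mathbb{P}_r$ is the distribution of real malware images for a small-sample category, and $\mathbb{P}_g$ is the generator's distribution.

$$
L_D = \mathbb{E}_{\tilde{x}\sim\mathbb{P}_g}\big[D(\tilde{x})\big] - \mathbb{E}_{x\sim\mathbb{P}_r}\big[D(x)\big] + \lambda\,\mathbb{E}_{\hat{x}\sim\mathbb{P}_{\hat{x}}}\Big[\big(\lVert\nabla_{\hat{x}} D(\hat{x})\rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon x + (1-\epsilon)\tilde{x},\ \epsilon \sim U[0,1]
$$

$$
L_G = -\,\mathbb{E}_{\tilde{x}\sim\mathbb{P}_g}\big[D(\tilde{x})\big]
$$

The gradient penalty replaces the weight clipping of the original WGAN. It pushes the critic toward being 1-Lipschitz along lines between real and generated samples, which tends to stabilize training when only a few real samples are available.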
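The abstract evaluates generated images against real ones with PSNR and SSIM. Below is a minimal pure-Python sketch for 8-bit grayscale images given as nested lists. PSNR follows its standard definition, 10 * log10(MAX^2 / MSE). The SSIM function is a deliberate simplification: it computes a single global SSIM over the whole image using the usual constants K1 = 0.01 and K2 = 0.03. It does not use the Gaussian sliding window (11 x 11, sigma 1.5) of the standard metric. For reported results, a library implementation such as scikit-image's `structural_similarity` would be the safer choice.

```python
import math

def _flatten(img):
    # Turn a 2-D list of pixel values into a flat list of floats.
    return [float(p) for row in img for p in row]

def psnr(a, b, max_val=255.0):
    # Peak signal-to-noise ratio in dB; identical images give infinity.
    x, y = _flatten(a), _flatten(b)
    mse = sum((p - q) ** 2 for p, q in zip(x, y)) / len(x)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_val ** 2 / mse)

def global_ssim(a, b, max_val=255.0):
    # Simplified SSIM over a single window covering the whole image.
    x, y = _flatten(a), _flatten(b)
    n = len(x)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((p - mx) ** 2 for p in x) / n
    vy = sum((q - my) ** 2 for q in y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

# Example: a real image and a slightly perturbed "generated" image.
real = [[0, 255], [255, 0]]
fake = [[0, 255], [255, 10]]
scores = (psnr(real, fake), global_ssim(real, fake))
```

Higher PSNR and SSIM values closer to 1 indicate that a generated sample is closer to its reference image.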
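The abstract builds datasets of different sizes to measure how generated data affects detection accuracy, but it does not give the construction rule. One simple scheme, shown here as an assumption, tops up each minority category with synthetic samples until it reaches a target count. `generate(label, k)` is a hypothetical callable standing in for a trained per-category generator such as a WGAN-GP. Varying `target` produces augmented datasets of different sizes.

```python
from collections import Counter

def balance_with_generated(samples, generate, target=None):
    """Top up each class with synthetic samples until it has `target` items.

    samples:  list of (image, label) pairs from the real dataset.
    generate: hypothetical callable generate(label, k) -> list of k images.
    target:   desired per-class count; defaults to the largest class size.
    Classes already at or above `target` are left unchanged (no downsampling).
    """
    counts = Counter(label for _, label in samples)
    if target is None:
        target = max(counts.values())
    augmented = list(samples)  # real samples are kept first, in original order
    for label, n in counts.items():
        deficit = target - n
        if deficit > 0:
            augmented.extend((img, label) for img in generate(label, deficit))
    return augmented

# Illustrative usage with a placeholder generator.
real = [("img", "benign")] * 5 + [("img", "trojan")] * 2
fake_generator = lambda label, k: ["synthetic"] * k
balanced = balance_with_generated(real, fake_generator)
```

Keeping the real samples untouched and adding only synthetic ones makes it easy to compare a model trained on the original data against one trained on each augmented variant.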