Research On Data Augmentation Of Malicious Code Based On Generative Adversarial Networks

Posted on: 2022-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: X H Zhu
Full Text: PDF
GTID: 2518306491973039
Subject: Architecture and Civil Engineering
Abstract/Summary:
Emerging technologies such as machine learning and deep learning place higher demands on the quantity and quality of malicious code data sets in practical applications, while expanding a data set through re-collection or traditional augmentation methods is time-consuming, labor-intensive, costly, and of limited effect. To address this, this dissertation builds on an analysis of visual representation methods for malicious code and explores malicious code data augmentation based on malicious code images and generative adversarial networks (GANs). The research covers malicious code data representation, model design, and experimental verification. The results are as follows:

(1) To construct a malicious code image data set that can be used effectively in follow-up research, this dissertation proposes a visual representation method that retains the hidden features of malicious code with high probability and reduces information loss. Instead of the earlier approach of setting a fixed image width, the method sets the same ratio of length to width. This effectively reduces the loss that arises when executable files of different sizes are converted into images of different sizes, and it yields a higher-quality malicious code image data set.

(2) To augment malicious code data, this dissertation proposes DVGAN-GP, a malicious code data augmentation model based on the fusion of generative adversarial networks and the variational autoencoder (VAE). The model was developed in three progressive stages: a) augmentation models based on the improved GAN models DCGAN and WGAN-GP were constructed and implemented; b) in view of the poor training stability of GAN models and the randomness of the generated data, the variational autoencoder was introduced and an improved deep convolutional variational autoencoder (DCVAE) model was constructed; c) building on an in-depth exploration of the respective strengths and weaknesses of GANs and VAEs, the ability of the variational autoencoder to learn a smooth latent representation of the input data was integrated into the generative adversarial network to build a new augmentation model, DVGAN-GP. The experimental results show that the proposed model can generate new malicious code data that meets generation expectations, contains more information, and has better quality.

(3) Existing studies use only classification accuracy as an evaluation index to verify the validity of generated data. To address this shortcoming, this dissertation introduces two image quality evaluation indicators, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), to comprehensively evaluate the similarity and effectiveness of generated malicious code data. By quantitatively comparing the similarity between the generated data and the original data, building a variety of classification models, and constructing data sets from multiple perspectives for classification experiments, the dissertation verifies that the proposed DVGAN-GP model gives full play to the advantages of both VAE and GAN and has the best overall performance. The data it generates contains rich feature information, helps improve the performance of classification models, and is effective for augmentation.
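As a rough illustration of two ideas in the abstract, the Python sketch below maps an executable's bytes to a grayscale image with equal length and width, and computes PSNR between two images. It is not the dissertation's implementation. The function names `bytes_to_square_image` and `psnr` are hypothetical, and the 1:1 ratio, the zero padding of the last row, and the 8-bit peak value of 255 are assumptions made for this example.

```python
import math

import numpy as np


def bytes_to_square_image(data: bytes) -> np.ndarray:
    """Map raw bytes to a square grayscale image.

    Assumption: a 1:1 length-to-width ratio and zero padding of the tail;
    the dissertation's exact ratio and padding rule are not given here.
    """
    if not data:
        raise ValueError("input must contain at least one byte")
    side = math.isqrt(len(data))
    if side * side < len(data):
        side += 1  # smallest square that holds every byte
    buf = np.zeros(side * side, dtype=np.uint8)
    buf[: len(data)] = np.frombuffer(data, dtype=np.uint8)
    return buf.reshape(side, side)


def psnr(reference: np.ndarray, generated: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = reference.astype(np.float64)
    gen = generated.astype(np.float64)
    mse = float(np.mean((ref - gen) ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the side length is derived from the file size, every byte fits and only the tail of the last row is padding. Images from files of different sizes would still need rescaling to a common resolution before being fed to a GAN, a step this sketch leaves out. SSIM could be computed in the same way with an image-processing library such as scikit-image.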
Keywords/Search Tags:malicious code data, generative adversarial networks, variational autoencoder, deep learning, data augmentation