Malware Classification Based On Texture Feature Fusion And Deep Learning

Posted on: 2022-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Yang
GTID: 2518306494489204
Subject: Cyberspace security
Abstract/Summary:
With the rapid development of big data, the Internet of Things, and 5G communication technologies, security issues in cyberspace have become increasingly severe. Relying on techniques such as code reuse, attackers can quickly generate new malware, which continues to challenge existing detection technologies. The growing volume of malware not only threatens people's property, equipment, and privacy, but also poses a serious threat to national cyberspace. Developing more accurate methods to identify malware is therefore of great significance for maintaining the security of national cyberspace.

Malware is widespread and hidden in PCs and mobile devices, and IoT devices, which have more vulnerabilities, have gradually become new targets. Each kind of malware has its own purpose and destructiveness, and malware of the same family is often produced by reusing code. Therefore, samples from the same family should show a high degree of similarity in code, data, and behaviour.

With the development of GPU parallel computing and deep learning algorithms, deep learning has become the mainstream of artificial intelligence and a common solution to its most active problems. Deep learning has also been widely used in malware classification and has achieved excellent results. This research uses deep learning to classify malware images. The main contributions are as follows:

(1) Using appropriate methods to label the data set. The original EXE samples are unlabeled, so they cannot be used directly for supervised training and testing. Assigning each sample to the appropriate family is particularly important for the subsequent experiments. VirusTotal and other security engines are commonly used to label malware. This research uses the VirusTotal JSON report and the AVClass malware labeling tool. In short, each sample is assigned to the most likely family indicated by the report, which simplifies the labeling steps and improves labeling efficiency.

(2) The original malware is first preprocessed to obtain its byte stream, which is converted into a grayscale image using the Bin2pixel algorithm. Next, the bigram sequence of the byte stream is collected and converted into a grayscale image. Finally, scripts and IDA Pro are used to convert the malware into disassembly files, and Bin2pixel converts the resulting Lst files into grayscale images, which contain more information than Asm grayscale images. This yields three types of data sets: malware images, Bigram images, and Lst images.

(3) Traditional classification of single-channel malware images suffers from low accuracy and weak resistance to obfuscation. This research improves on it by combining the three kinds of grayscale images (malware, Bigram, and Lst) into three-channel color images for classification. A three-channel color image contains more information than a single-channel image, and the results show that the three-channel image achieves higher accuracy.

(4) EfficientNet is an image classification model proposed by Google in 2019. It set a new accuracy record on ImageNet classification and demonstrated its ability to recognize image texture. This research therefore adapts EfficientNet to the malware data set. Experiments show that EfficientNet obtains higher accuracy than the other networks tested when fine-tuned from ImageNet weights.

(5) Pre-training takes a long time to make the model converge and requires extensive tuning experience and massive amounts of data. To address this, the thesis trains the model by combining deep learning with the fine-tuning method from transfer learning. This method reuses the "knowledge" of an existing domain to learn with little overhead, and it makes the model converge in a short time with good performance. The experiments show that fine-tuning converges faster than pre-training and reaches its best accuracy in fewer epochs, which greatly reduces the cost of tuning, training time, and data collection. Two sets of pre-trained weights are used for fine-tuning: ImageNet and Noisy Student (the latter is provided only for EfficientNet). The results show that fine-tuning significantly improves accuracy and achieves the highest accuracy on the malware data set.
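The image construction in contributions (2) and (3) can be illustrated with a minimal Python sketch. The abstract does not give the details of Bin2pixel, so everything below is an assumption: each byte is mapped to one 8-bit grey pixel in rows of a fixed width, the Bigram image is taken to be a 256x256 map of byte-pair frequencies scaled to 0..255, and the three grayscale images are resized with nearest-neighbour sampling before being stacked as color channels. The function names (`bytes_to_gray`, `bigram_to_gray`, `resize_nearest`, `fuse_channels`), the row width, and the output size are hypothetical, and inputs are assumed to be non-empty.

```python
from collections import Counter


def bytes_to_gray(data, width=32):
    """Assumed byte-to-pixel mapping: one byte becomes one 8-bit grey pixel.

    The last row is zero-padded so that every row has `width` pixels.
    """
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def bigram_to_gray(data):
    """Assumed Bigram image: pixel (a, b) encodes how often byte a precedes b.

    Counts are scaled so that the most frequent byte pair maps to 255.
    """
    counts = Counter(zip(data, data[1:]))
    peak = max(counts.values(), default=1)
    image = [[0] * 256 for _ in range(256)]
    for (a, b), n in counts.items():
        image[a][b] = 255 * n // peak
    return image


def resize_nearest(image, height, width):
    """Nearest-neighbour resize of a non-empty 2D list of grey values."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


def fuse_channels(first, second, third, size=64):
    """Stack three grayscale images into one size x size three-channel image."""
    channels = [resize_nearest(ch, size, size) for ch in (first, second, third)]
    return [
        [tuple(ch[r][c] for ch in channels) for c in range(size)]
        for r in range(size)
    ]


# Stand-in inputs: real use would read the EXE bytes and the bytes of its Lst file.
raw = bytes(range(256)) * 4
listing = b"mov eax, ebx\npush ebp\n" * 40

malware_img = bytes_to_gray(raw)
bigram_img = bigram_to_gray(raw)
lst_img = bytes_to_gray(listing)
color_img = fuse_channels(malware_img, bigram_img, lst_img, size=8)
```

Each pixel of `color_img` is a tuple of three grey values, one per source image, which is the form an image classifier expecting three-channel input would consume after conversion to an array.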
Keywords/Search Tags: malware, marking, Bin2pixel, EfficientNet, ImageNet, Noisy Student, fine-tuning, pre-training