Research On Malware Classification Based On Deep Learning And Multi-Feature Fusion

Posted on: 2022-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Liu
GTID: 2518306749971859
Subject: Automation Technology
Abstract/Summary:
According to statistics, 261,603 malicious programs have been captured in China since 2020, and the consequences of malware attacks are severe. Given this volume, malware classification is particularly important: more accurate classification methods help defenders respond to malware attacks more effectively. As malware continues to evolve and diversify, traditional static and dynamic classification methods cannot keep up with emerging malware. This thesis therefore proposes a new classification model that combines multi-feature fusion with deep learning. Experimental results show that it classifies more accurately than traditional methods. The main contents and work of this thesis are as follows:

(1) The malware executable is disassembled to generate a .bytes file and an .asm file, and n-gram instruction features are extracted from the assembly file by a feature extraction algorithm. The B2M algorithm converts the binary files into grayscale images, from which texture features are extracted. A feature fusion algorithm is designed to fuse the two kinds of features.

(2) Building on the long short-term memory (LSTM) network, a classification model based on a bidirectional LSTM (BiLSTM) network is proposed. The optimal parameters of the BiLSTM model are obtained with the controlled-variable method, that is, by varying one parameter at a time while holding the others fixed. The n-gram features, the texture features and the fused features are each fed into the BiLSTM model. With the fused features as input, the average classification accuracy is 96.8%, which is 0.7 percentage points higher than the 96.1% obtained with a single feature. Comparative experiments also show that the BiLSTM model classifies malware families more accurately than traditional models such as random forest, SVM and KNN.

(3) Based on the structural characteristics of the CNN model, a CNN-BiLSTM network model is designed and optimized by parameter tuning. Comparative experiments show that the CNN-BiLSTM model reaches a classification accuracy of 97.39% on malware families, which is 0.59 percentage points higher than that of the BiLSTM model. On top of this model, the thesis designs and introduces an oversampling method and a loss function, which improve the accuracy by a further 0.16 percentage points, to 97.55%. The results indicate that the model performs well for malware family classification.
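The feature pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the details of the n-gram extractor, the B2M algorithm or the fusion algorithm. The function names `opcode_ngrams`, `bytes_to_gray_image` and `fuse`, the image width of 16, and fusion by plain concatenation are all assumptions made for illustration.

```python
from collections import Counter
import math


def opcode_ngrams(opcodes, n=3):
    """Count contiguous opcode n-grams in a list of instruction mnemonics."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def bytes_to_gray_image(data, width=16):
    """Map raw bytes to rows of `width` grayscale pixels (0-255).

    Generic byte-to-pixel mapping; the last row is zero-padded.
    The width is an assumed value, not one taken from the thesis.
    """
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def fuse(ngram_vec, texture_vec):
    """Fuse two feature vectors by concatenation (an assumed, simplest-case fusion)."""
    return list(ngram_vec) + list(texture_vec)


# Hypothetical toy inputs, for illustration only.
opcodes = ["push", "mov", "call", "push", "mov", "call"]
counts = opcode_ngrams(opcodes, n=3)
image = bytes_to_gray_image(bytes(range(20)), width=16)
```

Here `counts` maps each opcode 3-gram to its frequency, and `image` is a list of pixel rows from which texture descriptors could then be computed. A real pipeline would turn the n-gram counts into a fixed-length vector before fusing it with the texture features.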
Keywords/Search Tags: Malware code, N-Gram, Grayscale texture, CNN, BiLSTM