Research On Malicious Code Detection Based On Feature Fusion And Machine Learning

Posted on: 2023-07-19  Degree: Master  Type: Thesis
Country: China  Candidate: Y X Yang  Full Text: PDF
GTID: 2568306794453384  Subject: Computer Science and Technology
Abstract/Summary:
In recent years, the variety of malicious code has grown, its destructive power has increased, and the direct and indirect economic losses it causes have become more serious. To improve the accuracy of malicious code classification, a growing number of scholars are studying malicious code. Thanks to improvements in the computing power of hardware, artificial intelligence has again entered a period of rapid development, and deep learning has achieved remarkable results in image recognition, autonomous driving, speech recognition and other fields. Some scholars have proposed a new preprocessing method that uses image processing technology to visualize malicious code, which has further advanced malicious code detection based on image features.

In this approach, the binary executable files of malicious code are first converted into grayscale images during data preprocessing, and a neural network then learns the features of these images automatically to perform classification. This method has two defects. The first is that information may be lost during preprocessing. Because different malicious programs differ in function and destructive power, their compiled binary executables differ in size, so the two-dimensional grayscale images obtained by converting them directly also differ in size and do not meet the input requirements of a neural network. Large grayscale images must be truncated so that all samples have the same size, and the discarded part may contain key information about the malicious code, which can lead to low accuracy or insufficient generalization ability of the model. The second is that the extracted features are of a single type and the anti-obfuscation ability is insufficient. Because the malicious code has been converted into a grayscale image, its text features have been destroyed and cannot be extracted by a convolutional neural network.

This thesis detects malicious code by fusing N-gram features with grayscale texture features (GLCM, LBP). This solves the problem of differing sample sizes and extracts features of malicious code from two different dimensions, text and grayscale images, which improves the anti-obfuscation ability of detection. K-nearest neighbors, random forest, naive Bayes and support vector machine classifiers are then applied to each of the three single features (LBP, GLCM and N-gram) and to the fusion of the three. The conclusion is that the accuracy on the fused features is higher than on any single feature, and that random forest reaches an accuracy of 98.71% on the fused features. To compare machine learning algorithms with convolutional neural networks on the fused N-gram and grayscale features (GLCM, LBP), four neural networks, alexnet8, inception10, VGG16 and resnet18, were trained to classify the fused features. After 18 rounds of training, resnet18 has a loss of 0.0966 and an accuracy of 97.88% on the test set, which are respectively the lowest loss and the highest accuracy among the four networks. On the fused N-gram and grayscale features (GLCM, LBP), the machine learning algorithms therefore achieve better accuracy than deep learning.
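As a rough illustration of why fusing N-gram, GLCM and LBP features removes the need to truncate samples, the sketch below maps byte strings of any length to a feature vector of fixed length. It is a minimal sketch, not the thesis implementation: the function names, the image width of 16, the use of byte 2-grams hashed into 64 buckets, the 8 gray levels with a single horizontal offset for the GLCM, and the basic 8-neighbor LBP are all assumptions chosen for brevity.

```python
def bytes_to_image(data: bytes, width: int = 16) -> list:
    """Reshape raw bytes into rows of `width` gray values (0-255).
    A ragged final row is dropped; `width` is an assumed value."""
    rows = len(data) // width
    return [list(data[r * width:(r + 1) * width]) for r in range(rows)]


def ngram_histogram(data: bytes, n: int = 2, buckets: int = 64) -> list:
    """Normalised frequencies of byte n-grams, hashed into `buckets` bins."""
    hist = [0.0] * buckets
    total = max(len(data) - n + 1, 0)
    for i in range(total):
        gram = int.from_bytes(data[i:i + n], "big")
        hist[gram % buckets] += 1.0
    return [h / total for h in hist] if total else hist


def glcm_features(img: list, levels: int = 8) -> list:
    """Contrast, energy and homogeneity of a gray-level co-occurrence
    matrix built from horizontally adjacent pixels."""
    counts = [[0.0] * levels for _ in range(levels)]
    total = 0
    for row in img:
        q = [v * levels // 256 for v in row]  # quantise 0-255 to `levels`
        for a, b in zip(q, q[1:]):
            counts[a][b] += 1.0
            total += 1
    if total == 0:
        return [0.0, 0.0, 0.0]
    contrast = energy = homogeneity = 0.0
    for i in range(levels):
        for j in range(levels):
            p = counts[i][j] / total
            contrast += p * (i - j) ** 2
            energy += p * p
            homogeneity += p / (1 + abs(i - j))
    return [contrast, energy, homogeneity]


def lbp_histogram(img: list) -> list:
    """Normalised histogram of basic 8-neighbour local binary pattern codes."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0.0] * 256
    count = 0
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[r]) - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if img[r + dr][c + dc] >= img[r][c]:
                    code |= 1 << bit
            hist[code] += 1.0
            count += 1
    return [h / count for h in hist] if count else hist


def fused_features(data: bytes) -> list:
    """Concatenate the three descriptors: 64 + 3 + 256 = 323 values."""
    img = bytes_to_image(data)
    return ngram_histogram(data) + glcm_features(img) + lbp_histogram(img)


small = fused_features(bytes(range(256)) * 2)   # 512-byte sample
large = fused_features(b"\x00" * 1000)          # 1000-byte sample
print(len(small), len(large))
```

Both samples yield vectors of the same length even though the inputs differ in size, so no bytes have to be discarded to satisfy a fixed input shape. A vector of this kind could then be passed to any of the classifiers named in the abstract, such as a random forest.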
Keywords/Search Tags:malicious code, feature fusion, machine learning, deep learning