
Research On Classification Of Malicious Code Based On Multiple Features

Posted on: 2021-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: S J Li
Full Text: PDF
GTID: 2428330629951029
Subject: Signal and Information Processing
Abstract/Summary:
As information technology enters a new era, and especially with the development of 5G, information transmission has reached a stage of high-speed sharing. Malicious code now appears in many unknown forms and in unfamiliar domains, which makes detecting it very challenging. Traditional detection methods based on feature matching are inadequate for the wide variety of malicious code in today's network environment. More and more new computing techniques are being applied to malicious code detection, and they have achieved good results against malicious code and its variants. Applying machine learning and deep learning algorithms to malicious code detection is currently a popular research topic.

To address the low classification efficiency, single-feature extraction, and poor accuracy of traditional malicious code classification, this thesis proposes two classification methods. The first manually extracts multiple features, not limited to text features, and combines them with the random forest algorithm to classify malicious code. The thesis abstracts the assembly opcodes into a grayscale image, because visualizing the code as an image can expose new features effectively. The malware source files are disassembled with IDA to generate .bytes files and .asm files. Features are extracted from the .asm files from two perspectives. First, text features are extracted with the N-Gram algorithm. Second, the .asm files are converted into grayscale images, from which two kinds of features are extracted: color features and texture features. Finally, the random forest algorithm is used for classification. However, manually extracting the features and tuning the random forest parameters so that forests built on different features all classify well takes a lot of time.

To make feature extraction more efficient, the second method automates it. Malicious code is classified with a deep learning algorithm, the convolutional neural network: the malicious code is converted into a grayscale image with the B2M algorithm, the network is trained and mines features automatically, and the malicious code is finally classified from the output of the network's last layer.

The first method, which fuses multiple features and combines them with the random forest algorithm, achieves a classification accuracy of 96.78%. The second method, which uses a convolutional neural network, achieves an accuracy of more than 90%. Although this does not yet exceed the accuracy obtained with manually extracted features, it fully demonstrates the potential of deep learning algorithms. Finally, the two methods are compared, analyzed, and summarized.
Keywords/Search Tags: grayscale image, N-Gram algorithm, random forest, fusion feature, convolutional neural network
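The feature-extraction steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the N-gram length, the image width, or the exact color and texture descriptors, so `n=3`, `width=16`, the one-byte-per-pixel mapping, and the intensity histogram used as a stand-in color feature are all assumptions, and every function name is hypothetical. Classification itself is not shown.

```python
from collections import Counter
import math


def opcode_ngrams(opcodes, n=3):
    """Count contiguous n-grams over a sequence of opcode mnemonics.

    n=3 is an assumed value; the abstract does not state the n-gram length.
    """
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def bytes_to_grayscale(data, width=16):
    """Map each byte to one pixel (0-255) and wrap rows at a fixed width.

    The last row is zero-padded. The fixed width is an assumption made
    for this sketch, not a detail taken from the thesis.
    """
    rows = math.ceil(len(data) / width)
    padded = data + bytes(rows * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(rows)]


def gray_histogram(image, bins=4):
    """Intensity histogram, used here as a simple stand-in color feature."""
    hist = [0] * bins
    for row in image:
        for px in row:
            hist[px * bins // 256] += 1
    return hist


# Toy inputs, invented for illustration only.
ops = ["push", "mov", "call", "push", "mov", "call"]
counts = opcode_ngrams(ops, n=3)
image = bytes_to_grayscale(bytes([0, 255, 16, 32, 7]), width=2)
hist = gray_histogram(image, bins=4)
```

In a multi-feature setup like the one described, the N-gram counts and the image features would be concatenated into one vector per sample before being passed to a classifier such as a random forest.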