
The Classification Of Malware Based On Multi-feature

Posted on: 2021-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: B C Jin
GTID: 2518306113951599
Subject: Computer Science and Technology
Abstract/Summary:
At present, the Internet is closely tied to social development and daily life. While people enjoy the efficiency it brings, they also need to guard against criminals who attack user terminals over the Internet and collect information without the user's authorization. Designing an efficient malware detection and identification technique is therefore very important. Classifying malware into families is an important step in malware detection, and it is of great significance in countering the diversity and polymorphism of malware.

Malware classification methods fall into two kinds: those based on static features and those based on dynamic features. Static methods classify samples using features such as string signatures, byte sequences, system call sequences, and grayscale images. Their feature extraction stage is fast, but their accuracy is limited by file packing and code obfuscation techniques. Dynamic methods run the malware samples in a virtual environment and capture their dynamic behavior with monitoring tools; the captured behavior is then quantified into sample features and used for classification. The advantage of dynamic methods is that they capture the real operation of the malware, but their time cost is high, so they are not suitable for classifying large sample sets, and their effectiveness depends on the virtual machine and the sandbox.

For the malware family classification problem, this paper proposes a multi-feature classification framework that combines static and dynamic features of three types: static signatures, gray images, and behavior path trees.

First, the framework crawls public virus libraries, obtains the signatures of 2.89 million samples, and merges them into the ClamAV virus library; the malware sample set is then scanned with the expanded ClamAV library. Second, the undetected samples are sorted and screened according to certain rules to obtain the sample set used for static (grayscale) feature extraction. Each of these samples is transformed into a grayscale image of fixed size, and the gray value of each pixel is used as a static feature of the sample for classification. Third, the samples that failed classification in the previous stage are run in the virtual environment to capture their dynamic behavior. The behavior path sequence of each sample is extracted as its dynamic feature, and the path sequence is transformed into a tree structure to generate the dependency relationships among attributes. Compared with the traditional system-call-based malware classification method, the method based on the behavior path tree has lower structural complexity and time complexity. Finally, this dynamic feature is used to classify the samples, completing the depth optimization of the behavior path tree, and the final classification accuracy of the model is obtained by combining these results with the classification results from the first two stages.

The method has been implemented and tested on a set of 46,893 malware instances in 77 families, and it achieves a classification accuracy of 82.06%.
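
The grayscale stage described above can be illustrated with a minimal sketch. It assumes that each byte of the sample maps directly to one pixel value in the range 0 to 255, and that the fixed size is reached by truncating long files and zero-padding short ones; the abstract does not state the actual resizing rule or image dimensions, so the function name `bytes_to_gray_image`, the 32 x 32 size, and the stand-in sample bytes are all hypothetical.

```python
from typing import List


def bytes_to_gray_image(data: bytes, width: int = 64, height: int = 64) -> List[List[int]]:
    """Map raw file bytes to a fixed-size grid of gray values (0-255).

    Assumption: one byte becomes one pixel; the thesis may use a
    different mapping or an interpolating resize instead.
    """
    n_pixels = width * height
    # Truncate long inputs and zero-pad short ones (assumed resizing rule).
    padded = data[:n_pixels].ljust(n_pixels, b"\x00")
    # Slice the flat byte string into rows of `width` pixels each.
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def flatten(image: List[List[int]]) -> List[int]:
    """Flatten the pixel grid into one feature vector, row by row."""
    return [pixel for row in image for pixel in row]


# Stand-in bytes for illustration only; a real pipeline would read a sample file.
sample = bytes(range(256)) * 3
image = bytes_to_gray_image(sample, width=32, height=32)
features = flatten(image)
```

The resulting `features` list has one gray value per pixel and could be passed to any classifier that accepts fixed-length numeric vectors; which classifier the thesis uses is not stated in the abstract.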
Keywords/Search Tags:Multi-feature, Static Signature, Gray Image, Path Tree of Behavior, Classification of Malware