
Research Of Windows PE Malicious Code Detection Based On Multiple Features

Posted on: 2022-08-14    Degree: Master    Type: Thesis
Country: China    Candidate: L P Jia    Full Text: PDF
GTID: 2518306554453814    Subject: Master of Engineering
Abstract/Summary:
Malicious code detection is one of the effective means of ensuring information security, and improving detection performance is the research focus of the field. At present, most malicious code on the network attacks the Windows system. To address this, the thesis proposes a malicious code classification method based on multi-feature fusion, which uses multiple features of malicious code together with ensemble learning to classify malicious code into families. The main research work of this thesis is as follows:

1. Malicious code samples are filtered and labeled, and their static and dynamic features are extracted. The collected samples are filtered, and the qualified samples are labeled with family information. Static features are extracted by static analysis: gray-scale texture features and byte-entropy histogram features. A Cuckoo sandbox is deployed to run the malicious code and generate dynamic behavior reports, from which dynamic features are extracted: API call frequency features and API category call frequency features.

2. To address the problem that some decision trees in a random forest have a negative effect on the voting process, a decision algorithm of optimal sub forest (DAOSF) is proposed. It optimizes the random forest algorithm by combining classification consistency, feature correlation, and feature importance to select high-quality decision trees, which form the optimal sub forest. The traditional random forest algorithm requires all decision trees to participate in voting, whereas DAOSF eliminates the interference of poor decision trees and improves the classification of malicious code.

3. Multiple features of malicious code are combined and classified with ensemble learning. Single features and fused features are used as input data, and ensemble learning models such as the Stacking combination model, LightGBM, and the DAOSF model are used for classification tests. The experimental results show that the Stacking combination model achieves an average classification accuracy of 95.63% and a macro-F1 of 92.88%; LightGBM achieves classification results close to those of the Stacking combination model while training faster; and, compared with the traditional random forest, the DAOSF model achieves better classification with fewer decision trees participating.

Based on the malicious code classification process, a malicious code family classification system is designed and implemented.
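As a rough illustration of the feature types named in the abstract, the sketch below computes a byte-entropy histogram (static) and an API call frequency vector (dynamic), then concatenates them into one fused vector. It is not the thesis's implementation: the window size, the bin count, the API vocabulary, and all function names are assumptions made for this example. Parsing a Cuckoo report into a call list is left out.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_histogram(data: bytes, window: int = 256, bins: int = 16) -> list:
    """Normalized histogram of per-window entropies.

    window=256 and bins=16 are assumed values, not taken from the thesis.
    """
    hist = [0] * bins
    n_windows = 0
    for start in range(0, len(data), window):
        h = shannon_entropy(data[start:start + window])
        # Map entropy in [0, 8] to a bin index; entropy 8.0 falls in the last bin.
        idx = min(int(h / 8.0 * bins), bins - 1)
        hist[idx] += 1
        n_windows += 1
    if n_windows == 0:
        return [0.0] * bins
    return [c / n_windows for c in hist]


def api_frequency(calls: list, vocabulary: list) -> list:
    """Relative frequency of each vocabulary API name in an observed call list."""
    counts = Counter(calls)
    total = len(calls)
    return [counts[name] / total if total else 0.0 for name in vocabulary]


# Toy inputs: one all-zero window followed by one window holding every byte value.
sample_bytes = b"\x00" * 256 + bytes(range(256))
# Hypothetical call trace and vocabulary, standing in for a sandbox report.
sample_calls = ["CreateFileW", "WriteFile", "CreateFileW", "RegSetValueExW"]
vocab = ["CreateFileW", "WriteFile", "RegSetValueExW", "connect"]

static_vec = entropy_histogram(sample_bytes)
dynamic_vec = api_frequency(sample_calls, vocab)
fused_vec = static_vec + dynamic_vec  # simple concatenation as the fusion step
```

Concatenation is only the simplest way to fuse features. The abstract does not say how the thesis combines them, so treat this step as a placeholder.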
Keywords/Search Tags: Malicious Code, Fusion Feature, Ensemble Learning, Optimal Sub Forest