Research Of Windows PE Malicious Code Detection Based On Multiple Features

Posted on: 2022-08-14

Degree: Master

Type: Thesis

Country: China

Candidate: L P Jia

GTID: 2518306554453814

Subject: Master of Engineering

Abstract/Summary:

Malicious code detection is one of the most effective means of ensuring information security, and improving detection performance is a central research focus in this field. At present, most malicious code found on the network targets Windows systems. To address this problem, this thesis proposes a malicious code classification method based on multi-feature fusion, which combines multiple features of malicious code with ensemble learning to classify malicious code into families. The main research work is as follows:

1. Sample filtering, labeling, and feature extraction. The collected malicious code samples are filtered, and the qualifying samples are labeled with their family information. Static analysis is used to extract static features: grayscale texture features and byte-entropy histogram features. A Cuckoo sandbox is deployed to run the malicious code and generate dynamic behavior reports, from which dynamic features are extracted: API call frequency features and API category call frequency features.

2. A Decision Algorithm of Optimal Sub Forest (DAOSF). In a random forest, some decision trees have a negative effect on the voting process. DAOSF optimizes the random forest algorithm by combining classification consistency, feature correlation, and feature importance to select high-quality decision trees, which together form the optimal sub-forest. Whereas the traditional random forest algorithm requires every decision tree to take part in voting, DAOSF removes the interference of poor decision trees and thereby improves the classification of malicious code.

3. Multi-feature classification with ensemble learning. Single features and fused features are used as input data, and ensemble learning models, namely a Stacking combination model, LightGBM, and the DAOSF model, are evaluated on classification. The experimental results show that the Stacking combination model achieves an average classification accuracy of 95.63% and a macro-F1 of 92.88%. LightGBM achieves results close to those of the Stacking combination model while training faster. Compared with the traditional random forest, the DAOSF model achieves better classification with fewer participating decision trees.

Finally, following the malicious code classification process, a malicious code family classification system is designed and implemented.
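The abstract does not say how the grayscale texture features are computed. A common approach, assumed here, reads a PE file's raw bytes as a fixed-width grayscale image and summarizes it with gray-level co-occurrence matrix (GLCM) statistics. The image width of 256, the 16 quantization levels, the single horizontal pixel offset, and the function names `bytes_to_gray_image` and `glcm_features` are hypothetical choices for illustration, not the thesis's settings.

```python
import numpy as np


def bytes_to_gray_image(data: bytes, width: int = 256) -> np.ndarray:
    """Reshape raw file bytes into a 2-D uint8 image (hypothetical helper).

    Trailing bytes that do not fill a complete row are dropped.
    """
    arr = np.frombuffer(data, dtype=np.uint8)
    rows = len(arr) // width
    return arr[: rows * width].reshape(rows, width)


def glcm_features(img: np.ndarray, levels: int = 16) -> dict:
    """Contrast, homogeneity, and energy of a horizontal-offset GLCM.

    GLCM is an assumed texture descriptor; the thesis may use a different one.
    """
    # Quantize gray values 0..255 down to `levels` bins to keep the matrix small.
    q = (img.astype(np.int64) * levels) // 256
    # Horizontally adjacent pixel pairs (offset: same row, next column).
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (left, right), 1.0)
    total = glcm.sum()
    if total == 0:  # image too small to contain any pixel pair
        return {"contrast": 0.0, "homogeneity": 0.0, "energy": 0.0}
    glcm /= total  # joint probability of each (i, j) gray-level pair
    i, j = np.indices((levels, levels))
    return {
        "contrast": float(np.sum(glcm * (i - j) ** 2)),
        "homogeneity": float(np.sum(glcm / (1.0 + np.abs(i - j)))),
        "energy": float(np.sqrt(np.sum(glcm ** 2))),
    }
```

For a sample on disk, `glcm_features(bytes_to_gray_image(path.read_bytes()))` yields a three-value texture vector; in practice several offsets and angles are usually combined into a longer vector.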
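A byte-entropy histogram pairs each byte with the Shannon entropy of the window it sits in, so packed or encrypted regions (entropy near 8 bits) and padding (entropy near 0) land in different parts of the histogram. The sketch below is one widely used formulation, not necessarily the thesis's: the 1024-byte window, 256-byte step, 16 x 16 bin grid, and the names `window_entropy` and `byte_entropy_histogram` are assumptions.

```python
import math
from collections import Counter


def window_entropy(window: bytes) -> float:
    """Shannon entropy of a byte window in bits, between 0.0 and 8.0."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def byte_entropy_histogram(data: bytes, window: int = 1024, step: int = 256,
                           bins: int = 16) -> list:
    """Flattened bins x bins joint histogram of (window entropy, byte value), summing to 1.

    Window size, step, and bin count are assumed values, not the thesis's parameters.
    """
    if not data:
        return [0.0] * (bins * bins)
    hist = [[0] * bins for _ in range(bins)]
    # Slide a fixed-size window over the file; a file shorter than one window is one window.
    starts = range(0, len(data) - window + 1, step) if len(data) >= window else [0]
    for start in starts:
        chunk = data[start:start + window]
        # Map entropy in [0, 8] bits to an entropy-bin (row) index in [0, bins - 1].
        e_bin = min(int(window_entropy(chunk) * bins / 8.0), bins - 1)
        for b in chunk:
            # Coarse byte-value bin (column); with bins=16 this is the byte's high nibble.
            hist[e_bin][b * bins // 256] += 1
    total = sum(sum(row) for row in hist)
    return [v / total for row in hist for v in row]
```

The result is a fixed-length 256-value vector regardless of file size, which is what lets it be concatenated with other features for fusion.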
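The dynamic features are frequency counts over the API calls recorded in the Cuckoo behavior report. A minimal sketch, assuming the Cuckoo 2.x `report.json` layout in which `behavior.processes[*].calls[*]` entries carry an `api` name and a `category` label; the fixed vocabulary, the relative-frequency normalization, and the name `api_frequency_features` are hypothetical choices, and the report below is hand-made rather than real sandbox output.

```python
from collections import Counter


def api_frequency_features(report: dict, vocab: list, key: str = "api") -> list:
    """Relative call frequencies over a fixed vocabulary from a Cuckoo-style report.

    Assumes report["behavior"]["processes"][*]["calls"][*] entries with "api" and
    "category" fields. Pass key="category" for API category frequencies.
    """
    counts = Counter()
    for proc in report.get("behavior", {}).get("processes", []):
        for call in proc.get("calls", []):
            name = call.get(key)
            if name is not None:
                counts[name] += 1
    # Normalize over the vocabulary only; calls outside the vocabulary are ignored.
    total = sum(counts[v] for v in vocab)
    return [counts[v] / total if total else 0.0 for v in vocab]


# Tiny hand-made report (illustrative only, not real Cuckoo output).
report = {
    "behavior": {
        "processes": [
            {"calls": [
                {"api": "NtCreateFile", "category": "file"},
                {"api": "NtCreateFile", "category": "file"},
                {"api": "RegOpenKeyExW", "category": "registry"},
            ]}
        ]
    }
}
api_vec = api_frequency_features(report, ["NtCreateFile", "RegOpenKeyExW", "InternetOpenA"])
cat_vec = api_frequency_features(report, ["file", "registry", "network"], key="category")
print(api_vec)
print(cat_vec)
```

Normalizing by the total count keeps samples with very long traces comparable to short ones; raw counts or TF-IDF weighting are equally plausible alternatives.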
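The abstract names DAOSF's selection criteria (classification consistency, feature correlation, feature importance) but not how they are combined, so the sketch below swaps in a deliberately simpler rule: rank the trees of a fitted scikit-learn random forest by their accuracy on a held-out validation set, keep the top `k`, and soft-vote with only those trees. It illustrates the sub-forest idea, not DAOSF itself; `select_sub_forest`, `sub_forest_predict`, `k=30`, and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def select_sub_forest(rf, X_val, y_val, k):
    """Return the k fitted trees with the highest validation accuracy.

    Validation accuracy is a simplified stand-in for DAOSF's selection criteria.
    """
    scores = []
    for tree in rf.estimators_:
        # Trees inside a fitted forest predict encoded class indices; map back to labels.
        pred = rf.classes_[tree.predict(X_val).astype(int)]
        scores.append(np.mean(pred == y_val))
    best = np.argsort(scores)[::-1][:k]
    return [rf.estimators_[i] for i in best]


def sub_forest_predict(rf, trees, X):
    """Soft vote over the selected trees only: average their class probabilities."""
    proba = np.mean([t.predict_proba(X) for t in trees], axis=0)
    return rf.classes_[np.argmax(proba, axis=1)]


# Synthetic multi-class data stands in for the malware feature vectors.
X, y = make_classification(n_samples=900, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.3,
                                                stratify=y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.3,
                                                  stratify=y_tmp, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
sub_trees = select_sub_forest(rf, X_val, y_val, k=30)

full_acc = np.mean(rf.predict(X_test) == y_test)
sub_acc = np.mean(sub_forest_predict(rf, sub_trees, X_test) == y_test)
print(f"full forest (100 trees): {full_acc:.3f}  sub-forest (30 trees): {sub_acc:.3f}")
```

Scoring trees on a separate validation split, rather than on training data, avoids favoring trees that merely memorized their bootstrap sample. No accuracy gain is guaranteed for this simplified ranking; the abstract's claim of better results with fewer trees refers to DAOSF on the thesis's own dataset.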

Keywords/Search Tags:

Malicious Code, Fusion Feature, Ensemble Learning, Optimal Sub Forest

Related items

1	Multi-feature Android Malicious Code Detection Based On Ensemble Learning
2	Research On Android Malicious Code Detection Based On Ensemble Learning
3	Research On Cloud Security Analysis Methods Aiming At Malicious Code Recognition
4	Research On The Classification Of Malware Based On Feature Fusion And Version Difference
5	Pathological Speech Recognition Based On Ensemble Learning And Fusion Features
6	Analysis And Recognition Technology Of Malicious Code Network Behavior
7	Research And Implementation Of Android Malicious Code Exploration Based On Runtime Feature
8	Research On Classification Of Malicious Code Based On Multiple Features
9	Research On Key Technologies Of Malicious Code And Emergency Response In Communication Networks
10	Windows Malicious Code Detection And Analysis Based On Behavior Characteristics