Research On Obfuscated Malware Detection Method Based On Classical Machine Learning Method

Posted on:2024-06-23

Degree:Master

Type:Thesis

Country:China

Candidate:Y M Wang

GTID:2568307064985389

Subject:Computer Science and Technology

Abstract/Summary:

Classical machine learning methods are grounded in mathematical statistics. Compared with deep learning methods, they consume relatively few resources during training, and they are widely used in many fields. Obfuscation techniques are now common across malicious applications, so these methods are highly relevant to detection.

Malicious programs are programs written with attack intent. After decades of malware development, traditional static detection methods are no longer very effective, for two main reasons. First, the rapid growth of the Internet has made it possible for almost anyone to write programs. Attackers can therefore quickly derive new malicious programs from existing ones to fit their attack intent, and a single malicious program may have dozens of variants. Second, obfuscation techniques are in widespread use.

Machine learning and deep learning methods are currently the most common approaches for detecting obfuscation or malicious programs. However, the deep learning methods that perform better at detecting obfuscation features consume a large amount of memory during training. Methods based on relatively novel dynamic features are also still lacking.

To address these problems, this thesis proposes a two-step detection scheme for obfuscated malicious programs. First, a naive Bayes method detects obfuscation. Second, each obfuscated program is judged malicious or benign from the sequence of API calls it makes when run dynamically. The main contributions are as follows:

1. An obfuscated malicious program detection and classification method. The method first identifies obfuscated programs by detecting obfuscated strings. It then applies ensemble machine learning methods to the obfuscated samples to decide whether each one is malware. The existing naive Bayes method for detecting obfuscated strings is improved and introduced into malicious program detection. The improved method uses a hashed bag-of-words representation, which reduces memory usage during training by about 95%.

2. For the malicious program detection step, this thesis uses a relatively novel feature: the order in which system APIs are called. It introduces, for the first time, a random forest composed of decision trees and a regression stacking method with naive Bayes. Compared with the single decision tree used before the random forest, absolute accuracy increases by about 3%, and compared with the stacking method, accuracy increases by about 2%. Multi-class malware identification accuracy reaches 88.3%, on par with static identification.

These results indicate that the proposed method solves the problem of high space overhead in obfuscation identification. It also improves the accuracy of malicious program identification, providing a new approach to identifying obfuscated malicious programs.
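
The abstract attributes the roughly 95% training-memory reduction to a hashed bag-of-words. The Python sketch below illustrates the general hashing trick applied to a multinomial naive Bayes string classifier. It is not the thesis's implementation. The following are all illustrative assumptions:

- character trigrams as the "words";
- zlib.crc32 as the hash;
- 4,096 buckets;
- Laplace smoothing;
- the names hashed_ngrams and HashedMultinomialNB.

The idea shown is that the model keeps a fixed classes-by-buckets table instead of a vocabulary dictionary that grows with every distinct n-gram seen in training. The sketch does not reproduce the thesis's 95% figure.

```python
import math
import zlib
from collections import Counter


def hashed_ngrams(text: str, n: int = 3, buckets: int = 2 ** 12) -> Counter:
    """Map character n-grams of `text` to bucket indices (the hashing trick).

    Assumption: zlib.crc32 is used rather than Python's built-in hash(),
    because str hashing is randomized per process and buckets would shift.
    """
    padded = f"^{text}$"  # boundary markers so prefixes/suffixes get their own n-grams
    grams = (padded[i:i + n] for i in range(max(len(padded) - n + 1, 1)))
    return Counter(zlib.crc32(g.encode("utf-8")) % buckets for g in grams)


class HashedMultinomialNB:
    """Hypothetical multinomial naive Bayes over hashed n-gram counts.

    Memory is fixed at (number of classes x buckets) counters, however many
    distinct n-grams the training strings contain.
    """

    def __init__(self, buckets: int = 2 ** 12, alpha: float = 1.0):
        self.buckets = buckets
        self.alpha = alpha        # Laplace smoothing constant
        self.counts = {}          # label -> per-bucket n-gram counts
        self.totals = {}          # label -> total n-gram count
        self.docs = Counter()     # label -> number of training strings

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            row = self.counts.setdefault(label, [0] * self.buckets)
            for bucket, c in hashed_ngrams(text, buckets=self.buckets).items():
                row[bucket] += c
                self.totals[label] = self.totals.get(label, 0) + c
            self.docs[label] += 1
        return self

    def predict(self, text):
        feats = hashed_ngrams(text, buckets=self.buckets)
        n_docs = sum(self.docs.values())
        best, best_score = None, -math.inf
        for label, row in self.counts.items():
            # log prior + sum of log likelihoods of the observed hashed n-grams
            score = math.log(self.docs[label] / n_docs)
            denom = self.totals.get(label, 0) + self.alpha * self.buckets
            for bucket, c in feats.items():
                score += c * math.log((row[bucket] + self.alpha) / denom)
            if score > best_score:
                best, best_score = label, score
        return best
```

Training on a corpus of known plain and obfuscated strings would look like `HashedMultinomialNB().fit(strings, labels).predict(candidate)`. Hash collisions make some n-grams share counts. The bucket count controls this trade of accuracy for memory.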
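
Contribution 2 relies on the order in which system APIs are called at run time. One common way to keep order information in a fixed-length vector suitable for decision trees is to count hashed n-grams of consecutive calls. The sketch below assumes bigrams, a 1,024-dimensional vector and zlib.crc32, none of which the abstract specifies. The function name api_sequence_features and the example trace are hypothetical.

```python
import zlib
from typing import List, Sequence


def api_sequence_features(calls: Sequence[str], n: int = 2, dim: int = 1024) -> List[int]:
    """Turn a dynamic API-call trace into a fixed-length count vector.

    Each position counts hashed n-grams of consecutive calls, so "A->B" and
    "B->A" are distinct n-grams: local call order is kept, unlike a plain
    bag of API names.
    """
    vec = [0] * dim
    for i in range(len(calls) - n + 1):
        gram = "->".join(calls[i:i + n])
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1
    return vec


# Hypothetical trace of four calls.
trace = ["NtOpenProcess", "NtAllocateVirtualMemory", "NtWriteVirtualMemory", "NtCreateThreadEx"]
features = api_sequence_features(trace)
print(sum(features))  # 3: a 4-call trace has 3 consecutive bigrams
```

Vectors like these could be passed to a tree ensemble such as scikit-learn's `RandomForestClassifier`. They could also go to `StackingClassifier` with, for instance, `RandomForestClassifier` and `MultinomialNB` as base estimators and `LogisticRegression` as the final estimator. That combination is one plausible reading of the abstract's "regression stacking method", not a confirmed description of the thesis's setup.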

Keywords/Search Tags:

Machine Learning, Malicious Program Identification, Ensemble Methods, Decision Trees, Random Forests, Naive Bayes

Related items

1	Study On Android Malware Detection Method Based On Machine Learning
2	The Study Of Ensemble Learning On Naive Bayes Classifier
3	Research On Malicious User Identification Of Weibo Based On Machine Learning Classification Algorithms
4	Efficient Tumor Traceability Prediction Based On Hybrid Machine Learning
5	Prediction Of Loan Default Risk Based On Ensemble Learning
6	Optimizing Voting Process Of Random Forests Algorithm Based On Weighted Decision Trees
7	Malicious Code Identification System Based On Behavior Analysis
8	Image Annotation Based On Ensemble Of Naive Bayes Classifier
9	Prediction Of Protein Contact Map Based On Weighted Naive Bayes Classifier And Extreme Random Tree
10	Strategies Based On GBDT,RAF And Ensemble Models:Application On Stocks From CSI 300 Index