
Malware Detection And Classification Based On Memory Data Features

Posted on: 2024-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: G Qin
Full Text: PDF
GTID: 2568307064950699
Subject: Applied statistics
Abstract/Summary:
With the advent of the digital era, the software industry has become central to information technology, network security, economic transformation, and the digital society. Its development has brought economic growth and everyday convenience, but also new risks. Malware detection is becoming increasingly difficult, and the security of cyberspace faces unprecedented threats, so effective methods for identifying and classifying malicious software are urgently needed. Conventional malware identification techniques are mature, but as anti-detection techniques develop, malware can hide itself through packing, morphing, polymorphism, and self-destruction. Few methods currently specialize in detecting such hidden malware. This thesis therefore proposes a fast and efficient method for detecting obfuscated malware based on data features, with the aim of enhancing the security of cyberspace, avoiding economic losses, and strengthening the capacity for countermeasures.

The dataset used in this thesis was extracted with the memory feature extractor VolMemLyzer and contains 55 features. It reflects basic features of malicious software and also captures memory data features of obfuscated malware, which facilitates the detection of hidden malware. Of these, 26 newly added features specifically target obfuscated and hidden malware. These features fall roughly into five categories, all related to the characteristics of hidden malware.

For detection, LightGBM is used to build a model on the original data, the features are ranked by importance, and the five features with the highest importance values are selected. A support vector machine is then trained on these features. The resulting model performs well on both the training set and the test set, with a test accuracy of 0.995. This model can detect not only ordinary malware but also hidden malware.

For classification, all malicious samples are selected to form a dataset containing three types of malware. A random forest is built on the malicious data, the features are ranked by importance, a subset of variables is selected, and XGBoost is then used to model the target variable. The resulting model performs well on both the training set and the test set, with per-class results on the test set of 0.896 for Ransomware, 0.932 for Spyware, and 0.901 for Trojan. The XGBoost model achieves high accuracy and recall, and its predictions for each malware type are relatively accurate.

The obfuscated-malware detection method based on data features proposed in this thesis can effectively deal with the concealment and deformation of malware and is significant for the security protection of cyberspace. It can quickly identify ordinary malicious software and accurately discover malicious software hidden in memory. It also has high accuracy and recall and can be used to classify three common types of malware. The method may contribute to future network security protection work.
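The two bookkeeping steps the abstract relies on, keeping the top-ranked features by importance and reporting per-class recall, can be sketched in plain Python. This is a minimal illustration, not the thesis's implementation: the feature names `f1` to `f7`, the importance scores, and the toy labels are all hypothetical, and in practice the scores would come from a fitted LightGBM or random forest model.

```python
from collections import Counter


def top_k_features(importances, k=5):
    """Return the k feature names with the largest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted samples / true samples of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical importance scores (illustrative values, not results from the thesis).
importances = {"f1": 0.30, "f2": 0.05, "f3": 0.20, "f4": 0.10,
               "f5": 0.15, "f6": 0.12, "f7": 0.08}
selected = top_k_features(importances, k=5)

# Toy labels for the three malware families named in the abstract (made up).
y_true = ["Ransomware", "Ransomware", "Spyware", "Spyware", "Trojan", "Trojan"]
y_pred = ["Ransomware", "Trojan", "Spyware", "Spyware", "Trojan", "Spyware"]

recall = per_class_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(selected)
print(recall)
print(round(accuracy, 3))
```

Reporting recall per class alongside overall accuracy matters for a three-class problem like this one, since a single accuracy figure can hide a family that is systematically misclassified.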
Keywords/Search Tags: Machine Learning, Malware Detection, XGBoost, Support Vector Machine