Malware Classification Method Based On Feature Fusion And Machine Learning

Posted on: 2021-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: H X Long
Full Text: PDF
GTID: 2518306467969809
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of the Internet and information technology, software has an ever greater impact on people's lives and work. Driven by profit motives, more and more malicious software is emerging, and it poses increasingly hidden threats to legitimate software services, systems and networks. Security issues are becoming more important, and the detection and classification of malware faces growing challenges. The continuous evolution of malware techniques has raised the cost of sample analysis and detection, and accurate classification is a prerequisite for accurate detection. Classifying malicious code samples helps analysts understand the risks and threats associated with malware and respond to the intent and emerging trends of malware development, which in turn helps improve detection accuracy or take precautionary measures. How to analyze malware and learn its characteristics quickly and efficiently has therefore become an important part of protecting network security.

Current malware analysis is mostly based on feature extraction. Features are extracted from malware strings, assembly code, opcodes, PE structures or captured dynamic call results, and machine learning algorithms are then used to detect malicious behavior. However, as malware evolves, techniques such as obfuscation, packing and anti-sandboxing keep emerging. Combating such complex malware requires effective feature extraction methods and efficient learning algorithms that can mine malicious behavior patterns. With the progress of machine learning, using advanced machine learning algorithms to study malware has become a trend. This thesis addresses the limitations of current malware feature extraction methods and classification models, with the following contributions:

(1) A hybrid feature extraction method combining static and dynamic features is proposed. Static analysis struggles against obfuscation, which obscures and hides malicious code so that valid static features cannot be extracted as expected, while dynamic analysis cannot detect malicious behaviors that are difficult to trigger. This thesis combines the theory of static PSI features and dynamic call sequences and proposes a hybrid malware analysis method. Combining static and dynamic features overcomes, to a certain extent, the shortcomings of using either method alone.

(2) A hybrid-feature malware classification method based on a factorization machine is proposed. Previous methods that used mixed features applied both static and dynamic analysis, but they essentially computed the weight of each feature independently, without considering how features relate to and combine with one another. This thesis uses a factorization machine as the malware classifier to model the interactions between features and thereby improve classification accuracy. Because factorization machines handle sparse features well, the computational complexity of the sparse, binary-encoded mixed features is also reduced.

(3) A method for classifying malware by dynamic features, based on a Text CNN model improved with Kronecker convolution, is proposed. Malicious operations usually manifest as short call sequences, and Text CNN is good at identifying short sequences, so this thesis uses Text CNN to analyze the dynamic call sequences of malware. However, it is difficult to choose a convolution size that balances speed against the capture of sequence features. To use a larger convolution without hurting performance, this thesis improves Text CNN with Kronecker convolution. Kronecker convolution replaces the large convolution weight matrix with a combination of Kronecker products of small matrices, which expands the receptive field without increasing the number of convolution parameters, avoids the loss of local information seen in dilated (atrous) convolution, and speeds up the learning of short malicious-behavior sequences.
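
Contribution (2) relies on a factorization machine to model pairwise interactions between sparse binary features. The abstract does not give the formulation, so the following is only a minimal sketch of the standard second-order factorization machine score, assuming a sample is represented by the indices of its active (value 1) features. The function names, the toy weights and the latent vectors are all hypothetical and are not taken from the thesis.

```python
from itertools import combinations


def fm_score(active, w0, w, V):
    """Second-order factorization-machine score for a binary feature vector.

    active: indices of the features equal to 1 (sparse representation)
    w0:     global bias
    w:      per-feature linear weights
    V:      per-feature latent vectors (list of equal-length lists)
    """
    k = len(V[0])
    linear = sum(w[i] for i in active)
    # Identity that avoids the explicit double loop over feature pairs:
    #   sum_{i<j} <v_i, v_j> = 0.5 * sum_f [(sum_i v_if)^2 - sum_i v_if^2]
    # The cost is O(k * number of active features).
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] for i in active)
        s_sq = sum(V[i][f] ** 2 for i in active)
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_score_naive(active, w0, w, V):
    """Same score computed pair by pair, used only as a cross-check."""
    linear = sum(w[i] for i in active)
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j]))
        for i, j in combinations(active, 2)
    )
    return w0 + linear + pairwise


# Hypothetical toy model: 5 binary features, latent dimension 2.
W0 = 1.0
W = [0.1, 0.2, 0.3, 0.4, 0.5]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]

# A sample in which features 0, 3 and 4 are present.
sample = [0, 3, 4]
print(fm_score(sample, W0, W, V), fm_score_naive(sample, W0, W, V))
```

Because each feature carries a latent vector rather than one weight per feature pair, an interaction between two features that rarely co-occur can still be estimated, which is the usual argument for factorization machines on sparse binary encodings.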
Keywords/Search Tags: Malware, Machine Learning, Classification, Dynamic Analysis, Static Analysis