Research On Deep Learning Based Malware Feature Analysis And Detection Method

Posted on: 2020-07-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J X Zhang
Full Text: PDF
GTID: 1368330620454222
Subject: Computer Science and Technology
Abstract/Summary:
Malware is one of the most serious threats today. One of the biggest challenges in malware detection is detecting malware variants effectively and efficiently, given the large volume and rapid growth of malware. Traditional detection methods based on hashes, strings, and similar signatures cannot meet the demands of malware detection. In recent years, researchers have proposed a series of modern malware detection methods based on artificial intelligence technologies, such as machine learning, to detect malware variants effectively and efficiently. However, these intelligent malware detection methods also face challenging problems: low accuracy and low performance, low code coverage, low utility for detecting malware across multiple platforms, low feature coverage of the detection model, etc. To address these problems, this thesis proposes a series of intelligent malware detection methods based mainly on deep learning. Malware detection using deep learning faces challenges of its own, such as the representation of very long sequential data, data sparsity, feature fusion, and very high training time cost. These challenges limit the research and application of deep learning based malware detection techniques.

To address the low precision and low recall of traditional intelligent malware detection techniques, the thesis proposes a malware detection method based on operation code (opcode) semantic representation, which improves the accuracy of detecting lurking malware variants. This method represents malware using an opcode bi-gram semantic representation. The opcode bi-gram gives a fine-grained representation of the local semantics of very long opcode sequences. The thesis also proposes a principal component initialized convolutional neural network to classify malicious and legitimate opcodes effectively and efficiently. This technique solves the problem of semantic representation and classification of very long sequences.

To address the low utility of existing methods for detecting malware across multiple platforms, the thesis proposes a malware detection method based on byte code semantic representation, which detects malware variants from multiple platforms. This method represents malware using an N-length byte co-occurrence semantic representation. The N-length byte co-occurrence matrix gives a fine-grained representation of the local semantics of very long byte code sequences. The method can effectively and efficiently classify malicious and legitimate byte codes from multiple platforms. This method solves the problem of multi-platform malware detection.

Because any single type of malware feature inevitably loses part of the information, this thesis proposes a malware variant detection method based on both opcode semantic features and API call statistical features. The method integrates the opcode features trained by convolutional neural networks with the API call features trained by back-propagation neural networks. This method solves the problem of feature fusion in deep learning.

Because deep learning based malware detection methods take too much training time, this thesis proposes a malware detection method based on global topology features to significantly reduce the training time cost. The method builds a probabilistic graph of opcode dependencies and presents several global topology features for quickly training models and detecting malware variants. The global topology features also formalize malware characteristics, making them easier for people to understand. In addition, to detect malware adaptively in practice, the thesis designs an active learning based retraining mechanism for the neural networks used in this thesis.
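The abstract does not say how the opcode bi-grams or the probabilistic opcode graph are actually computed. As a rough illustration only, the Python sketch below counts adjacent opcode pairs in a sequence and normalizes the counts per source opcode, which is one simple way to obtain a graph with transition probabilities on its edges. The function names `opcode_bigrams` and `transition_graph` and the sample trace are hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict


def opcode_bigrams(opcodes):
    """Count adjacent opcode pairs (bi-grams) in an opcode sequence."""
    return Counter(zip(opcodes, opcodes[1:]))


def transition_graph(bigram_counts):
    """Turn bi-gram counts into a graph of transition probabilities.

    Assumption: an edge src -> dst is weighted by
    count(src, dst) / count(src, anything). The thesis may define
    its probabilistic graph differently.
    """
    totals = defaultdict(int)
    for (src, _dst), n in bigram_counts.items():
        totals[src] += n

    graph = defaultdict(dict)
    for (src, dst), n in bigram_counts.items():
        graph[src][dst] = n / totals[src]
    return dict(graph)


# Hypothetical opcode trace, made up for illustration only.
trace = ["push", "mov", "call", "mov", "call", "pop", "ret"]
counts = opcode_bigrams(trace)
graph = transition_graph(counts)
```

In this sketch the bi-gram counts could serve as input features for a classifier, while summary statistics of the graph (for example, node and edge counts) would stand in for the kind of global topology features the abstract mentions.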
Keywords/Search Tags: Malware Variants, Intelligent Malware Detection, Deep Learning, Data Representation, Neural Networks