Malicious Code Detection Technology Based On Deep Learning

Posted on: 2021-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Fu
Full Text: PDF
GTID: 2428330629450890
Subject: Cyberspace security law enforcement technology
Abstract/Summary:
While Internet technology is booming, malicious code attacks computers through software vulnerabilities, URL links, e-mail, and other vectors, causing heavy losses for the large population of Windows users. Research on malicious code detection for Windows systems is therefore necessary. Traditional detection techniques are prone to false positives and missed detections and cannot meet current requirements. In addition, to evade antivirus software, criminals use techniques such as packing and polymorphism to generate malicious code variants. To address these problems, this thesis uses a convolutional neural network (CNN) model for malicious code detection. At the same time, One-Hot and Word2vec are combined to improve feature vectorization, which in turn improves detection performance at the model's detection stage. The main research contents of this thesis are as follows:

1. Malicious code feature preprocessing: Existing work mainly extracts the byte code, PE structure, and assembly code of a sample program for subsequent malicious code detection. This thesis instead extracts dynamic behavior, using the Cuckoo sandbox to simulate the operating environment and obtain analysis log files for malicious code. Feature preprocessing extracts, from these redundant log files, the information that reflects the dynamic behavior of the malicious code. The approach is a Python script that extracts the API function information from the log file, including the function type and function name, and then converts the extracted information into an API call sequence.

2. Feature vectorization model selection: Each API function in the text information is given a unique number, so that each malicious code API call sequence is converted into a word number sequence, which is finally converted into feature vector form. In this thesis, the word vector models One-Hot and Word2vec are used in combination to achieve better feature vectorization. Finally, experiments compare the detection performance of this combination against the plain One-Hot model and the plain Word2vec model.

3. Malicious code detection based on convolutional neural networks: Convolutional neural networks are widely used in image recognition, computer vision, and other fields, and in recent years have also made breakthroughs in natural language processing. Drawing on natural language processing methods, this thesis applies the CNN, a deep learning model with good classification performance, to malicious code detection, tunes its parameters, and selects the optimizer. Four experiments are designed using a data set from VirusShare. The experiments show that the accuracy of the convolutional neural network model after One-Hot and CBOW feature vectorization can reach 96%. In addition, with the best-performing optimization algorithm selected, the loss value is around 0.06 and the model converges well.

In summary, this thesis improves the feature vectorization method and also optimizes the model through optimization algorithm selection and other parameter adjustments. The proposed CNN (One-Hot + CBOW) detection model achieves better detection performance.
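The preprocessing and numbering steps described in items 1 and 2 can be illustrated with a minimal Python sketch. It is not the thesis's script. The `report` dictionary is an invented sample, and its layout (`behavior` → `processes` → `calls`, each call carrying `category` and `api`) is an assumption about how a sandbox report is organized. The helper names `extract_calls`, `build_vocab`, and `one_hot` are hypothetical. Only the One-Hot half of the vectorization is shown, because the abstract does not say how One-Hot and Word2vec are combined.

```python
# Invented sample data; the nested layout is an assumption about the report format.
report = {
    "behavior": {
        "processes": [
            {"calls": [
                {"category": "file", "api": "NtCreateFile"},
                {"category": "file", "api": "NtWriteFile"},
                {"category": "registry", "api": "RegOpenKeyExW"},
                {"category": "file", "api": "NtWriteFile"},
            ]}
        ]
    }
}

def extract_calls(report):
    """Return (function type, function name) pairs in call order, across all processes."""
    return [(call["category"], call["api"])
            for proc in report["behavior"]["processes"]
            for call in proc["calls"]]

def build_vocab(sequences):
    """Give each distinct API name a unique number, in order of first appearance."""
    vocab = {}
    for seq in sequences:
        for name in seq:
            vocab.setdefault(name, len(vocab))
    return vocab

def one_hot(ids, vocab_size):
    """Turn a number sequence into a list of 0/1 rows, one row per API call."""
    return [[1 if col == i else 0 for col in range(vocab_size)] for i in ids]

calls = extract_calls(report)
api_seq = [name for _category, name in calls]   # API call sequence
vocab = build_vocab([api_seq])                  # API name -> unique number
id_seq = [vocab[name] for name in api_seq]      # word number sequence
features = one_hot(id_seq, len(vocab))          # feature vectors, shape 4 x 3

print(api_seq)
print(id_seq)
```

In a real pipeline the vocabulary would be built over all samples in the training set rather than a single report, and sequences would typically be truncated or padded to a fixed length before being fed to a CNN.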
Keywords/Search Tags: Malicious code, Sandbox, Deep learning, Behavior characteristics, API call functions