Research On Malicious Code Detection Technology Based On Deep Learning

Posted on: 2023-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: R W Huang
Full Text: PDF
GTID: 2558306905991099
Subject: Software engineering
Abstract/Summary:
With the advent of the big data era, information leakage, malware, and other network attacks are becoming more frequent, and the types and quantity of malicious code are increasing daily. The increasingly severe security situation can cause serious economic losses to individuals and companies, and can even threaten national security. Malicious code detection is a vital component of network security. It is used to identify and remove potentially harmful software from the Internet. Detection methods based on machine learning are becoming more prevalent in the security industry. However, their ability to extract features of malicious code is still limited, and some of them require manual feature extraction. Manual extraction is time-consuming and labor-intensive, and the extracted features do not describe malicious code behavior in depth, which may lead to low detection accuracy. This paper addresses these problems by using deep learning techniques to study malicious code detection from both static and dynamic perspectives. The main work of this paper is as follows.

Firstly, static malicious code is analyzed, starting from texture features and opcode features. To extract texture features, this paper proposes a method that converts compiled files into grayscale images by Simhash processing. After the grayscale images are generated, global and local image texture features are extracted by the GIST algorithm and the SIFT algorithm respectively, and the global and local image features are fused. To extract opcode features, the malicious code is first disassembled, following its control flow graph, to obtain the opcode sequence, and the N-gram algorithm is then used to obtain the opcode features. Because the grayscale image features and the opcode features reflect, respectively, the global and local differences between malicious code samples of the same type, a feature fusion approach is proposed that examines both global and local characteristics by fusing the image features with the opcode features. The fused features are then used to train a long short-term memory (LSTM) network, and experiments verify that the fused features are effective for detection.

Secondly, dynamic malicious code is analyzed. API sequences are obtained by running the malicious code in a secure, controlled sandbox environment. The number of API calls in a sequence is very large and highly repetitive, and overly long sequences place a burden on the model. This paper uses the idea of the longest common subsequence to process and de-duplicate the API sequences. The traditional one-hot model represents each token as a vector whose dimension grows with the vocabulary size, which leads to the curse of dimensionality and in turn to a sharp decrease in model efficiency. This paper therefore uses the Word2Vec model to convert API sequences into vector representations. The proposed model combines the advantages of the CNN model and the Bi-LSTM model. It uses the CNN to extract local features and strengthen the understanding of API semantics, and the Bi-LSTM to extract global features from the contextual semantic information of the API sequence for detection and classification. The effectiveness of the model is verified through experiments and a comparison with other models.
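
Two of the preprocessing ideas mentioned in the abstract, opcode N-grams and the reduction of repetitive API sequences, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the choice of n, and the sample opcode and API names are assumptions made for illustration. In particular, `collapse_repeats` is a simplified stand-in that only drops consecutive duplicate calls, and `lcs_length` shows the standard longest common subsequence computation on its own; how the thesis applies the longest common subsequence idea to de-duplication is not specified in the abstract.

```python
from collections import Counter
from itertools import groupby


def opcode_ngrams(opcodes, n=3):
    """Count contiguous n-grams in an opcode sequence (n is an assumed default)."""
    # zip over n shifted views of the list yields every window of length n.
    grams = zip(*(opcodes[i:] for i in range(n)))
    return Counter(grams)


def collapse_repeats(api_calls):
    """Simplified stand-in for de-duplication: keep one call per consecutive run."""
    return [name for name, _ in groupby(api_calls)]


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (row-wise DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical opcode and API traces, chosen only to exercise the functions.
    ops = ["push", "mov", "call", "mov", "call"]
    print(opcode_ngrams(ops, n=2).most_common(1))

    trace = ["NtOpenFile", "NtReadFile", "NtReadFile", "NtReadFile", "NtClose"]
    print(collapse_repeats(trace))

    print(lcs_length(trace, ["NtOpenFile", "NtWriteFile", "NtClose"]))
```

The N-gram counts could serve as a fixed-length feature vector once a vocabulary of frequent N-grams is chosen, and the shortened API sequence is the kind of input that would then be passed to an embedding model such as Word2Vec.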
Keywords/Search Tags: Malicious code, Deep learning, Feature fusion, API sequence