
API Call Sequence-based Malware Detection Method For Windows Platform

Posted on: 2022-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: T G Wang
Full Text: PDF
GTID: 2518306563464034
Subject: Computer technology
Abstract/Summary:
Malware is currently one of the main threats to Internet security. Its explosive growth in number and the continuous improvement of its self-protection techniques seriously threaten people's economic interests, so detecting malware efficiently is of great significance. Static analysis achieves high detection accuracy but is easily defeated by packing and obfuscation. Dynamic analysis judges whether a sample is malicious by analyzing the real behavior of the running code, which avoids the obstacles of packing and obfuscation, but its accuracy still needs to be improved. To address these issues, this thesis starts from dynamic analysis, takes dynamic API sequences as the research object, and studies malware detection methods from two different perspectives. The main research work of this thesis is as follows:

(1) A malware detection method based on dynamic API sequences and random forest is proposed. Because the API sequences obtained through dynamic analysis contain a large number of redundant APIs, this thesis proposes an API deduplication method that eliminates some redundant API fragments and shortens the sequence without losing API sequence information. To take the order of API calls into account, the N-Gram algorithm is used to process the API sequence. Because feature extraction with this algorithm produces an excessively large feature dimension, the information gain algorithm is used for feature selection. Finally, a random forest is trained on the generated feature vectors to detect malware. The method is evaluated on the dataset of the Aliyun Malicious Program Detection Challenge and obtains high detection accuracy. Its effectiveness is demonstrated by comparison with other machine learning algorithms.

(2) A malware detection method based on Word2vec word vector representation and a convolutional neural network fused with an attention layer is proposed. To take the relationships between API contexts into account, this thesis uses the Word2vec model to vectorize the APIs, expressing the similarity and correlation between APIs through distance in the vector space; these vectors replace the word embedding layer of the convolutional neural network. Because the traditional convolutional neural network has a single, simple structure and low malware detection accuracy, this thesis optimizes the network structure, using convolution kernels of different sizes in the convolutional layer to fully extract the local features of the API sequence. An attention mechanism is added after the pooling layer to assign different weights to different features, effectively extracting the more critical local API information. Comparison with a basic convolutional neural network shows that the model proposed in this thesis is more effective.

This thesis designs experiments for both methods and improves the accuracy of malware detection by continually tuning the model parameters. The experimental results show that the two proposed methods can effectively detect malicious samples and that, compared with traditional methods of the same kind, detection accuracy is significantly improved.
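The feature pipeline of method (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not specify the deduplication rule, so collapsing runs of consecutive identical calls is an assumption made here. The function names, the binary presence encoding of N-grams, and the toy traces are likewise hypothetical. The random forest stage is omitted; the resulting vectors could be passed to any classifier, for example scikit-learn's `RandomForestClassifier`.

```python
import math
from collections import Counter


def dedupe_consecutive(calls):
    """Collapse each run of identical consecutive API calls into one call.

    Assumption: the thesis's deduplication rule is not given in the abstract;
    this is only one plausible reading of it.
    """
    out = []
    for call in calls:
        if not out or out[-1] != call:
            out.append(call)
    return out


def ngrams(calls, n):
    """Return the n-grams of a call sequence as tuples, in call order."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(samples, labels, gram):
    """Entropy reduction from splitting samples on presence of one n-gram."""
    present = [y for grams, y in zip(samples, labels) if gram in grams]
    absent = [y for grams, y in zip(samples, labels) if gram not in grams]
    total = len(labels)
    conditional = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - conditional


def select_features(samples, labels, k):
    """Keep the k n-grams with the highest information gain."""
    vocab = sorted({g for grams in samples for g in grams})
    ranked = sorted(vocab,
                    key=lambda g: information_gain(samples, labels, g),
                    reverse=True)
    return ranked[:k]


def vectorize(grams, features):
    """Binary feature vector: 1 if the n-gram occurs in the sample, else 0."""
    return [1 if f in grams else 0 for f in features]


# Hypothetical toy traces (label 0 = benign, 1 = malicious).
traces = [
    ["CreateFileW", "WriteFile", "WriteFile", "WriteFile", "CloseHandle"],
    ["RegOpenKeyExW", "RegQueryValueExW", "RegCloseKey"],
    ["VirtualAllocEx", "WriteProcessMemory", "WriteProcessMemory",
     "CreateRemoteThread"],
    ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
     "CreateRemoteThread"],
]
labels = [0, 0, 1, 1]

samples = [set(ngrams(dedupe_consecutive(t), 2)) for t in traces]
features = select_features(samples, labels, k=2)
vectors = [vectorize(s, features) for s in samples]
```

Ranking N-grams by information gain before training keeps the feature vector short, which is the dimensionality concern the abstract raises; the value of `k` and the N-gram length would need to be tuned on real data.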
Keywords/Search Tags: malware detection, API call sequence, random forest, convolutional neural network