Research On Key Technologies Of Malware Feature Extraction Based On System Call Analysis

Posted on: 2021-04-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D G Du
Full Text: PDF
GTID: 1368330605481205
Subject: Software engineering
Abstract/Summary:
With the rapid development of modern information and Internet technology, people's daily work, study, and life are increasingly inseparable from the Internet. Malware has become one of the major threats to Internet security and can even threaten national security. The widespread use of obfuscation techniques by malware authors has caused the number of malware samples and variants to explode, and traditional signature-based detection cannot cope with this situation. Security analysts have therefore turned to malware detection based on dynamic behavior. An application programming interface (API) is the calling interface between an application and the operating system; an application usually operates on system resources by calling APIs, so API-based dynamic behavior is a strong feature for detection. In a malware detection system, feature extraction is the key factor limiting detection performance, and extracting dynamic behavior features from API calls is currently an active research topic. This dissertation focuses on the key technologies for extracting dynamic behavior features of malware based on the Windows API, and studies the application of machine learning, ensemble learning, and deep neural network algorithms to malware detection in order to improve detection effectiveness. The specific research content and main contributions are as follows.

(1) Research on variable-length behavior feature extraction based on the API call order of malware

Traditional signature-based detection fails to detect malware that employs code obfuscation. One of the more popular methods of extracting behavior features is to extract n-grams from the API call sequence to represent the behavior of a program. However, this approach easily misses useful features and is vulnerable to API insertion attacks. After studying and analyzing the API call sequences of malware, a feature extraction method is proposed that uses a variable-length n-gram algorithm to extract API n-grams of different lengths from the API call sequence. To reduce dimensionality, features are then selected by information gain. On this basis, a dynamic behavior detection algorithm for malware based on Naive Bayes is proposed. Experimental results show that the detection algorithm based on variable-length API n-grams can improve the performance of the system.

(2) Research on classified behavior feature extraction based on API data dependence relationships

The most popular method of extracting program behavior features is based on the API data dependency graph. Because API behavior graphs are difficult to construct and graph matching at detection time has high time and space complexity, we analyze the API calls of malware and classify the APIs. We then propose a novel feature extraction approach in which a classified behavior graph (CBG) is constructed from the data dependences among API calls. To verify the effectiveness and extensibility of the extracted CBGs for malware classification and detection, malware variant detection models based on traditional machine learning algorithms were designed. The experimental results show that Magpie, a malware behavior detection system based on API CBGs and SVM, can detect malware variants. Once the dynamic classified behavior features have been extracted, the core of behavior-based detection research becomes the design of detection and classification algorithms that use these features. To raise the detection rate of the malware detection system, ensemble learning algorithms are studied to improve the classification accuracy for malware variants. The experimental results show that the dynamic classified behavior detection system based on ensemble learning can reduce the false positive rate and improve the accuracy of classifying and detecting malware variants.

(3) Research on malware feature extraction based on the correlation between classified behaviors

Based on a study of the correlation between the extracted classified behavior features, this dissertation proposes a CBG n-gram based dynamic behavior feature extraction technique for malware families. More classified behavior features common to a malware family are found, which helps improve the detection rate for malware and its variants. In addition, deep neural network classification algorithms were studied, and a deep learning based dynamic behavior detection algorithm for malware was proposed. The detection system Magpie II was then constructed. The experimental results show that the malware family detection system based on a deep neural network and API CBG n-grams can greatly improve the accuracy of classification and detection.
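
The feature pipeline described in contribution (1), variable-length API n-grams followed by information-gain selection, can be illustrated with a minimal Python sketch. The abstract does not specify how the n-gram lengths are chosen or how features are encoded, so this sketch makes several assumptions: it enumerates every length in a fixed range, treats each n-gram as a binary present/absent feature, and uses invented function names and toy API traces. It is not the dissertation's implementation.

```python
from collections import Counter
from math import log2


def variable_length_ngrams(calls, min_n=1, max_n=3):
    """Return the set of API n-grams of every length in [min_n, max_n].

    Assumption: all lengths in the range are enumerated; the abstract
    does not say how lengths are chosen.
    """
    grams = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(calls) - n + 1):
            grams.add(tuple(calls[i:i + n]))
    return grams


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(feature, samples, labels):
    """Information gain of the binary feature 'n-gram occurs in the trace'."""
    present = [y for grams, y in zip(samples, labels) if feature in grams]
    absent = [y for grams, y in zip(samples, labels) if feature not in grams]
    total = len(labels)
    conditional = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - conditional


def select_features(traces, labels, k, min_n=1, max_n=3):
    """Keep the k n-grams with the highest information gain."""
    samples = [variable_length_ngrams(t, min_n, max_n) for t in traces]
    vocabulary = set().union(*samples)
    # Ties are broken by the n-gram itself so the result is deterministic.
    ranked = sorted(
        vocabulary,
        key=lambda g: (-information_gain(g, samples, labels), g),
    )
    return ranked[:k]


# Hypothetical toy traces; real traces would come from a sandbox or API monitor.
traces = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["RegOpenKeyExW", "RegSetValueExW", "CreateFileW", "WriteFile"],
    ["CreateFileW", "ReadFile", "CloseHandle"],
    ["RegOpenKeyExW", "RegSetValueExW", "CreateProcessW"],
]
labels = ["benign", "malware", "benign", "malware"]

selected = select_features(traces, labels, k=4)
for gram in selected:
    print(" -> ".join(gram))
```

In a fuller pipeline, the selected n-grams would become the feature columns fed to a classifier such as the Naive Bayes detector mentioned in the abstract.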
Keywords/Search Tags: malware detection, API, feature extraction, n-gram, classified behavior