Research On Self-optimizing Real-time Detection Technology Of Unknown Malicious Code Based On Machine Learning

Posted on: 2022-03-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S X Li
Full Text: PDF
GTID: 1488306491975789
Subject: Computer Science and Technology
Abstract/Summary:
The invention and application of information technology have provided a powerful driving force for economic and social development worldwide, pushing the pace of development of human society to unprecedented heights. The rapid growth of Internet technology has further narrowed the distance between society and individuals and greatly promoted human communication and exchange, and the Internet has become an indispensable part of human life. However, with the continued development of Internet technology and the rapid growth of mobile payment and other network technologies, the malice, destructiveness, and robustness of network attacks have increased significantly. Malicious code has become a common threat in network security and poses a great threat to the security of people's property. Information security departments all over the world spend large sums every year to maintain network security. Researchers have kept applying more advanced technology to information security problems and have achieved many research results, which effectively curb the harm caused by malicious network attacks.

However, owing to the particular nature of information technology, anti-detection techniques for malicious code develop very quickly. On the one hand, the carrier has expanded from a single executable file to a variety of file types (such as PDF files, DOC files, and multimedia files). On the other hand, malicious code uses obfuscation, injection, camouflage, encryption, and other means to avoid detection. Malicious code is updated very quickly, and large numbers of malicious file variants appear. With the emergence of new types of malicious files, the traditional protection scheme, which detects malicious code by manual feature extraction and matching, faces a huge challenge. How to predict the characteristics of unknown malware, detect malware early during its execution, and nip its spread in the bud has become a hot topic in information security research. To address this problem, the dissertation studies malware detection based on deep learning, covering MS-DOC file detection, malicious code detection based on a weighted directed cyclic graph, and API-based self-optimizing real-time detection of unknown malicious code. The main research results are as follows.

1. At present, research on malicious code detection focuses mainly on executable files, and there is little research on malicious code carried by other file types. Given the wide use of malicious DOC files in APT attacks, the dissertation proposes a malicious file detection method based on the DOC file structure and deep learning. First, the DOC file structure is analyzed to determine the type of malicious code and its main location in the file. Then, the file segment containing the malicious code is transformed into an 8-bit grayscale image by a data visualization method. Finally, the grayscale image is classified by a convolutional neural network (CNN). Three convolutional networks, LeNet-5, AlexNet, and VGGNet, are tested, and the experimental results show that the detection method is feasible. The dissertation also collected a number of unknown malicious samples for testing, and the results show that the method has some ability to detect unknown malicious files. To further improve detection accuracy, the dissertation proposes a detection strategy based on J-CNN, which combines several homogeneous or heterogeneous CNN models to decide whether a sample under test is malicious or benign. The experimental results show that this method significantly improves detection accuracy.

2. For malicious file detection on executable files, traditional methods based on static and dynamic analysis extract features from existing samples and then match them against a library of malicious sample features. This approach faces a huge challenge as malicious files are iteratively developed and constantly updated. To solve this problem, the dissertation proposes a feature extraction method based on Markov chains and a malicious code detection method based on machine learning. First, the API call sequences of a large number of malicious code samples are analyzed. Based on a Markov chain, universal weights for malicious code API calls are extracted, and a directed cyclic graph of API calls is constructed for each sample to be detected. A mapping operation over these two data structures yields the final feature graph of the sample. After the main features are obtained by principal component analysis, machine learning models such as a convolutional neural network, support vector machine, decision tree, random forest, and naive Bayes classifier are used for recognition and classification. In real scenarios, the feature graphs of malicious samples differ in structure and size, and the dimensions of their adjacency matrices are inconsistent, which causes considerable inconvenience for a detection system. A graph convolutional neural network, which takes a graph as input, offers the needed input flexibility, so a malicious code classifier based on a graph convolutional neural network is proposed to recognize and classify malicious code. This method does not require principal component analysis; the feature graph of the sample to be detected is fed directly to the classifier. To demonstrate the effectiveness of the method, experiments are carried out on data sets from different years. The experimental results show that the proposed feature extraction method and the detection method based on the graph convolutional neural network maintain good detection performance across data sets from different years, which demonstrates the effectiveness and generality of the method.

3. At present, updating the feature database used for malicious code detection still requires features to be extracted manually and entered into the database, so it is severely strained by large volumes of new malicious code. In addition, current real-time detection of malicious code is relatively weak. On this basis, the dissertation proposes a machine-learning-based self-optimizing real-time detection technology for malicious code. First, a new data structure, the API pair graph, is proposed to meet the needs of real-time detection and feature extraction. In the feature extraction and generation stage, a maximum entropy model is used to ensure accurate detection of known malicious code while preserving uncertainty about unknown malicious code. Because of the way the feature extraction model is trained, samples can be fed into it continuously for training, so that the malicious code features update themselves. In the detection phase, the dissertation first uses a sequence clustering algorithm to filter the generated sample features, which improves detection time and resists adversarial-learning attacks. It then uses a long short-term memory (LSTM) network to detect samples in real time. To make the model self-updating, detected samples are fed back into the feature extraction model for training, so that samples are learned without manual intervention. The model is verified by a large number of experiments. The experimental results show that the model detects malicious code with high accuracy and has a strong ability to detect unknown malicious code in time-ordered data. The real-time detection experiments show that the model can greatly shorten the detection time of malicious code and thereby reduce the harm it causes. The experiments also show that the model is robust against attacks.
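The byte-to-image step in the first result can be illustrated with a minimal sketch. The abstract does not state the image width, the padding rule, or how the file segment is located, so the function name `bytes_to_gray_image`, the default width of 64, and the zero padding below are all assumptions made for illustration, not the dissertation's actual procedure.

```python
import numpy as np


def bytes_to_gray_image(data: bytes, width: int = 64) -> np.ndarray:
    """Map a byte string to a 2-D uint8 array, one byte per pixel.

    Hypothetical helper: `width` and zero padding are assumptions,
    since the abstract does not specify the image layout.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not data:
        raise ValueError("data must not be empty")

    # Each byte (0..255) becomes one 8-bit grayscale pixel.
    pixels = np.frombuffer(data, dtype=np.uint8)

    # Pad the tail with zeros so the length is a multiple of the width.
    pad = (-len(pixels)) % width
    padded = np.pad(pixels, (0, pad), mode="constant", constant_values=0)

    # Rows are filled left to right, top to bottom.
    return padded.reshape(-1, width)
```

An array produced this way could then be resized to the fixed input size of a CNN such as LeNet-5; that resizing step is likewise not described in the abstract.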
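The Markov-chain feature extraction in the second result can be sketched in the same spirit. The abstract does not define the "universal weights" or the "mapping operation" precisely, so this sketch assumes that the weights are first-order transition probabilities estimated from a corpus of API call sequences, and that the mapping looks up that corpus-level weight for every edge in a single sample's call graph. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities.

    `sequences` is an iterable of API-name lists. Returns a nested dict
    probs[src][dst] = P(next call is dst | current call is src).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every consecutive (src, dst) pair in the sequence.
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1

    probs = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        probs[src] = {dst: n / total for dst, n in outgoing.items()}
    return probs


def weighted_call_graph(sequence, probs):
    """Build the directed call graph of one sample as an edge dict.

    Assumption: each edge takes the corpus-level transition probability
    as its weight; transitions never seen in the corpus get weight 0.0.
    Repeated calls produce cycles, so the graph is generally cyclic.
    """
    edges = {}
    for src, dst in zip(sequence, sequence[1:]):
        edges[(src, dst)] = probs.get(src, {}).get(dst, 0.0)
    return edges
```

Because different samples call different sets of APIs, the resulting edge dictionaries vary in size, which matches the abstract's motivation for feeding graphs directly to a graph convolutional network instead of a fixed-size matrix.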
Keywords/Search Tags: Malware, API Call Sequence, Machine Learning, Self-optimizing, Real-time Detection