
Malware Detection And Classification Based On Deep Learning

Posted on: 2019-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: L Yan
GTID: 2428330572952124
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of computer technology, the threat that malicious programs pose is also growing. Because obfuscation techniques keep improving, malicious programs are becoming more numerous and harder to identify, and traditional malware detection techniques struggle to meet current needs. This thesis applies deep learning to improve a model's ability to detect malicious programs. Deep learning is an extension of machine learning and is widely used in image processing, natural language processing, computer vision, and speech recognition. The convolutional neural network (CNN), which grew out of research on artificial neural networks, offers strong classification performance and also predicts well on unseen samples.

Traditional methods extract log files for analysis. However, this loses the grammatical information carried in some dimensions of the word vector model and cannot reveal the nature of an application's behavior. The method adopted in this thesis analyzes each executable with dedicated tools to obtain a corpus of behavior information described in natural language, and trains a word vector space on that corpus. The extracted behavior information is then expressed with these word vectors to produce behavior feature maps, which are finally used to train and test a convolutional neural network model.

To compare against traditional detection methods and validate the experimental performance, two comparative experiments were carried out. The first experiment extracts each malware sample's API call sequence as text, builds a vector space model (VSM) to represent that text and obtain a word vector feature map, and then models it with a CNN. The comparison shows that the information extracted in this thesis captures more of the nature of malware behavior and preserves the grammatical information in the word vectors. The second experiment mainly compares modeling methods. It also uses the behavior information described in natural language, but applies TF-IDF to obtain a feature vector for each program and trains a support vector machine (SVM) detection model in a high-dimensional space. This model is then evaluated to show that the feature extraction method and the choice of algorithm used in this thesis are more appropriate.

The results of the two comparative experiments show that the detection model built in this thesis achieves high accuracy and a low false alarm rate. This indicates that behavior information described in natural language better reflects the nature of program behavior without losing the grammatical information in some dimensions of the word vector model. It also shows that deep learning has broad application prospects in analyzing and judging program behavior.
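The SVM baseline in the second experiment represents each program by a TF-IDF vector computed over its natural-language behavior description. The abstract does not give the exact weighting, so the sketch below assumes one common variant: raw term counts times a smoothed IDF, ln((1 + N) / (1 + df)) + 1, followed by L2 normalization. The function name `tfidf_vectors` and the sample reports are hypothetical, and the same function would apply unchanged to API-call tokens.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors): one L2-normalized TF-IDF vector per document.

    docs is a list of token lists, e.g. the words of a natural-language
    behavior report or the API names in a call sequence.
    """
    n_docs = len(docs)
    vocab = sorted({tok for doc in docs for tok in doc})
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (assumed variant): terms common to many programs weigh less.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = [counts[t] * idf[t] for t in vocab]  # raw term count times IDF
        # L2-normalize so short and long reports are comparable.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


# Hypothetical behavior descriptions for two programs.
reports = [
    "creates file writes registry key".split(),
    "opens file reads file".split(),
]
vocab, vectors = tfidf_vectors(reports)
```

Here "registry" appears in only one report, so it outweighs "file", which appears in both. scikit-learn's `TfidfVectorizer` applies the same smoothed IDF and L2 normalization by default, and `sklearn.svm.SVC` can be fit on the resulting matrix. The abstract does not say which SVM implementation or kernel the thesis used.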
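The abstract describes expressing each program's behavior with trained word vectors to form a "behavior feature map" that a CNN classifies, but it does not specify the architecture. The forward pass below assumes a common text-CNN design, which is not confirmed by the source. Each program becomes a fixed-size matrix of word vectors (one row per token, zero-padded), each filter spans the full embedding width and slides along the sequence, ReLU and max-over-time pooling reduce each filter to one feature, and a sigmoid produces a malicious-probability score. The embeddings, filter values, and function names are all hypothetical, and training by backpropagation is omitted.

```python
import math


def feature_map(tokens, embeddings, max_len):
    """Stack word vectors into a max_len x dim matrix.

    Unknown tokens map to zero vectors; the sequence is truncated or
    zero-padded to max_len rows so every program yields the same shape.
    """
    dim = len(next(iter(embeddings.values())))
    rows = [list(embeddings.get(tok, [0.0] * dim)) for tok in tokens[:max_len]]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows


def conv_maxpool(fmap, filt, bias=0.0):
    """Slide a k x dim filter along the sequence axis, apply ReLU,
    then max-pool over time to one feature value (assumes len(fmap) >= k)."""
    k = len(filt)
    activations = []
    for i in range(len(fmap) - k + 1):
        s = bias
        for r in range(k):
            s += sum(a * w for a, w in zip(fmap[i + r], filt[r]))
        activations.append(max(0.0, s))  # ReLU
    return max(activations)


def predict_malicious(fmap, filters, weights, bias):
    """One pooled feature per filter, then a logistic output in (0, 1)."""
    feats = [conv_maxpool(fmap, f) for f in filters]
    z = bias + sum(w * x for w, x in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))


# Toy 2-dimensional "word vectors"; real ones would be trained on the corpus.
emb = {"createfile": [1.0, 0.0], "writefile": [0.0, 1.0]}
fmap = feature_map(["createfile", "writefile", "unknownapi"], emb, max_len=4)
filt = [[1.0, 0.0], [0.0, 1.0]]  # responds to "createfile" then "writefile"
```

The hand-set filter responds most strongly where "createfile" is followed by "writefile", which is the kind of local pattern a trained convolution would learn. In practice the filters and output weights would be learned in a framework such as PyTorch rather than written by hand.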
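The abstract reports the detection model's accuracy and false alarm rate without defining them. The sketch below assumes the usual confusion-matrix definitions, with label 1 for malware: accuracy = (TP + TN) / total, detection rate = TP / (TP + FN), and false alarm rate = FP / (FP + TN), the share of benign programs wrongly flagged. The function name and the sample labels are hypothetical and are not results from the thesis.

```python
def evaluate(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = malicious, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged as malware
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_alarm_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical labels for five programs.
metrics = evaluate([1, 1, 0, 0, 0], [1, 0, 0, 1, 0])
```

With one caught and one missed malware sample, and one of three benign programs flagged, this gives accuracy 0.6, detection rate 0.5, and false alarm rate 1/3.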
Keywords/Search Tags: CNN, Deep Learning, API call sequence, VSM, Model Evaluation