Malware Detection And Classification Based On Deep Learning

Posted on:2019-12-31

Degree:Master

Type:Thesis

Country:China

Candidate:L Yan

GTID:2428330572952124

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of computer technology, the dangers that malicious programs pose to users are also increasing. As code obfuscation techniques improve, the number of malicious programs keeps growing and they become harder to identify. Traditional malicious program detection techniques can no longer meet practical needs. This paper takes a deep learning approach to improve the model's ability to detect malicious programs. Deep learning is a newer technique that extends machine learning. It is widely used in image processing, natural language processing, computer vision, and language recognition. The convolutional neural network (CNN) originates from research on artificial neural networks; it offers excellent classification performance and also predicts unknown samples well.

In the traditional method, log files are extracted for analysis. However, this loses the grammatical information of some dimensions in the word vector model and cannot show the nature of application behavior. The method adopted in this paper is to analyze the executable program with related tools, obtain a corpus of behavior information described in natural language, and train a word vector space on that corpus. The word vectors are then used to represent the extracted behavior information and produce behavior feature maps, and finally a convolutional neural network model is trained and tested on these feature maps.

To improve on traditional detection methods and to demonstrate the performance of the approach, two comparative experiments were carried out. The first experiment extracts the malware's API call sequence as text information, establishes a vector space model (VSM) to represent the text, obtains the word vector feature map, and then uses a CNN to complete the modeling. The comparison shows that the information extracted in this paper represents more of the nature of malware behavior and preserves the grammatical information in the word vectors. The second experiment mainly compares the modeling methods. It also uses the behavior information described in natural language, but uses the TF-IDF method to obtain a feature vector for each program and a support vector machine (SVM) to establish a high-dimensional space and train the detection model. The resulting model is then evaluated to show that the feature extraction method and the choice of algorithm used in this paper are more appropriate.

The results of the two comparison experiments show that the detection model established in this paper has high accuracy and a low false alarm rate. This indicates that behavior information described in natural language better shows the nature of program behavior without losing the grammatical information of some dimensions in the word vector model. It also shows that deep learning has broad application prospects in the analysis and judgment of program behavior.
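
The step of turning behavior text into a behavior feature map can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy vectors, the names `TOY_VECTORS`, `EMBED_DIM` and `behavior_feature_map`, the fixed length `max_len`, and the choice to zero-pad short texts and map unknown words to zero vectors are all assumptions made for illustration. In practice the word vectors would be trained on the behavior corpus rather than written by hand.

```python
from typing import Dict, List

# Hypothetical toy word vectors. A real system would train these on the
# behavior-description corpus instead of hard-coding them.
TOY_VECTORS: Dict[str, List[float]] = {
    "process": [0.9, 0.1, 0.0],
    "creates": [0.2, 0.7, 0.1],
    "file": [0.1, 0.3, 0.8],
    "deletes": [0.3, 0.6, 0.4],
}
EMBED_DIM = 3  # assumed vector size, kept tiny for readability


def behavior_feature_map(
    tokens: List[str],
    vectors: Dict[str, List[float]],
    max_len: int,
    dim: int,
) -> List[List[float]]:
    """Stack one word vector per token into a max_len x dim matrix.

    Texts longer than max_len are truncated, shorter ones are padded
    with zero rows, and words without a vector become zero rows
    (all three are assumptions of this sketch).
    """
    zero = [0.0] * dim
    rows = [list(vectors.get(tok, zero)) for tok in tokens[:max_len]]
    rows.extend([0.0] * dim for _ in range(max_len - len(rows)))
    return rows


# One behavior description, already split into words.
tokens = "process creates file unknownword".split()
fmap = behavior_feature_map(tokens, TOY_VECTORS, max_len=6, dim=EMBED_DIM)

for row in fmap:
    print(row)
```

Each row of `fmap` is the vector of one word in order, so the matrix keeps word order in the way an image keeps pixel positions. That two-dimensional layout is what allows a CNN to be applied to it.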

Keywords/Search Tags:

CNN, Deep Learning, API call sequence, VSM, Model Evaluation

Related items

1	Research And Implementation Of Android Malicious Application Detection Based On Deep Learning
2	Research And Implementation Of Malware Classification Based On Deep Learning
3	Research On Emotional Dialogue Generation Model Based On Deep Learning
4	Research On Chinese Lip Reading Recognition Based On Deep Learning
5	A Study Of Model Compression Approaches To Deep Learning-based Sequence Models
6	Research Of Model For Abstractive Summarization Based On Deep Learning
7	Research And Application Of Related Techniques For Text Summarization Based On Deep Learning
8	Research On Text Summarization Technology Based On Deep Learning
9	Research On Sketch Segmentation Based On Deep Learning
10	Continuous Action Recognition Method Based On Deep Learning