
Research And Implementation Of A Deep Learning-Based JavaScript Malware Detection Technique

Posted on: 2020-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Yang
Full Text: PDF
GTID: 2428330575457134
Subject: Computer Science and Technology
Abstract/Summary:
JavaScript (JS) is a dominant programming language in web and mobile development, but it is also notoriously abused by attackers. Malicious JS detection has therefore become an active topic in the security field. Recently, machine learning and deep learning techniques have achieved many breakthroughs across various fields of artificial intelligence, and researchers have developed several machine learning-based methods to detect malicious JS instances. However, these methods treat JS as a natural language rather than a programming language and apply models from the natural language processing field. Although natural languages and programming languages do share similarities, these models and methods ignore many characteristics that exist only in programming languages, and thus cannot capture the syntactic and semantic features of JS code.

In this thesis, we present JSAC, a deep learning-based model for detecting JS malware. JSAC combines machine learning with program analysis techniques to capture the syntactic and semantic features of JS programs, and bases its detections on those features. Specifically, for a JS program, JSAC builds its abstract syntax tree (AST) and control flow graph (CFG). The units of analysis are nodes in the AST and instructions in the CFG. JSAC then maps the nodes and instructions to real-valued vectors. The node vectors are fed to a tree-based convolutional neural network (TBCNN) and the instruction vectors to a graph-based convolutional neural network (GBCNN). We employ the TBCNN to extract the program's syntactic features and the GBCNN to extract its semantic features. The syntactic and semantic feature vectors are then combined in a fusion layer. The final classification is performed in the output layer, which applies the softmax function.

Evaluation on a corpus of 69,523 valid and unique JS files indicates that JSAC outperforms 4 other machine learning models, achieving 98.71% accuracy, 98.83% precision, 98.64% recall and a 98.73% F1-score. The experimental results also demonstrate the contribution of both the syntactic and the semantic features.
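
The fusion layer and softmax output layer described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the thesis implementation: the abstract does not say how the two feature vectors are fused, so concatenation followed by a single linear layer is assumed here, and the function names, vector sizes, weights and class order (0 = benign, 1 = malicious) are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(syntactic, semantic, weights, biases):
    """Fuse two feature vectors and return class probabilities.

    Fusion by concatenation is an assumption made for this sketch;
    the abstract only states that a fusion layer exists.
    """
    fused = list(syntactic) + list(semantic)
    # One linear layer: a weight row and a bias per output class.
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Hypothetical toy values standing in for the TBCNN and GBCNN outputs.
syntactic_features = [0.2, -0.5, 1.0]
semantic_features = [0.7, 0.1]

# Hypothetical parameters; a trained model would learn these.
weights = [
    [0.5, -0.3, 0.8, 0.1, -0.2],   # class 0: benign
    [-0.4, 0.6, -0.1, 0.9, 0.3],   # class 1: malicious
]
biases = [0.0, 0.1]

probs = fuse_and_classify(syntactic_features, semantic_features, weights, biases)
label = "malicious" if probs[1] > probs[0] else "benign"
print(probs, label)
```

In a real model the two input vectors would come from the trained TBCNN and GBCNN, and the layer parameters would be learned jointly with them rather than written by hand.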
Keywords/Search Tags:deep learning, program analysis, convolutional neural network, abstract syntax tree, control flow graph