Code Context Representation Learning For Vulnerability Detection

Posted on: 2022-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: X C Zhang
Full Text: PDF
GTID: 2518306572960089
Subject: Software engineering
Abstract/Summary:
With the rapid growth of computer software, the number of software vulnerabilities is also increasing quickly, and vulnerability repair is becoming more and more important. Traditional code review places high demands on the expertise of software practitioners, and as software grows in scale, code review alone can no longer meet the needs of vulnerability inspection. Rule-based automatic vulnerability checking relies on rules defined by experts, and traditional machine learning methods require features to be extracted manually. In recent years, the development of deep learning has provided a new research direction for vulnerability detection. However, existing research has some problems, such as incomplete use of code structure information and a focus on extracting global code information at the expense of local information. To solve these problems, this thesis proposes a code context representation learning method for vulnerability detection. The specific work is as follows.

First, we extract abstract syntax trees (ASTs) from the source code, its program slices, and cross-function call code, and extract long paths from the ASTs. We also extract the control flow graph (CFG) and program dependency graph (PDG) of the source code and cross-function call code, and use the node2vec algorithm to generate representation vectors for the nodes of both graphs.

Second, we propose a vulnerability detection model based on context representation. A bidirectional LSTM (BiLSTM), a sequential neural network, learns the representation vectors of the long paths, which we call the local context representation, and node2vec learns the graph representation vectors of the CFG and PDG. Following the node order of each long path, the graph representation vectors are embedded and fused with the local context representation vectors to obtain the global context representation vectors. A self-attention mechanism then weights the global context representation so that vulnerability-related vectors receive higher weight, and the weighted representation is fed into a fully connected layer for vulnerability detection.

Experimental results show that this method outperforms existing vulnerability detection methods, and that the F1 score for vulnerability detection on the real-world FFmpeg dataset increases by 2.8%.
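The fusion and attention-weighting steps described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the thesis's implementation. The per-node vectors are hand-written placeholders standing in for BiLSTM and node2vec outputs, fusion is assumed to be plain concatenation, and the self-attention is reduced to a single query vector that scores each fused vector. The function names (fuse, attention_pool, classify) and all numeric values are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(local_vectors, graph_vectors):
    """Assumed fusion: concatenate each path node's local context vector
    with the graph embedding of the same node, in path order."""
    if len(local_vectors) != len(graph_vectors):
        raise ValueError("one graph vector is needed per path node")
    return [lv + gv for lv, gv in zip(local_vectors, graph_vectors)]


def attention_pool(vectors, query):
    """Simplified attention: score each fused vector against a query,
    normalize the scores with softmax, and return the weighted sum."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def classify(pooled, layer_weights, bias):
    """One fully connected unit with a sigmoid: a vulnerability score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(dot(pooled, layer_weights) + bias)))


# Placeholder vectors for a three-node long path (hypothetical values).
local = [[0.1, 0.2], [0.4, 0.3], [0.9, 0.8]]   # stand-in for BiLSTM outputs
graph = [[0.0, 0.1], [0.2, 0.2], [0.7, 0.6]]   # stand-in for node2vec outputs

fused = fuse(local, graph)
pooled, weights = attention_pool(fused, query=[1.0, 1.0, 1.0, 1.0])
score = classify(pooled, layer_weights=[0.5, -0.2, 0.3, 0.1], bias=0.0)

print("attention weights:", weights)
print("vulnerability score:", score)
```

In a trained model the query vector and the layer weights would be learned, so that path nodes related to a vulnerability receive the larger attention weights. Here they are fixed by hand only to keep the sketch self-contained.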
Keywords/Search Tags: Software vulnerability detection, Deep learning, Code context representation, Program slicing, Long path