| The continuous development of the information technology industry has made the functionality and structure of software increasingly complex, and this complexity leads to frequent software security incidents that seriously threaten industrial production and social security. Most software security problems originate from security risks in the source code itself, so source code security has received wide attention in recent years. Given the rapid growth of vulnerability exploitation, how to use source code to detect vulnerabilities quickly and effectively is an active research topic in information security. Traditional static vulnerability detection methods mostly rely on manually defined vulnerability patterns, so both their detection accuracy and their efficiency are low. With the rapid development of deep learning, a large number of static detection methods based on deep neural networks have emerged. Most of these methods represent code as text or as an abstract syntax tree; they ignore the structural information of the source code and cannot represent its deep semantics, which results in low precision and accuracy. To address these problems, this thesis focuses on the following three aspects: 1) Existing methods for collecting vulnerable source code are inefficient, and the vulnerability descriptions they yield are incomplete. This thesis collects vulnerable source code and labels the vulnerability information by gathering data from vulnerability disclosure websites, supplemented by manual work. First, the source code link of each vulnerability is obtained from a public vulnerability disclosure website and the vulnerability source code is
collected. Meanwhile, the vulnerability descriptions are collected from the NVD and CVE websites and stored by vulnerability type and vulnerability ID. To eliminate the influence of irrelevant factors on the detection experiments, the collected source code is preprocessed through function-level segmentation, file renaming, variable replacement, and deletion of irrelevant characters. This thesis mainly collects program code written in C/C++ and finally constructs a vulnerability database containing 39160 code samples. 2) Existing source code representation methods capture incomplete information, fail to obtain the deep semantics of the source code, and fail to capture its structure. To solve these problems, this thesis proposes CPGExtract, a source code feature extraction method, to process the collected data. First, a source code analysis tool transforms the source code into a graph data structure composed of the abstract syntax tree, the control flow graph, and the program dependence graph. Then the graph embedding model struc2vec transforms this graph of nodes and edges into feature vectors, which serve as the input for training and detection with the subsequent neural network model. A comparison experiment was conducted against the traditional embedding methods DeepWalk, node2vec, and word2vec. The experiment shows that the feature representation based on the code property graph captures the structural information of source code and characterizes it effectively. The struc2vec-based graph embedding achieves an average accuracy of more than 80% in node feature classification tasks, a significant improvement over the traditional embedding methods. 3) Aiming at the
problems of poor efficiency and low accuracy in current source code vulnerability detection methods, this thesis proposes a vulnerability detection method based on the graph attention network. The feature vectors generated by the source code feature extraction module are used as the input of the vulnerability representation learning module, which consists of a graph convolution-pooling layer, a graph readout layer, a fully connected layer, and a softmax layer. To improve vulnerability representation learning, capture the deep semantics of vulnerable source code, and raise detection accuracy, a graph attention network layer is used as the graph convolution layer. A comparison experiment against four existing vulnerability detection tools and models, Cppcheck, DeepBugs, Flawfinder, and VulDeePecker, shows that this method can effectively identify each type of vulnerability and achieves an average accuracy of 87% in vulnerability detection on software source code. Compared with traditional vulnerability detection tools and methods, it has higher precision and accuracy. |
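
The detection model described above (graph attention layer, graph readout, fully connected layer, softmax) can be sketched as follows. This is a minimal, untrained illustration in plain Python, not the thesis implementation: the function names, the single attention head, the mean readout, the random weights, and the toy graph are all assumptions made for the example, and the pooling step is omitted.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    # W has shape (out_dim, in_dim); x has length in_dim.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def gat_layer(features, neighbors, W, a):
    """Single-head graph attention layer (assumed form, no output nonlinearity).

    features:  list of node feature vectors (e.g. node embeddings)
    neighbors: dict mapping node index -> list of neighbor indices
    W:         shared linear projection, shape (hidden_dim, in_dim)
    a:         attention vector of length 2 * hidden_dim
    """
    z = [matvec(W, h) for h in features]
    out = []
    for i in range(len(features)):
        # Each node attends to its neighbors and to itself.
        nbrs = sorted(set(neighbors.get(i, [])) | {i})
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        alpha = softmax(scores)
        out.append([
            sum(w * z[j][k] for w, j in zip(alpha, nbrs))
            for k in range(len(z[i]))
        ])
    return out


def mean_readout(node_vectors):
    # Collapse all node vectors into one graph-level vector.
    n = len(node_vectors)
    return [sum(col) / n for col in zip(*node_vectors)]


def classify(features, neighbors, params):
    # Attention layer -> readout -> fully connected layer -> softmax.
    h = gat_layer(features, neighbors, params["W"], params["a"])
    g = mean_readout(h)
    return softmax(matvec(params["W_out"], g))


def random_params(in_dim, hidden_dim, n_classes, seed=0):
    # Hypothetical helper: random, untrained weights for demonstration only.
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W": rand(hidden_dim, in_dim),
        "a": rand(1, 2 * hidden_dim)[0],
        "W_out": rand(n_classes, hidden_dim),
    }


if __name__ == "__main__":
    # Toy 3-node graph standing in for one function's code property graph.
    feats = [[1.0, 0.0, 0.5], [0.2, 0.1, 0.0], [0.0, 1.0, 1.0]]
    nbrs = {0: [1], 1: [0, 2], 2: [1]}
    print(classify(feats, nbrs, random_params(3, 4, 2)))
```

The output is a two-element probability vector (for example, vulnerable versus not vulnerable); with untrained weights the values carry no meaning. In practice a library layer such as a graph attention convolution from a graph learning framework would replace `gat_layer`, and the weights would be learned from the labeled samples.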