As productivity grows, users demand ever more software functionality. This makes software architectures increasingly complex and raises the likelihood of security flaws, and code reuse further helps vulnerabilities spread. Once attackers exploit these vulnerabilities, the resulting losses can be enormous, so ensuring software security has become an urgent problem. Following the success of deep learning in image processing and natural language processing, many information security researchers have begun applying deep learning to vulnerability detection. However, most current deep-learning-based detection methods represent source code with a single graph. Such a representation does not fully retain the syntactic, semantic, control-flow, and other information in the code, so information is lost during code representation. In addition, during feature learning these methods feed vulnerability features directly into the model without accounting for features that are only weakly correlated with vulnerabilities, which degrades model performance. To address these problems, this thesis proposes an intelligent vulnerability detection method based on graph neural networks. The main research contents are as follows.

(1) Because the information security field lacks large, high-quality source code datasets, this study uses web crawlers to collect samples from NVD, GitHub, and other websites. First, source code files containing vulnerabilities are obtained from public vulnerability disclosure websites. Second, samples containing erroneous information are corrected manually, yielding a vulnerability database of 49,736 samples. Finally, the abstract syntax trees and control flow graphs corresponding to the source code are obtained through
code slicing, abstract syntax tree extraction, control flow graph extraction, and related operations; these serve as the sample data for subsequent model training and testing.

(2) For vulnerability detection, an attention-based message passing network is used. First, the source code is treated as a text sequence, code graph features are constructed with a token-focused method, and node representations are initialized with the pre-trained module of a PL (Programming Language) model. During feature learning, a channel-spatial attention mechanism applies average pooling and max pooling to the input features separately. This assigns higher weights to vulnerability-related features and makes the model more sensitive to vulnerability-related node information, which effectively improves model performance. Experimental results show that the proposed model achieves higher vulnerability detection accuracy than other deep learning models based on natural language processing.

(3) To address information loss in current code representation, this thesis proposes Multi-Graph Fusion, a source code representation method that characterizes code with both abstract syntax trees and control flow graphs, and uses the MGFN (Multi-Graph Fusion Network) model to extract a feature vector from each graph separately. Because abstract syntax trees and control flow graphs influence detection results differently, the two features are fused with the attention-based Attentional Feature Fusion (AFF) method. AFF feeds the two vectors into the MS-CAM module, combines local and global features through two branches, derives a weight with the sigmoid activation function, and uses that weight to compute a weighted average of the two input vectors, yielding the
final fused feature.

Using the proposed graph-neural-network-based vulnerability detection model, this thesis conducts comparative experiments on the constructed dataset. Experimental results show that, compared with existing models, the proposed model improves accuracy by 13%, recall by 11%, and F1 score by 10% on this dataset.
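Item (2) names an attention-based message passing network but does not specify how attention scores are computed. The minimal sketch below assumes a GAT-like scheme with parameter-free scaled dot-product scores and self-loops; the function names are hypothetical, and this is an illustration of the general technique rather than the thesis's implementation.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_message_passing(features, edges):
    """One round of attention-weighted message passing (hypothetical sketch).

    features: list of node feature vectors (lists of floats), e.g. produced
              by a pre-trained code model.
    edges:    list of (src, dst) pairs; messages flow from src to dst.
    """
    num_nodes = len(features)
    dim = len(features[0])
    # Every node also receives a message from itself (self-loop).
    incoming = {v: [v] for v in range(num_nodes)}
    for src, dst in edges:
        incoming[dst].append(src)

    updated = []
    for v in range(num_nodes):
        senders = incoming[v]
        # Scaled dot-product score between the receiver and each sender.
        scores = [
            sum(a * b for a, b in zip(features[v], features[u])) / math.sqrt(dim)
            for u in senders
        ]
        weights = softmax(scores)
        # New state: attention-weighted sum of the senders' features.
        updated.append(
            [sum(w * features[u][k] for w, u in zip(weights, senders)) for k in range(dim)]
        )
    return updated


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention_message_passing(feats, [(0, 2), (1, 2)])
print(out[0])  # → [1.0, 0.0]  (node 0 has no incoming edges)
```

In a trained model the scores would come from learned projections; the parameter-free dot product keeps the sketch short while preserving the key idea that neighbors with more relevant features contribute more to a node's update.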
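The channel-spatial attention in item (2) is described only as applying average pooling and max pooling to the input features. A minimal sketch in the style of CBAM (channel attention followed by spatial attention), applied here to an N x C node-feature matrix, might look like the following. The weights `w1`, `w2`, `w_avg`, and `w_max` are hypothetical placeholders for learned parameters, and CBAM's spatial convolution is replaced by a simple per-node linear mix; none of these details come from the thesis.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def shared_mlp(vec, w1, w2):
    # Two-layer perceptron shared by both pooled descriptors: C -> hidden (ReLU) -> C.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(x, w1, w2):
    """Reweight the C channels of an N x C node-feature matrix x."""
    n, c = len(x), len(x[0])
    avg_pool = [sum(x[i][k] for i in range(n)) / n for k in range(c)]
    max_pool = [max(x[i][k] for i in range(n)) for k in range(c)]
    # Sum the MLP outputs of both descriptors, then squash to (0, 1).
    scores = [a + b for a, b in zip(shared_mlp(avg_pool, w1, w2), shared_mlp(max_pool, w1, w2))]
    weights = [sigmoid(s) for s in scores]
    return [[row[k] * weights[k] for k in range(c)] for row in x]


def spatial_attention(x, w_avg, w_max, bias=0.0):
    """Reweight each node (row) using its channel-wise average and maximum."""
    out = []
    for row in x:
        # Hypothetical linear mix stands in for CBAM's spatial convolution.
        weight = sigmoid(w_avg * (sum(row) / len(row)) + w_max * max(row) + bias)
        out.append([v * weight for v in row])
    return out


def channel_spatial_attention(x, w1, w2, w_avg, w_max, bias=0.0):
    # Channel attention first, then spatial attention, following CBAM's ordering.
    return spatial_attention(channel_attention(x, w1, w2), w_avg, w_max, bias)


x = [[4.0, 8.0], [2.0, 0.0]]   # N = 2 nodes, C = 2 channels
w1 = [[0.0, 0.0]]              # hidden size 1 (placeholder weights)
w2 = [[0.0], [0.0]]
print(channel_spatial_attention(x, w1, w2, w_avg=0.0, w_max=0.0))
# → [[1.0, 2.0], [0.5, 0.0]]
```

With all-zero placeholder weights every gate equals 0.5, so the output is simply the input scaled by 0.25; trained weights would instead push gates toward 1 for channels and nodes that matter for detection and toward 0 for weakly correlated ones.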
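Item (3) summarizes AFF only briefly. The sketch below follows the AFF formulation of Dai et al.: the two inputs are first summed, MS-CAM combines a local and a global context branch, and the sigmoid output M blends the inputs as M * X + (1 - M) * Y. Treating each branch output as an L x C matrix, the plain-Python layout, the parameter names, and the omission of batch normalization are assumptions for illustration, not the thesis's MGFN code.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bottleneck(vec, w_down, w_up):
    # Pointwise bottleneck C -> C/r (ReLU) -> C; MS-CAM's batch norm is omitted.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w_down]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_up]


def ms_cam_weights(z, local_params, global_params):
    """Fusion weights for an L x C matrix z (L positions, C channels)."""
    positions, channels = len(z), len(z[0])
    # Global context: average over all positions, then a bottleneck.
    pooled = [sum(row[k] for row in z) / positions for k in range(channels)]
    global_ctx = bottleneck(pooled, *global_params)
    weights = []
    for row in z:
        # Local context: a separate bottleneck applied at each position.
        local_ctx = bottleneck(row, *local_params)
        weights.append([sigmoid(loc + glo) for loc, glo in zip(local_ctx, global_ctx)])
    return weights


def attentional_feature_fusion(x, y, local_params, global_params):
    # Initial integration of the two inputs by element-wise sum.
    z = [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]
    m = ms_cam_weights(z, local_params, global_params)
    # Soft selection: M weights x and (1 - M) weights y.
    return [
        [mi * a + (1.0 - mi) * b for mi, a, b in zip(rm, rx, ry)]
        for rm, rx, ry in zip(m, x, y)
    ]


x = [[2.0, 4.0]]   # hypothetical AST-branch features (L = 1, C = 2)
y = [[0.0, 8.0]]   # hypothetical CFG-branch features
zeros = ([[0.0, 0.0]], [[0.0], [0.0]])  # (w_down, w_up) placeholders, C/r = 1
print(attentional_feature_fusion(x, y, zeros, zeros))  # → [[1.0, 6.0]]
```

With all-zero parameters every fusion weight is 0.5, so the result is the plain average of the two inputs; learned parameters shift M per channel toward whichever graph representation is more informative, which is the motivation the abstract gives for weighting the AST and CFG features differently.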
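The reported gains are in accuracy, recall, and F1 score. For reference, these metrics can be computed from binary predictions as follows; the sketch assumes label 1 marks a vulnerable sample, since the abstract does not define the positive class.

```python
def detection_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, treating label 1 (vulnerable) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = detection_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(metrics)
```

Recall measures the share of truly vulnerable samples the detector catches, which is why it is reported alongside accuracy: on imbalanced vulnerability data a model can score high accuracy while missing most vulnerable code.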