| As software grows in size and complexity, security risks in software have become increasingly frequent, leaving software vulnerable to attack by malicious actors. To improve software security effectively, vulnerability detection has become a focus of research in the field of software security. Because graph models have strong representational capability, research on graph-model-based vulnerability detection has emerged. However, existing approaches rely on a single type of graph model and capture too little information about the source code, which increases the false positive rate (FPR) of vulnerability detection. In addition, the complex structure of source code produces redundant information in the corresponding graph structure and a heavy computational burden. To address these problems, this thesis makes the following contributions. (1) To address the problem that a single type of graph model represents only part of the characteristics of the source code, this thesis proposes a Triple Polar Characteristic Graph (TPCG), which fuses a vulnerability characteristic graph with a patch characteristic graph to show the forward, reverse, and basic characteristics of vulnerabilities. The graph parses the structural information of the source code with the help of an abstract syntax tree, generates entity structures, parses the statement information in the source code, and represents the associations between nodes through six kinds of relationships, such as control dependency, data dependency, and function call, thereby reflecting the complex relationships between feature items. This provides a sound basis for research on graph-model-based vulnerability detection. (2) Because TPCG has rich content, a complex structure, and a large computational overhead, it is unsuitable for direct vulnerability 
detection. This thesis therefore proposes an Apriori-based Feature Subgraph Mining (AFSM) method. Unlike traditional subgraph mining algorithms, which consider only the set of frequent items, this algorithm incorporates both frequency and distance from the vulnerability outbreak node into the support function. It thus accounts more comprehensively for the factors that affect the confidence of a subgraph and ensures that the subgraph retains more complete information. Breadth-first search (BFS) over graph correlations reduces the number of iterations, which lowers the computational overhead and improves the efficiency of processing graph models. The subgraphs obtained through AFSM are stored in the Neo4j graph database for use in subsequent graph-model-based vulnerability detection. (3) This thesis designs a Graph-based Pattern-match Vulnerability Detection (GPVD) method. The method first builds a pattern library: it generates TPCGs from real vulnerability data collected from the National Vulnerability Database (NVD) and other sources, applies the AFSM algorithm to produce feature subgraphs, extracts the node and relationship information in each feature subgraph, identifies the graph patterns that trigger a vulnerability according to the triggering principles and mechanisms of the different vulnerability types, and expresses these patterns as vulnerability query statements in Cypher syntax, which are stored in the pattern library. The code to be detected is converted into the corresponding code feature graph, and a pattern-matching algorithm computes the match between this graph and the pattern library; vulnerabilities are detected according to the degree of matching. Using a formal language, the execution process of a software system can be clearly described and demonstrated, and a software program model can be created graphically to better understand the relationships between modules, which helps to analyze and detect 
software systems more effectively. Building on the above research, this thesis applies the proposed detection method in experiments on eight different types of vulnerabilities, mines 3571 pattern statements from a graph database containing 19740 attributes, and uses these statements to build a vulnerability pattern library. The experimental results show that GPVD achieves 90.37% precision and 85.93% accuracy on overflow-type vulnerabilities in C/C++ and reduces the FPR to below 10%, a significant improvement in precision and false alarm rate. |
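
The abstract states that the AFSM support function combines a subgraph's frequency with its distance from the vulnerability outbreak node, and that BFS is used to traverse the graph, but it does not give the formula. The following Python fragment is only an illustrative sketch of that idea under assumed definitions: the function names `bfs_distances` and `weighted_support`, the blending weight `alpha`, the closeness term `1 / (1 + distance)`, and the toy node names are all hypothetical and are not taken from the thesis.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Return the hop distance from `source` to every reachable node (plain BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def weighted_support(subgraph_nodes, frequency, distances, alpha=0.5):
    """Hypothetical support score: blend frequency with closeness to the outbreak node.

    `frequency` is the fraction of graphs in which the candidate subgraph occurs.
    `distances` maps each node to its hop distance from the outbreak node.
    `alpha` is an assumed weight; the thesis may combine the two factors differently.
    """
    reachable = [distances[n] for n in subgraph_nodes if n in distances]
    if not reachable:
        return 0.0  # candidate is disconnected from the outbreak node
    closeness = 1.0 / (1.0 + min(reachable))
    return alpha * frequency + (1.0 - alpha) * closeness


# Toy graph: edges point away from an assumed outbreak node, a `strcpy` call.
toy_graph = {
    "strcpy": ["buf", "src"],
    "buf": ["decl_buf"],
    "src": ["read_input"],
}
toy_distances = bfs_distances(toy_graph, "strcpy")
near_score = weighted_support({"buf", "decl_buf"}, 0.8, toy_distances)
far_score = weighted_support({"read_input"}, 0.8, toy_distances)
print(near_score > far_score)
```

With equal frequency, the candidate closer to the outbreak node scores higher, which is the qualitative behavior the abstract describes; a threshold on this score would then play the role of the minimum support in Apriori-style pruning.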